EDME conducts Effect Direction MEta-analysis of Genome-Wide Association Analysis (GWAS) for the same traits generated from two independent populations. The principle is to use the agreement of GWAS signs (+ or – for the same trait in two populations to identify) to identify true singals for GWAS. It can be applied to single-trait and multi-trait analysis.

This page is a tutorial of EDME (edme.v1.0.zip) which can be downloaded from https://melbourne.figshare.com/articles/Effect_Direction_MEta-analysis_EDME_of_GWAS/11730939. EDME is currently implemented in R with several major R scripts included in the edme.v1.0.zip. In the same zip file, there are also sample data and a workflow R script ‘demo.R’ which show how to use the R scripts along with the sample GWAS results (5 traits in two populations) to conduct EDME.

Content in the edme.v1.0.zip:

1) README.txt: brief description of EDME, citation and license (CC BY-NC 4.0)

2) demo.R: a workflow R script using the sample data (in folder pop1 and pop2) and other R scripts to conduct EDME

3) read2pgwaOS1.1.R and read2pgwa.R: two R scripts enable functions to read in GWAS results, read2pgwaOS1.1.R is a fast read-in because it assumes the user has organised the GWAS results so that i) the allele with which the GWAS effects were calculated is the same across traits and populations and ii) the SNP-row of the GWAS results are in the same order across different traits and populations. Conversely read2pgwa.R does not have these assumptions and therefore will organise the GWAS data. However, if you have large datasets, this process may be slow

4) edmesc.v1.3.R: after read GWAS data, this script will enable a function to calculate the false discovery by effect direction (FDRed) for single-trait and multi-trait analysis; it will also produce additional data for the multi-trait analysis to identify true pleiotropy using other R scripts

5) edmedpd.R: this script will enable a function to calculate a pi value which indicates the overall agreement of multi-trait effects of a SNP between two populations

6) spmt2.R: This script is an R implementation of the traditional multi-trait meta-analysis described by Bolormaa et al 2014 PloS Genetics (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004198)

7) wt.R: This script enables a function to calculate a weighted t value based on GWAS results from two populations described in Xiang et al 2018 BMC Genomics (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4902-8)

Folder pop1 and pop2: sample GWAS results from population 1 and population 2, respectively. For each population, GWAS results for 5 traits in the form of ‘mlma’ which is the output from GCTA (https://cnsgenomics.com/software/gcta/#MLMA) are included. More than 10K markers are included in each GWAS result

Workflow of EDME (as shown in the demo.R)

- Read-in GWAS results.

#go the the working direction that you have unzipped the edme.v1.0.zip setwd(!!!YOURPATH!!!/edme.demo) #load three libraries that need for the analysis library(data.table) library(dplyr) library(qvalue) #---define two paths to GWAS results from two populations path1 <- paste0('pop1/') path2 <- paste0('pop2/') #---read the 'read-in' script source('read2pgwaOS1.1.R') #---read in GWAS results dlist <- read2pgwaOS(path1,path2,id='SNP',fpa='mlma') #id: the column name in the GWAS results give the ID of SNPs; in this case the id is 'SNP' #fpa: file suffix: the suffix of the GWAS results', in this case the suffix is 'mlma', which is the standard GWAS output of GCTA

**Several notes for reading the data you need to be aware**: 1) EDME primarily uses the GWAS output (*.mlma) from GCTA which is a very common and powerful tool for GWAS; 2) The files for GWAS results (*.mlma) of the same trait need to have the same name in different files. For example, for GWAS of the 1st trait in the population 1, the file name is ‘cb.tr01.mlma’ and this is the same as the GWAS of the 1st trait in the population 2; 3) the function read2pgwaOS() works on the assumption that there is no sign flip due to technical issues, i.e., the A1 of GWAS of the population 1 is the same from the A1 of GWAS of the population. Also, read2pgwaOS() assumes that the order of rows (SNP) are the same in different GWAS files. If these two assumptions are not met, please use source(‘read2pgwa.R’) which will do the correction and sorting. However, this will make the reading process slow if you have large datasets.

1.1 If you choose to use the script read2pgwa.R the argurments are the same as read2pgwaOS()

source('read2pgwa.R') dlist <- read2pgwa(path1,path2,id='SNP',fpa='mlma')

2. Calculate single-trait and multi-trait FDRed with EDME, cau=F, do not impose the p-value threshold in two populations

source('edmesc.v1.3.R') scdt.all <- edmesc(dlist,k=5,id='SNP',chr='Chr',bp='bp',bpa='b',sepa='se',ppa='p',pvec=c(10^(0:-20)),cau=F) #cau=F, do not impose the p-value threshold in two populations, therefore each population will have a different set of FDRed and conventional FDR #==necessary arguments: #dlist: the readin files of GWAS results by read2pgwaOS() or read2pgwa() #k: the number of traits included for analysis #id: the colname of the marker ID #chr: the colname of the marker chromosome #bp: the colname of the marker bp position #bpa: colname of the marker effect (beta) #sepa: colname of the marker effect standard error (se) #ppa: colname of the marker effect p-value (se) #pvec: a vector of p value thresholds that you want to calculate FDR (e.g., c(0.1,0.01,0.001,...)) #cau: if the p value threshold is only imposed in 1 population not both, =F: Not imposed in two populations; =T: imposed in both populations.

2.1. Calculate single-trait and multi-trait FDRed with EDME; cau=T, do impose the p-value threshold in two populations, the FDRed for two populations will be the same. This is what we used in the paper

scdt.cau <-edmesc(dlist,k=5,id='SNP',chr='Chr',bp='bp',bpa='b',sepa='se',ppa='p',pvec=c(10^(0:-20)),cau=T)

2.2 check results (1st result)

#edmesc() will produce three results in a list names(scdt.cau) #[1] "sfdred" "mcount" "mfdred" #---the 1st result is the single-trait FDRed and conventional FDR (Bolormaa 2014); head(scdt.all[[1]]) tr cut same.r t0.r ti.r fdr.ed.r fdr.r same.v t0.v ti.v fdr.ed.v 1: tr1 1e+00 6972 8020 10982 0.7302859 NA 6972 8036 10990 0.7312102 2: tr1 1e-01 1977 1240 2597 0.4774740 0.35951739 2011 888 2455 0.3617108 3: tr1 1e-02 439 142 510 0.2784314 0.20776391 399 92 445 0.2067416 4: tr1 1e-03 44 8 48 0.1666667 0.22839506 19 4 21 0.1904762 5: tr1 1e-04 11 0 11 0.0000000 0.09990999 0 0 0 NA 6: tr1 1e-05 0 0 0 NA NA 0 0 0 NA fdr.v 1: NA 2: 0.3867391 3: 0.2395869 4: 0.5233329 5: NA 6: NA #the column from left right: #tr: order of trait #cut: p-value threshold for population 1 #same.r: the number of varaints with the same direction between populations at the p-value threshold for population 1 #t0.r: the number of variants true 0 effect, which is equivalent to the number of variants with different directions between populations at the p-value threshold for population 1 times 2 ((ti.r-same.r)*2) #ti.r< the total number of significant variants at the p-value threshold for population 1 #fdr.ed.r: FDRed (t0.r/ti.r) at the p-value threshold for population 2 #fdr.r: conventional FDR at the p-value threshold for population 2 #t0.v: the number of variants true 0 effect, which is equivalent to the number of variants with different directions between populations at the p-value threshold for population 2 times 2 ((ti.r-same.r)*2) #ti.v< the total number of significant variants at the p-value threshold for population 2 #fdr.ed.v: FDRed (t0.r/ti.r) at the p-value threshold for population 2 #fdr.v: conventional FDR at the p-value threshold for population 2 #==if you used cau=T, then the p-value threshold is imposed in both population and the 1st result will only have one set of FDRs. this is what we have used in the paper.

2.3 check results (2nd result)

#---the 2nd result is the count of the agreement of trait effect at the variant level. For each variant across all traits analysed, '1' means this variant has agreed variant effect directions between two populations and '0' means this variant has disagreed variant effect directions between two populations head(scdt.all[[2]]) SNP csd.tr1 csd.tr2 csd.tr3 csd.tr4 csd.tr5 1: Chr22:136032 1 0 1 0 1 2: Chr22:138506 1 0 1 0 1 3: Chr22:138873 1 0 0 1 1 4: Chr22:139106 0 0 0 1 0 5: Chr22:139915 1 0 1 1 1 6: Chr22:140212 1 0 1 0 1

2.4 check results (3rd result)

#---the 3rd result is the summary of the 2nd result for each variant across all traits. head(scdt.all[[3]]) SNP csd.all mt0 mtr trn 1: Chr22:136032 3 0.8 0.2 1 2: Chr22:138506 3 0.8 0.2 1 3: Chr22:138873 3 0.8 0.2 1 4: Chr22:139106 1 1.0 0.0 0 5: Chr22:139915 4 0.4 0.6 3 6: Chr22:140212 3 0.8 0.2 1 #csd.all: number of traits that this variant had agreed effects between populations #mt0: estimated proportion of true zero effects (in this case, for 5 traits =(1-3/5)*2) #mtr: estimated proportion of true effects (1-mt0) #trn: estimated number of true effects (round(mtr*5,0); this is the 'predicted' number of pleiotropy which is imprecise for each variant, but is a good estimated as an average across many variants. However, this is not the multi-trait FDRed which is calculated using the next (dot product) function

3. Calculate pi value (dot product) between 2 populations. The use of pi is to focus the FDRed on the effects estimated robustly. A positive pi value indicates a variant has overall agreed effects between two populations across traits analysed, and a negative pi indicates that this variant has disagreed effects across two populations. The proportion of variants with negative is the multi-trait FDRed.

source('edmedpd.R') #read in this file will allow you to use the function (edmedpd), which calculates the dot product(pi value) of effects in two populations for each variant. mtdpd <- edmedpd(dlist,id='SNP',chr='Chr',bp='bp',bpa='b',sepa='se') head(mtdpd) SNP Chr bp pi 1: Chr25:62517 25 62517 0.5228802 2: Chr25:62900 25 62900 -0.5352556 3: Chr25:62941 25 62941 0.2727864 4: Chr25:64181 25 64181 0.6935730 5: Chr25:64525 25 64525 0.6935730 6: Chr25:64852 25 64852 0.6935730

3.1 if you have large datasets, you should save these multi-trait meta-analysis results before the next analysis

write.table(scdt.cau$sfdred,gzfile('res/sfdred.cau.34ct.txt.gz'),row.names=F,quote=F) write.table(scdt.cau$mfdred,gzfile('res/mfdred.cau.34ct.txt.gz'),row.names=F,quote=F) write.table(scdt.cau$mcount,gzfile('res/mcount.cau.34ct.txt.gz'),row.names=F,quote=F) write.table(mtdpd,gzfile('res/mtpi.txt.gz'),row.names=F,quote=F)

4. Calculate the traditional multi-trait p-value. This will be used together with the pi value to estimate the multi-trait FDRed. The script spmt2.R is an R script to do the traditional multi-trait meta-analysis by Bolormaa et al 2014:

source('spmt2.R') pop1.mt <- etme(dlist[[1]],id='SNP',chr='Chr',bp='bp',bpa='b',sepa='se') colnames(pop1.mt)[4:ncol(pop1.mt)] <- paste0('pop1.',colnames(pop1.mt)[4:ncol(pop1.mt)]) pop2.mt <- etme(cbind(dlist[[2]][,1],dlist[[1]][,2:3],dlist[[2]][,-1]),id='SNP',chr='Chr',bp='bp',bpa='b',sepa='se') colnames(pop2.mt)[4:ncol(pop2.mt)] <- paste0('pop2.',colnames(pop2.mt)[4:ncol(pop2.mt)]) head(pop1.mt) SNP Chr bp pop1.df pop1.x2 pop1.X2.p pop1.fdr 1: Chr25:62517 25 62517 5 4.802899 0.4404053 0.9942780 2: Chr25:62900 25 62900 5 2.170403 0.8250993 0.9942780 3: Chr25:62941 25 62941 5 3.291437 0.6551530 0.9942780 4: Chr25:64181 25 64181 5 1.069893 0.9567338 0.9964465 5: Chr25:64525 25 64525 5 1.069893 0.9567338 0.9964465 6: Chr25:64852 25 64852 5 1.069893 0.9567338 0.9964465

4.1 if you have large datasets, you should save these multi-trait meta-analysis results

write.table(pop1.mt,gzfile('res/pop1.meta.res.txt.gz'),row.names=F,quote=F) write.table(pop2.mt,gzfile('res/pop2.meta.res.txt.gz'),row.names=F,quote=F)

5. Use the multi-trait p-value in combination with the pi value to calculate the multi-trait FDRed

#merge all multi-trait result datasets: mtdt <- Reduce(function(...) merge(...,by='SNP',all=T,sort=F),list(mtdpd,scdt.all$mfdred,pop1.mt[,c(1,6)],pop2.mt[,c(1,6)]))

5.1 Calculate multi-trait fdr when the multi-trait p-value is imposed in two populations, also calculate the predicted number of True effects which will be used in the next step to calculated the observed number of True effects

plist <- list() for (p in c(1,0.5,0.1,5e-2,1e-3,5e-4,1e-6,5e-8,5e-10)){ plist[[as.character(p)]] <- mtdt[pop1.X2.p<p&pop2.X2.p<p][,list(p,.N,mean(csd.all,na.rm=T),sd(csd.all,na.rm=T)/sqrt(.N),mean(trn,na.rm=T),sd(trn,na.rm=T)/sqrt(.N),sum(pi<=0,na.rm=T)*2/.N,p*(1- .N/mtdt[,.N]) / ((.N/mtdt[,.N])*(1-p)))] } pdt <- setDT(data.frame(do.call(rbind,plist))) colnames(pdt) <- c('pthresh','N','N.tr.sameDir','N.tr.sameDir.se','predict.TE','predict.TE.se','FDRed','convent.FDR') pdt[,FDRed:=ifelse(FDRed>=1,1,FDRed)] pdt[,convent.FDR:=ifelse(!is.infinite(convent.FDR)&convent.FDR>=1,1,convent.FDR)]

5.2 Check results

print(pdt,nrow(pdt)) pthresh N N.tr.sameDir N.tr.sameDir.se predict.TE predict.TE.se 1: 1e+00 10979 3.063576 0.01194583 1.706622 0.01704328 2: 5e-01 3393 3.606543 0.02379369 2.648099 0.03549454 3: 1e-01 1227 4.603912 0.02564724 4.274654 0.04567615 4: 5e-02 1025 4.952195 0.01089802 4.914146 0.01928715 5: 1e-03 798 5.000000 0.00000000 5.000000 0.00000000 6: 5e-04 738 5.000000 0.00000000 5.000000 0.00000000 7: 1e-06 56 5.000000 0.00000000 5.000000 0.00000000 8: 5e-08 4 5.000000 0.00000000 5.000000 0.00000000 9: 5e-10 0 NaN NA NaN NA FDRed convent.FDR 1: 0.64923946 Inf 2: 0.41791925 1.0000000000 3: 0.06519967 0.8849950195 4: 0.01951220 0.5121951220 5: 0.00000000 0.0127972584 6: 0.00000000 0.0069560526 7: 0.00000000 0.0001954288 8: 0.00000000 0.0001374500 9: NA Inf

6. Calculate the observed number of True Effects (true pleiotropy). This calculation uses the number of the predicted TE and the number of significant variants at a given p-value. Here we use the weighted multi-trait p-value (Xiang et al 2018) from two populations for the calculation. In order to get the weighted multi-trait p-value, we need to get the weighted t value based on two 2 populations. This is obtained using ‘wt.R’ script.

source('wt.R') wtlist <- w2t(dlist,id='SNP',chr='Chr',bp='bp',bpa='b',sepa='se') names(wtlist) [1] "bcwbse" "bcwt" head(wtlist[[1]]) SNP Chr bp tr1.wb tr1.wse tr2.wb tr2.wse 1: Chr25:62517 25 62517 0.024304583 0.025697190 -0.0078275094 0.024075067 2: Chr25:62900 25 62900 -0.011315515 0.009924612 0.0023474551 0.009631585 3: Chr25:62941 25 62941 -0.014758198 0.010415290 -0.0071866223 0.010116294 4: Chr25:64181 25 64181 -0.009671149 0.009313336 -0.0009454587 0.009029289 5: Chr25:64525 25 64525 -0.009671149 0.009313336 -0.0009454587 0.009029289 6: Chr25:64852 25 64852 -0.009671149 0.009313336 -0.0009454587 0.009029289 tr3.wb tr3.wse tr4.wb tr4.wse tr5.wb tr5.wse 1: 0.039282309 0.03139532 -0.007162840 0.02921214 -0.037362768 0.03783182 2: -0.004143884 0.01190748 0.011682707 0.01221297 -0.009619211 0.01483020 3: 0.003219810 0.01250030 0.015791342 0.01282156 -0.005650585 0.01557851 4: -0.000367822 0.01124134 0.007127234 0.01135950 -0.018684679 0.01384739 5: -0.000367822 0.01124134 0.007127234 0.01135950 -0.018684679 0.01384739 6: -0.000367822 0.01124134 0.007127234 0.01135950 -0.018684679 0.01384739 head(wtlist[[2]]) SNP Chr bp tr1.wb tr1.wse tr2.wb tr2.wse 1: Chr25:62517 25 62517 0.024304583 0.025697190 -0.0078275094 0.024075067 2: Chr25:62900 25 62900 -0.011315515 0.009924612 0.0023474551 0.009631585 3: Chr25:62941 25 62941 -0.014758198 0.010415290 -0.0071866223 0.010116294 4: Chr25:64181 25 64181 -0.009671149 0.009313336 -0.0009454587 0.009029289 5: Chr25:64525 25 64525 -0.009671149 0.009313336 -0.0009454587 0.009029289 6: Chr25:64852 25 64852 -0.009671149 0.009313336 -0.0009454587 0.009029289 tr3.wb tr3.wse tr4.wb tr4.wse tr5.wb tr5.wse 1: 0.039282309 0.03139532 -0.007162840 0.02921214 -0.037362768 0.03783182 2: -0.004143884 0.01190748 0.011682707 0.01221297 -0.009619211 0.01483020 3: 0.003219810 0.01250030 0.015791342 0.01282156 -0.005650585 0.01557851 4: -0.000367822 0.01124134 0.007127234 0.01135950 -0.018684679 0.01384739 5: -0.000367822 0.01124134 0.007127234 0.01135950 -0.018684679 0.01384739 6: -0.000367822 0.01124134 0.007127234 0.01135950 -0.018684679 0.01384739 #save weighted t write.table(wtlist$bcwt,gzfile('res/chol.tw.txt.gz'),row.names=F,quote=F)

6.1 then we use the etme() function (traditional multi-trait meta-anamysis in Bolormaa et al 2014) to calculate the weighted multi-trait p-value for each variant:

wmt <- etme(wtlist$bcwbse,id='SNP',chr='Chr',bp='bp',bpa='wb',sepa='wse') head(wmt) SNP Chr bp df x2 X2.p fdr 1: Chr25:62517 25 62517 5 3.969247 0.5538521 0.8507589 2: Chr25:62900 25 62900 5 3.110396 0.6829705 0.8848636 3: Chr25:62941 25 62941 5 4.094912 0.5358335 0.8447518 4: Chr25:64181 25 64181 5 3.715342 0.5910826 0.8583647 5: Chr25:64525 25 64525 5 3.715342 0.5910826 0.8583647 6: Chr25:64852 25 64852 5 3.715342 0.5910826 0.8583647 #save weighted meta-results write.table(wmt,gzfile('res/bc.meta.res.txt.gz'),row.names=F,quote=F)

7. Now choose a weighted multi-trait p-value to count the number of variants that are significantly pleiotropic:

cut<- 1e-6 #a cutoff of p<1e-6 is used on our manuscript #number of significant pleiotropic variants at p<1e-6 level n.wsig <- wmt[X2.p<cut,.N]

8. Then the predicted combination of variant-trait is (note that data pdt is from step 5.1-5.2):

predicted.CbT <- n.wsig*round(pdt[pthresh==cut]$predict.TE,0) predicted.CbT [1] 4945 #--this means that 4945 trait-by-variant combinations would be important, based on the number of variants significant at this p-value threshold (wsig) and the predicted number of True Effects (round(pdt[pthresh==cut]$predict.TE,0))

8.1 then we use above predicted number (predicted.CbT or 4945) as a cutoff to rank the weighted t-values of variants:

#sort and rank variants based on the weighted t-values for different traits wtval <- wtlist$bcwt wtval.m <- melt(wtval[,c(1,4:ncol(wtval)),with=F],id.var='SNP') wtval.m[,rank:=frank(-abs(value))] #select the variants based on the cutoff of the predicted number of True Effects (e.g., predicted.CbT or 4945) selected.snp <- wtval.m[rank<predicted.CbT] #check selected data head(selected.snp) SNP variable value rank 1: Chr25:77299 tr1.tw -3.141970 2875.0 2: Chr25:79532 tr1.tw -3.687043 866.5 3: Chr25:80068 tr1.tw -3.687043 866.5 4: Chr25:80515 tr1.tw -3.687043 866.5 5: Chr25:85472 tr1.tw -3.687043 866.5 6: Chr25:87329 tr1.tw -3.687043 866.5

9. You can then extract two information from the data ‘selected.snp’:

9.1 the 1st piece of information extracted from ‘selected.snp’: observed number of traits associated with each variant

ob.te <- selected.snp[,.N,by=SNP] ob.te[,summary(N)] Min. 1st Qu. Median Mean 3rd Qu. Max. 1.000 1.000 4.000 3.133 5.000 5.000 #we observe that each variant affected around 3 traits at the weighted multi-trait p=1e-6 level

9.2 the 2nd piece of information extracted from ‘selected.snp’: which variants are associated with each trait.

#For example, the SNPs associated with tr1 is: head(selected.snp[variable=='tr1.tw']) SNP variable value rank 1: Chr25:77299 tr1.tw -3.141970 2875.0 2: Chr25:79532 tr1.tw -3.687043 866.5 3: Chr25:80068 tr1.tw -3.687043 866.5 4: Chr25:80515 tr1.tw -3.687043 866.5 5: Chr25:85472 tr1.tw -3.687043 866.5 6: Chr25:87329 tr1.tw -3.687043 866.5 ...

To this end, the EDME analysis is completed. In above steps, we have used the provided scripts and sample data to estimated: 1) single-trait FDRed, 2) multi-trait FDRed, 3) the extent of pleiotropy (average number of True Effects of each variant), 4) predicted true number of variant-by-trait combinations which can be further used to extract [4.1]: observed number of True Effects of each variant and [4.2] pleiotropic variants related to each trait.

Reference: Xiang et al (2020). Effect direction meta-analysis of GWAS identifies extreme, prevalent and shared pleiotropy in a large mammal. Communications Biology (In Press).