8.3 Fine-Mapping Output
This section describes in detail what the program writes to standard output when details = NO and checkit = NO, with a finite number of burn-in and follow-on iterations. This output can be redirected to a file for easier viewing. The output begins with:
- Input parameter file name
- Values of all the parameters specified in this file
- Total genomic distance
- Counts of individuals, cases, controls and ignored samples used in the analysis, and the number of real and fake markers
parameter file: par:8008
output: /home/at55/ancestrymap-rel/exampletry/outfiles/outlm:8008
### THE INPUT PARAMETERS
PARAMETER NAME: VALUE
risk: 1.5
DIR: /home/at55/ancestrymap-rel/exampletry
TAG: 8008
indivname: /home/at55/ancestrymap-rel/exampletry/indiv1.dat
snpname: /home/at55/ancestrymap-rel/exampletry/snpcnts
genotypename: /home/at55/ancestrymap-rel/exampletry/geno.dat
badsnpname: /home/at55/ancestrymap-rel/exampletry/badsnps
fakespacing: .01
tlreest: YES
OUTD: DIR/outfiles
seed: 8008
splittau: YES
fancyxtheta: YES
output: /home/at55/ancestrymap-rel/exampletry/outfiles/outlm:8008
trashdir: /home/at55/trashdir
checkit: NO
details: YES
numburn: 50
numiters: 100
emiter: 30
dotoysim: NO
cleaninit: YES
reestiter: 5
indoutfilename: NULL
snpoutfilename: /home/at55/ancestrymap-rel/exampletry/outfiles/snps:8008
localoutfilename: /home/at55/ancestrymap-rel/exampletry/outfiles/details:8008
lmmodel: YES
lmchrom: 2
lmnumx: 100
lmmax: 6.0
lmthresh: 0.0
lmdetails: YES
lmlobase: 122544286
lmhibase: 123098515
oldlmmode: NO
markername: rs6750983
pubxname: /home/at55/ancestrymap-rel/exampletry/outfiles/gams:8008
hiclip: 20
## ANCESTRYMAP version: 6210
###GENETIC DISTANCE FOR ALL CHROMOSOMES
##Chr_Num: chromosome num, First_SNP and Last_SNP: First and last markers, Gen_dist: Genetic distance
Chr_Num First_SNP Last_SNP Gen_dist
chrom: 1 first: 0 last: 429 distance: 2.834
chrom: 2 first: 430 last: 831 distance: 2.643
chrom: 3 first: 832 last: 1163 distance: 2.227
chrom: 4 first: 1164 last: 1481 distance: 2.131
chrom: 5 first: 1482 last: 1787 distance: 2.012
chrom: 6 first: 1788 last: 2071 distance: 1.914
chrom: 7 first: 2072 last: 2360 distance: 1.871
chrom: 8 first: 2361 last: 2619 distance: 1.670
chrom: 9 first: 2620 last: 2872 distance: 1.777
chrom: 10 first: 2873 last: 3141 distance: 1.809
chrom: 11 first: 3142 last: 3381 distance: 1.552
chrom: 12 first: 3382 last: 3638 distance: 1.723
chrom: 13 first: 3639 last: 3829 distance: 1.258
chrom: 14 first: 3830 last: 4006 distance: 1.159
chrom: 15 first: 4007 last: 4189 distance: 1.245
chrom: 16 first: 4190 last: 4382 distance: 1.340
chrom: 17 first: 4383 last: 4565 distance: 1.266
chrom: 18 first: 4566 last: 4738 distance: 1.160
chrom: 19 first: 4739 last: 4895 distance: 1.069
chrom: 20 first: 4896 last: 5051 distance: 1.067
chrom: 21 first: 5052 last: 5139 distance: 0.604
chrom: 22 first: 5140 last: 5244 distance: 0.710
chrom: 23 first: 5245 last: 5426 distance: 1.180
total distance: 36.224
calling setstatus
lmchrom: 2
lmchrom: 2
setlm: lmnumx: 100 lmmax: 6.000
markername: rs6750983
emiter: 30
reestiter: 5
###COUNTS
Num of fake Markers: 3622 Num of real Markers: 1805 Spacing between fake markers: 0.010
Num of Markers: 5427 Num of Samples: 1201
Num of Cases: 600 Num of Controls: 600 Num of Ignored Samples: 1
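As a quick sanity check on the ###COUNTS block, the following minimal Python sketch parses the three lines and confirms that the totals add up. The function names parse_counts and check_counts are hypothetical helpers, not part of ANCESTRYMAP, and the two additive relations are assumptions read off the sample output rather than a documented guarantee.

```python
import re

# Lines copied from the ###COUNTS block of the sample output above.
COUNTS_TEXT = """\
Num of fake Markers: 3622 Num of real Markers: 1805 Spacing between fake markers: 0.010
Num of Markers: 5427 Num of Samples: 1201
Num of Cases: 600 Num of Controls: 600 Num of Ignored Samples: 1
"""


def parse_counts(text):
    """Return {label: value} for every 'label: number' pair in the text."""
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*?):\s*([0-9.]+)", text)
    return {label.strip(): float(value) for label, value in pairs}


def check_counts(counts):
    """Return a list of problems; an empty list means the counts agree.

    Assumption (from the sample only): fake + real = total markers and
    cases + controls + ignored = total samples.
    """
    problems = []
    markers = counts["Num of fake Markers"] + counts["Num of real Markers"]
    if markers != counts["Num of Markers"]:
        problems.append("fake + real markers != total markers")
    samples = (counts["Num of Cases"] + counts["Num of Controls"]
               + counts["Num of Ignored Samples"])
    if samples != counts["Num of Samples"]:
        problems.append("cases + controls + ignored != total samples")
    return problems


counts = parse_counts(COUNTS_TEXT)
print(check_counts(counts) or "counts are consistent")
```

In the sample run both relations hold: 3622 + 1805 = 5427 markers and 600 + 600 + 1 = 1201 samples.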
- Score generated by the expectation maximization algorithm for each iteration. One should observe the score increasing with the number of iterations.
- Results of the Markov Chain Monte Carlo iterations, which include estimates of θ and λ. Note that the iteration number runs from 1 − numburn to 0 for the burn-in iterations (for example, from −49 to 0 when numburn is 50) and from 1 to numiters for the follow-on iterations. The score is zero for the burn-in iterations, since it is calculated only for the follow-on iterations. The format of the output is as follows:
estglob theta iter a1 b1 a2 b2 c2
estglob lambda iter p1 lambda1 p2 lambda2 lambdaave
These are "global parameters" (they affect every individual). See supplementary note 2 of Patterson et al. (2004) for definitions.
lambdaave is the average λ across individuals.
- Posterior estimates for the mean and standard deviation of θ, θX, λ, λX, and for τ(Afr) and τ(Eur). The user should look carefully at the values of τ(Afr) and τ(Eur), reported on the tau lines of the output, since they indicate how well the ancestral models fit the data. Values below 100 are a cause for concern.
- Genome-wide scores for all the models
- Theta and Lambda estimates with standard error for all the samples
- Allele frequency estimates with standard error for all the markers
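The iteration numbering described in the list above can be used to separate burn-in lines from follow-on lines when post-processing the output. The sketch below is a minimal illustration, assuming each estglob line has the form "estglob <theta|lambda> <iteration> <values...>"; split_estglob is a hypothetical helper name and not part of the program.

```python
def split_estglob(lines):
    """Split 'estglob' lines into burn-in (iteration <= 0) and follow-on
    (iteration >= 1) records, following the numbering in the text."""
    burn_in, follow_on = [], []
    for line in lines:
        fields = line.split()
        # Skip anything that is not an estglob record (e.g. timing lines).
        if len(fields) < 4 or fields[0] != "estglob":
            continue
        kind = fields[1]                      # "theta" or "lambda"
        iteration = int(fields[2])
        values = [float(x) for x in fields[3:]]
        record = (kind, iteration, values)
        if iteration <= 0:
            burn_in.append(record)
        else:
            follow_on.append(record)
    return burn_in, follow_on


# A few lines in the format of the sample run (numburn: 50, numiters: 100).
sample = [
    "estglob theta -49 1.918 7.752 1.120 8.242 35.115",
    "estglob lambda -49 12.913 2.482 10.698 2.088 5.203",
    "domcm1 time: 25.670",
    "estglob theta 100 2.013 7.949 3.176 27.833 206.392",
    "estglob lambda 100 17.012 2.851 13.486 3.261 5.932",
]

burn_in, follow_on = split_estglob(sample)
numburn = 50
print("first burn-in iteration:", burn_in[0][1], "expected:", 1 - numburn)
print("follow-on records:", len(follow_on))
```

Summaries of the chain would normally use only the follow-on records, since the burn-in iterations are discarded by design.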
##SCORES FROM EXPECTATION_MAXIMIZATION ALGORITHM ITERATIONS
## Iteration_Num Score
emsimple iter: 1 0.000
emsimple iter: 2 76769.270
emsimple iter: 3 99251.402
emsimple iter: 4 106551.592
emsimple iter: 5 109118.490
emsimple iter: 6 110158.959
emsimple iter: 7 110664.478
emsimple iter: 8 110958.509
emsimple iter: 9 111157.265
emsimple iter: 10 111307.354
emsimple iter: 11 111429.499
emsimple iter: 12 111533.788
emsimple iter: 29 112392.785
emsimple iter: 30 112417.570
muval1: 0.000
neil0: 5.927 -3.414
domcm1 time: 25.670
neil1: 7.397 -2.749
###RESULTS FOR EACH MARKOV CHAIN MONTE CARLO ITERATION
##estglob theta: Iteration_Num thp1 thp2 thxp0 thxp1 thxp2
##thp1, thp2: Are parameters for the prior distribution of theta, and thxp0,thxp1,thxp2 are the same for theta on X chromosome
##estglob lambda: Iteration_Num lp1 lp2 lxp1 lxp2 ave_lambda
##lp1, lp2: Are parameters for the prior distribution of lambda, and lxp1,lxp2 are the same for lambda on X chromosome
estglob theta -49 1.918 7.752 1.120 8.242 35.115
estglob lambda -49 12.913 2.482 10.698 2.088 5.203
domcm1 time: 25.670
estglob theta -48 1.977 7.792 1.282 9.432 36.396
estglob lambda -48 15.138 2.884 11.301 2.209 5.249
estglob theta 99 2.052 8.444 3.307 26.913 207.078
estglob lambda 99 16.854 2.854 13.094 3.171 5.935
estglob theta 100 2.013 7.949 3.176 27.833 206.392
estglob lambda 100 17.012 2.851 13.486 3.261 5.932
average thetax: 223.978
###POSTERIOR ESTIMATES
theta mean 0.1988
thetax mean 0.1866
theta var 0.0141 sdev: 0.1187
thetax var 0.0110 sdev: 0.1049
lambda mean 5.9529
lambdax mean 4.4653
lambda var 2.0206 sdev: 1.4215
lambdax var 21.2929 sdev: 4.6144
tau (PopA) 109.716
tau (PopB) 117.473
###GENOME_WIDE SCORE FOR ALL THE MODELS
##risk1 and risk2 are the increased risk due to having one or two population A ancestry alleles, and crisk: risk for controls
risk1 risk2 crisk score
model: 1.500 2.250 1.000 13.591
###THETA or M, LAMBDA VALUES FOR ALL INDIVIDUALS
##Indiv_Index: individual's internal index num, tmean and txmean: average theta and thetax
##tsdev and txsdev: standard deviation for theta and thetax
##lmean and lxmean: average lambda and lambdax
##lsdev and lxsdev: standard deviation for lambda and lambdax
Num Indiv_ID Gender tmean tsdev txmean txsdev lmean lsdev lxmean lxsdev
0 toyindiv:0 M 0.217 0.027 0.210 0.046 5.845 0.496 4.715 1.128
1 toyindiv:1 F 0.107 0.016 0.115 0.037 7.142 0.962 4.489 1.140
2 toyindiv:2 M 0.223 0.023 0.204 0.043 5.332 0.548 5.040 1.221
3 toyindiv:3 F 0.185 0.029 0.187 0.043 3.699 0.565 5.134 1.072
1198 toyindiv:1198 M 0.138 0.018 0.135 0.041 7.699 1.072 4.201 1.094
1199 toyindiv:1199 F 0.159 0.022 0.156 0.043 8.169 1.007 4.301 1.099
###ALLELE FREQUENCY ESTIMATES WITH STANDARD ERROR
##SNP_Index: marker internal index num
##amean and bmean are the average reference allele frequency for population A and B
##asdev and bsdev are the corresponding standard deviation
SNP_Index Chr_Num SNP_ID amean asdev bmean bsdev
0 1 rs819980 0.948 0.006 0.023 0.013
1 1 rs10907185 0.249 0.011 0.683 0.026
4 1 rs897634 0.090 0.007 0.785 0.023
6 1 rs2817159 0.951 0.006 0.071 0.016
5425 23 rs10127175 0.104 0.008 0.049 0.013
5426 23 rs884840 0.984 0.004 0.305 0.029
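Two of the checks recommended above, that the EM score increases with the iteration number and that the tau values are not below 100, can be automated when the output has been redirected to a file. The following is a minimal sketch under the assumption that the relevant lines look exactly like those in the sample; em_scores, is_non_decreasing and low_tau are hypothetical helper names.

```python
import re

TAU_THRESHOLD = 100.0  # the value the text gives as a cause for concern


def em_scores(text):
    """Return the scores from 'emsimple iter: N score' lines, in file order."""
    found = re.findall(r"^emsimple iter:\s*\d+\s+(-?[0-9.]+)", text,
                       flags=re.MULTILINE)
    return [float(score) for score in found]


def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def low_tau(text, threshold=TAU_THRESHOLD):
    """Return (population, tau) pairs whose tau is below the threshold."""
    found = re.findall(r"^tau \((\w+)\)\s+(-?[0-9.]+)", text,
                       flags=re.MULTILINE)
    return [(pop, float(v)) for pop, v in found if float(v) < threshold]


# A few lines copied from the sample output above.
SAMPLE = """\
emsimple iter: 1 0.000
emsimple iter: 2 76769.270
emsimple iter: 29 112392.785
emsimple iter: 30 112417.570
tau (PopA) 109.716
tau (PopB) 117.473
"""

print("EM score non-decreasing:", is_non_decreasing(em_scores(SAMPLE)))
print("populations with low tau:", low_tau(SAMPLE))
```

For the sample run the EM scores increase and both tau values exceed 100, so neither check raises a concern.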
Here Mu is the genotype risk and lambda is the allelic risk. For a single copy of a chromosome with local ancestry a and b variant alleles, the risk is taken to be exp(a lambda) exp(b mu). In the table shown below, given Mu, lambda is chosen so that the ancestry risk when the allele is unknown equals the value specified by the risk parameter of the (coarse scan) model; in this example the risk is 1.5 (see the Overview section). The Log_Score column is a LOD score for the fine-mapping model against the model in which genotype does not correspond to risk. The score is clipped from below; in the table below the floor is -8.000. A positive Log_Score is a hint of a causal allele. As a check on understanding, the reader should note that if Mu = 1 then the score must be 0, as is the case in the table below (row 15).
lmbayes is a Bayes factor averaged over all fine-mapping markers in the run. It should be adjusted by a prior for whether there is a causal marker in the region.
### Iteration_Num Mu Log_Score Caltd_Lambda
lmdetails 0 0.333 -8.000 0.811
lmdetails 1 0.359 -8.000 0.839
lmdetails 2 0.386 -8.000 0.868
lmdetails 3 0.415 -8.000 0.900
lmdetails 4 0.447 -8.000 0.934
lmdetails 5 0.481 -8.000 0.970
lmdetails 6 0.517 -8.000 1.009
lmdetails 7 0.557 -8.000 1.050
lmdetails 8 0.599 -7.924 1.094
lmdetails 9 0.644 -6.300 1.141
lmdetails 10 0.693 -4.327 1.192
lmdetails 11 0.746 -2.758 1.246
lmdetails 12 0.803 -1.576 1.304
lmdetails 13 0.864 -0.746 1.365
lmdetails 14 0.929 -0.229 1.430
lmdetails 15 1.000 0.000 1.500
lmdetails 16 1.076 -0.057 1.574
lmdetails 17 1.158 -0.412 1.653
lmdetails 18 1.246 -1.075 1.737
lmdetails 19 1.340 -2.058 1.826
lmdetails 20 1.442 -3.369 1.920
lmdetails 21 1.552 -5.009 2.020
lmdetails 22 1.670 -6.941 2.126
lmdetails 23 1.797 -7.978 2.238
lmdetails 24 1.933 -8.000 2.356
lmdetails 25 2.080 -8.000 2.481
lmdetails 26 2.238 -8.000 2.612
lmdetails 27 2.408 -8.000 2.750
lmdetails 28 2.591 -8.000 2.895
lmdetails 29 2.788 -8.000 3.048
lmdetails 30 3.000 -8.000 3.207
###lmscore: Fine-mapping score in addition to the Admix_Score
##SNP_ID LMScore Chr_Num Phys_Pos Admix_Score
lmscore: rs11890727 -0.992 2 114383724 14.382
##lmscbest : Best lmscore in the run
lmscbest: -0.992
##lmbayes: Bayes factor, averaging over all fine mapping markers in the run
lmbayes: -0.992
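To locate the most favourable row of the lmdetails table, the lines can be parsed and ranked by Log_Score. This is a minimal sketch, assuming the three columns after the row index are Mu, Log_Score and Caltd_Lambda as the header line states; parse_lmdetails and best_row are hypothetical helper names.

```python
def parse_lmdetails(lines):
    """Parse 'lmdetails <index> <Mu> <Log_Score> <Caltd_Lambda>' lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == 5 and fields[0] == "lmdetails":
            rows.append({
                "index": int(fields[1]),
                "mu": float(fields[2]),
                "log_score": float(fields[3]),
                "lam": float(fields[4]),
            })
    return rows


def best_row(rows):
    """Return the row with the highest Log_Score."""
    return max(rows, key=lambda row: row["log_score"])


# Rows 13 to 17 copied from the sample table above.
sample = [
    "lmdetails 13 0.864 -0.746 1.365",
    "lmdetails 14 0.929 -0.229 1.430",
    "lmdetails 15 1.000 0.000 1.500",
    "lmdetails 16 1.076 -0.057 1.574",
    "lmdetails 17 1.158 -0.412 1.653",
]

best = best_row(parse_lmdetails(sample))
print(best["index"], best["mu"], best["log_score"], best["lam"])
```

In the sample the best row is row 15, where Mu = 1, the score is 0 and the calculated lambda equals the risk parameter 1.5, so this run gives no hint of a causal allele at the marker tested.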
- § Lag and correlations
For a number of sample statistics we compute a correlation coefficient at small "lags". If the statistic at iteration i is S(i) we compute for 1 <= lag <= 10 (default) the correlation between S(i) and S(i+lag). Large values indicate that the MCMC is not mixing very well.
We publish this for:
- i) llike: a statistic of no intrinsic interest, but one that mixes poorly.
- ii) log10fac: log10 Bayes factor (genome-wide)
- iii) factor: Bayes factor = 10^log10fac
- iv) log tauscal: log(τ(0)), the τ value for population 0.
In our experience, ii) and iii) are the most important statistics and mix well, iv) mixes less well, and i) mixes quite poorly.
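The lag correlation defined above can be illustrated with a short computation. The sketch below uses the ordinary Pearson correlation between the pairs (S(i), S(i+lag)); this is an assumption, since the text does not specify the exact estimator the program uses, nor how the sig column is derived. The name lag_correlation is a hypothetical helper.

```python
from math import sqrt


def lag_correlation(series, lag):
    """Pearson correlation between S(i) and S(i + lag) over all valid i."""
    if lag < 1 or lag >= len(series) - 1:
        raise ValueError("lag must be at least 1 and leave two or more pairs")
    x = series[:-lag]          # S(i)
    y = series[lag:]           # S(i + lag)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy series that flips sign every iteration: strongly anti-correlated at
# lag 1 and strongly correlated at lag 2.
toy = [1.0, -1.0] * 10
for lag in (1, 2):
    print("lag:", lag, "corr:", round(lag_correlation(toy, lag), 3))
```

Applied to a statistic recorded at each follow-on iteration, values near zero at small lags suggest good mixing, while large values suggest that successive iterations are far from independent.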
- § Scores for each chromosome
As the example below shows, the LGS_MAX and CCS_MAX scores are highest for chromosome 2.
- § Bestscores: The maximum genome-wide score for the locus-genome statistic, and the maximum and minimum genome-wide scores for the case-control statistic.
- § Genome-log-factor: log-likelihood of the locus genome statistic averaged over all the markers in the genome.
The genome-log factor is the most important number that is produced by the program and should be the first number that the user looks at.
###LAG AND CORRELATIONS
llike mean: -32402.892 s.err: 1965.995
lag: 1 corr: 0.629 sig: 6.258
lag: 2 corr: 0.528 sig: 5.224
lag: 3 corr: 0.462 sig: 4.550
lag: 4 corr: 0.440 sig: 4.312
lag: 5 corr: 0.365 sig: 3.560
lag: 6 corr: 0.332 sig: 3.221
lag: 9 corr: 0.068 sig: 0.647
lag: 10 corr: 0.230 sig: 2.178
###SCORES FOR EACH CHROMOSOME
##LGS_MAX: Maximum locus genome statistic score
##CCS_MAX and CCS_MIN are the maximum and minimum case control statistic scores
##LGS_LOCAL: log likelihood of the locus genome statistic score obtained by averaging over all the markers on that chromosome
Chr_Num LGS_MAX CCS_MAX CCS_MIN LGS_LOCAL
1 -0.49 1.79 -2.22 -2.20
2 16.55 7.31 -1.05 14.73
3 -2.94 1.68 -0.79 -4.27
4 -2.89 1.78 -1.81 -4.25
5 -3.31 1.27 -2.11 -4.59
6 -2.86 0.59 -2.22 -4.15
7 -1.01 2.29 -0.78 -2.37
8 -3.82 0.30 -2.48 -4.90
9 -1.65 1.17 -1.12 -3.04
10 -3.24 1.31 -1.93 -4.56
11 -2.15 2.34 -1.20 -3.24
12 1.43 2.23 -2.17 0.02
13 -1.31 2.40 -1.28 -2.30
14 -1.16 1.42 -1.41 -2.41
15 -2.78 0.37 -2.07 -4.13
16 -2.65 0.31 -2.35 -3.67
17 -0.28 0.96 -0.98 -1.69
18 -1.40 2.23 -2.90 -2.61
19 -0.83 2.38 -0.96 -2.51
20 -0.34 1.00 -0.42 -2.00
21 -1.65 3.00 -0.20 -3.20
22 -3.56 2.16 -1.13 -4.41
23 -1.01 2.46 -1.64 -1.96
###BESTSCORES: Maximum genome-wide score for the locus-genome statistic (LGS_MAX), and the maximum and minimum genome-wide scores for the case-control statistic (CCS_MAX and CCS_MIN)
bestscores: 16.554 7.308 -2.897
###GENOME LOG FACTOR: log-likelihood of the locus genome statistic averaged over all the markers in the genome
genome log-factor: 13.591
##end of run
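The per-chromosome table can be summarized programmatically to find the chromosomes responsible for the extreme scores. This is a minimal sketch, assuming each data row has the five columns named in the header line; parse_chrom_scores and summarize are hypothetical helper names.

```python
def parse_chrom_scores(lines):
    """Parse rows of 'Chr_Num LGS_MAX CCS_MAX CCS_MIN LGS_LOCAL'."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            chrom = int(fields[0])
            values = [float(x) for x in fields[1:]]
        except ValueError:
            continue  # header or other non-numeric line
        rows.append((chrom, *values))
    return rows


def summarize(rows):
    """Return (chromosome, score) for the extreme value of each statistic."""
    lgs = max(rows, key=lambda r: r[1])
    ccs_hi = max(rows, key=lambda r: r[2])
    ccs_lo = min(rows, key=lambda r: r[3])
    return {
        "LGS_MAX": (lgs[0], lgs[1]),
        "CCS_MAX": (ccs_hi[0], ccs_hi[2]),
        "CCS_MIN": (ccs_lo[0], ccs_lo[3]),
    }


# Header and four rows copied from the sample table above.
sample = [
    "Chr_Num LGS_MAX CCS_MAX CCS_MIN LGS_LOCAL",
    "1 -0.49 1.79 -2.22 -2.20",
    "2 16.55 7.31 -1.05 14.73",
    "3 -2.94 1.68 -0.79 -4.27",
    "18 -1.40 2.23 -2.90 -2.61",
]

summary = summarize(parse_chrom_scores(sample))
for name, (chrom, score) in summary.items():
    print(name, "chromosome", chrom, "score", score)
```

In the sample run the extremes found this way (16.55 and 7.31 on chromosome 2, and -2.90 on chromosome 18) agree with the bestscores line after rounding.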