Section 3 | David Reich Lab

3. How to run the program

This section describes how to run the program from the command line and describes the input parameter file that it requires.

3.1 Command line arguments

To run the program type on the command line:

>> ancestrymap -pv paramfile

or

>>./ancestrymap -pv paramfile

p: a compulsory option; it must be followed by the name of the parameter file (paramfile in the commands above).

v: version number; this tells us which version of the program we are using. The user can modify this number in the file ancestrymap.c.

To redirect the output to a file one would type on the command line:

>>./ancestrymap -pv paramfile > out.dat &
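The same invocation can be scripted. The following is a minimal Python sketch, not part of the ANCESTRYMAP distribution: the helper names build_command and run_ancestrymap are hypothetical, and it assumes the option is typed with an ordinary ASCII hyphen and that the binary sits in the current directory.

```python
import subprocess


def build_command(paramfile, binary="./ancestrymap"):
    """Return the argument list for the command shown above."""
    # Assumption: the option is written with an ASCII hyphen ("-pv").
    return [binary, "-pv", paramfile]


def run_ancestrymap(paramfile, outfile="out.dat", binary="./ancestrymap"):
    """Start the program with standard output redirected to outfile.

    Like "> out.dat &" in the shell, this returns without waiting for
    the run to finish; call .wait() on the result to block until it ends.
    """
    with open(outfile, "w") as out:
        # The child process keeps its own handle to the file, so the
        # parent can safely close its copy when the "with" block ends.
        return subprocess.Popen(build_command(paramfile, binary), stdout=out)
```

For example, run_ancestrymap("paramfile").wait() would block until the run completes and return the exit status of the program.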

3.2 Description of the parameter file

The format of this file is as follows:

parname: parvalue

For example:

>>seed: 200

Note: All parameter names must be in lowercase, and there must be no white space between parname and the colon. The compulsory parameters are the names of the files that contain the marker, individual and genotype data, and the risk model. Parameters of type array take space-separated values. A sample parameter file is included as part of the download. A detailed description of the parameters follows:
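The formatting rules in the note above can be checked before a run. The following Python sketch is illustrative only and is not how the program itself reads the file: the function name parse_params is hypothetical, and it assumes one "parname: parvalue" pair per line with no comment lines.

```python
# Parameters marked MANDATORY in the table of parameters.
MANDATORY = {"indivname", "genotypename", "snpname", "risk"}


def parse_params(text):
    """Parse "parname: parvalue" lines into a dict of value lists."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'parname: parvalue'")
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"line {lineno}: white space before the colon")
        if name != name.lower():
            raise ValueError(f"line {lineno}: '{name}' is not lowercase")
        # Array parameters are space separated; scalars become 1-item lists.
        params[name] = value.split()
    missing = MANDATORY - params.keys()
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(sorted(missing)))
    return params
```

Values are kept as strings here; converting them to int or double according to the table is left to the caller.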

Parameter Name	Data type	Description	Possible and Default values
INPUT FILE NAMES
indivname (MANDATORY)	String	Individual data
badsnpname	String	List of markers to delete from analysis
genotypename (MANDATORY)	String	Genotype data for all the samples
snpname (MANDATORY)	String	Marker data
ANCESTRYMAP PARAMETERS
risk (MANDATORY)	Double array	Risks for the various models	Default: 2.0
numiters	Int	Number of follow-on iterations	Integer >= 0 Default: 5
numburn	Int	Number of burn-in iterations	Integer >= 0 Default: 1
reestiter	Int	Controls number of iterations inside ancestrymap for allele freq sampling	Integer >= 1 Default: 1
details	Boolean	If YES generate additional output	NO, YES Default: NO
tlreest	Int	Always set to YES, don't need it	0,1
noxdata	Boolean	If you have no X chromosome data or want to ignore it	NO, YES Default: NO
fakespacing	Double	The spacing between fake markers in Morgans	Positive (> 0) Default: 0.01 (in Morgans)
seed	Int	Random number seed for the run	Positive integer
checkit	Boolean	If YES runs lots of checks (mostly done initially)	NO, YES Default: NO
thxpars	Double array of size 3	Sets the initial parameters for the prior distribution for θ_X	Default: 40.0 1.0 10.0
thpars	Double array of size 2	Sets the initial parameters for the prior distribution for θ.	Default: 1.0 5.0
lampars	Double array of size 2	Sets the initial parameters for the prior distribution for λ.	Default: 1.0 0.1
lamxpars	Double array of size 2	Sets the initial parameters for the prior distribution for λ_X	Default: 1.0 0.1
dotoysim	Boolean	If YES run simulations	NO, YES Default: NO
markersim	Int	This is the marker number of the disease allele, -1 means none	-1 or positive integer Default: -1
simnumindivs	Int	Generate toy data with simnumindivs individuals; half will be cases and half controls, and half will be female and half male	Positive integer Default: -1
risksim	Double	In simulation mode risk used to generate data	Default: 1.0
tauscal	Double array	Initial values of t(African) & t(European)	Default: 100 100 (Note this is a lower value than we expect, however we prefer to bias the initial value to be low)
wrisk	Double array	Allows the model to have weights, which are normalized to sum to 1	Default: 1.0
lrisk	Double	In checkit mode: leave one marker out in turn and this is the risk that we use (in checkit mode: only one model risk is used).	Default: -1.0
controlrisk	Double array	Control risks for the various models	Default: 1.0
risk2	Double array	Risk for ethnic homozygotes for the various models. controlrisk and risk2 are optional; if specified, each should have the same number of values as risk	Default: -1.0
taulsdev	Double	Prior standard deviation for African & European t values	Default: 0.5
taulmean	Double	Prior mean for log10(t) for both African and European	Default: 2.0
allmale	Boolean	Used in simulation mode. If YES, all the simulated individuals are male. Takes effect only if the parameter simnumindivs is specified	0,1 Default: NO
allcases	Boolean	If YES all the samples are cases	NO, YES Default: NO
usecontrols	Boolean	If NO controls are ignored	NO, YES Default: YES
pubfmodern	Boolean	Publish ancestral allele frequency estimates; if YES, allows publication of modern allele frequencies	NO, YES Default: NO
OUTPUT FILE NAMES (Note that the directory in which the output files are to be generated must already exist; otherwise the program will fail)
trashdir	String	Used only in checkit mode: directory to store HMM output
thetafilename	String	Ancestry information for all individuals
output	String	Parameter values at every iteration
pubxname	String	Debug file for a particular marker
ethnicfilename	String	Average ethnicity (/g) for each marker, averaged over all individuals and iterations
snpoutfilename	String	Detailed marker information
indoutfilename	String	Detailed individual information
freqfilename	String	Allele frequency information for all markers
lambdafilename	String	λ information for all individuals
genotoyoutfilename	String	Genotype data generated in simulation mode
indtoyoutfilename	String	Individual data generated in simulation mode
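Two constraints stated in the table lend themselves to a quick check before a run: controlrisk and risk2, if given, should have as many values as risk, and the directories for the output files must already exist. The Python sketch below is a hedged illustration; the function names and the dict layout (parameter name mapped to a list of numbers or to a path string) are assumptions, not part of the program.

```python
import os

# Output parameters from the table; trashdir names a directory,
# the others name files.
OUTPUT_FILE_KEYS = [
    "thetafilename", "output", "pubxname", "ethnicfilename",
    "snpoutfilename", "indoutfilename", "freqfilename",
    "lambdafilename", "genotoyoutfilename", "indtoyoutfilename",
]


def check_risk_lengths(params):
    """Return the optional risk arrays whose length differs from risk."""
    n_models = len(params["risk"])
    return [key for key in ("controlrisk", "risk2")
            if key in params and len(params[key]) != n_models]


def missing_output_dirs(params):
    """Return the output directories that do not exist yet."""
    dirs = []
    if "trashdir" in params:
        dirs.append(params["trashdir"])
    for key in OUTPUT_FILE_KEYS:
        if key in params:
            # A bare file name is written to the current directory.
            dirs.append(os.path.dirname(params[key]) or ".")
    return [d for d in dirs if not os.path.isdir(d)]
```

Both functions return an empty list when nothing is wrong, so a caller can simply report any non-empty result and stop before launching the program.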

The software makes it possible to test several disease models simultaneously. If one is studying a disease for which there is an epidemiological reason to believe that there is higher genetic risk in population A, one might want to test several models for increased risk due to population A ancestry and, simultaneously, test one model in which population B ancestry confers greater risk. This is implemented by inputting the parameter risk as an array with values both greater and less than 1, for example:

>>risk: 0.8 1.2 1.3 1.4 1.5 1.6
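To see how such a line breaks down into models, the following Python sketch parses the example and groups the risks by whether they are above or below 1. The function names are hypothetical, and the sketch deliberately does not say which group corresponds to population A or B, since that depends on how the populations are defined in the data.

```python
def parse_risk_line(line):
    """Turn a "risk: v1 v2 ..." line into a list of floats."""
    name, _, value = line.partition(":")
    if name != "risk":
        raise ValueError("expected a line starting with 'risk:'")
    return [float(v) for v in value.split()]


def split_models(risks):
    """Group model risks into those above 1 and those below 1."""
    above = [r for r in risks if r > 1.0]
    below = [r for r in risks if r < 1.0]
    return above, below


risks = parse_risk_line("risk: 0.8 1.2 1.3 1.4 1.5 1.6")
above, below = split_models(risks)
print(len(risks), "models:", above, "above 1 and", below, "below 1")
```

With the example line, six models are tested in a single run: five with risk greater than 1 and one with risk less than 1.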