3. How to run the program

This section describes how to run the program from the command line and the format of the input parameter file it requires.

3.1 Command line arguments

To run the program, type one of the following on the command line:

>> ancestrymap -pv paramfile

>> ./ancestrymap -pv paramfile

p: compulsory option; it must be followed by the name of the parameter file (paramfile in the example above).

v: version number; reports which version of the program is being used. This number can be modified by the user in the file ancestrymap.c.

To redirect the output to a file and run the program in the background, type on the command line:

>> ./ancestrymap -pv paramfile > out.dat &
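
The same invocation can also be scripted. The Python sketch below is one possible way to do this and is not part of the program distribution: it builds the command line shown above and writes standard output to a file. The helper names build_command and run_to_file are hypothetical, and the executable is assumed to be in the current directory.

```python
import subprocess

def build_command(paramfile, exe="./ancestrymap"):
    """Return the argument list for the documented command line."""
    # "-pv" followed by the parameter file, as in the examples above.
    return [exe, "-pv", paramfile]

def run_to_file(command, outfile):
    """Run command and write its standard output to outfile (like '> out.dat')."""
    with open(outfile, "w") as out:
        # Waits for the program to finish and returns its exit status.
        return subprocess.run(command, stdout=out).returncode

# Example use (requires the executable to be present):
# status = run_to_file(build_command("paramfile"), "out.dat")
```

Unlike the shell command with a trailing &, this sketch waits for the run to finish before returning.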

3.2 Description of the parameter file

Each line of this file has the following format:

parname: parvalue

For example:

>>seed: 200

Note: All parameter names must be in lowercase, and there must be no white space between parname and the colon. The compulsory parameters are the names of the files that contain the marker, individual and genotype data, and the risk model. Parameters of array type take space-separated values. A sample parameter file is included as part of the download. A detailed description of the parameters follows:
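
The format rules above can be checked before a run. The following Python sketch is only an illustration under stated assumptions: it is not the parser used by the program, the function names parse_params and missing_mandatory are hypothetical, and skipping blank lines is an assumption. It enforces lowercase names, no white space before the colon, and the presence of the four compulsory parameters.

```python
# Compulsory parameters, as marked MANDATORY in the parameter description.
MANDATORY = ("indivname", "genotypename", "snpname", "risk")

def parse_params(text):
    """Parse 'parname: parvalue' lines into a dict of name -> list of values."""
    params = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        name, sep, value = line.partition(":")
        if not sep or not name:
            raise ValueError(f"line {lineno}: expected 'parname: parvalue'")
        if name != name.strip():
            raise ValueError(f"line {lineno}: white space before ':'")
        if name != name.lower():
            raise ValueError(f"line {lineno}: parameter name must be lowercase")
        params[name] = value.split()  # array values are space separated
    return params

def missing_mandatory(params):
    """Return the compulsory parameter names absent from params."""
    return [name for name in MANDATORY if name not in params]
```

For example, parsing a file that contains only the line "seed: 200" succeeds, and missing_mandatory then reports all four compulsory names.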

INPUT FILE NAMES

| Parameter name | Data type | Description | Possible and default values |
|---|---|---|---|
| indivname (MANDATORY) | String | Individual data | |
| badsnpname | String | List of markers to delete from analysis | |
| genotypename (MANDATORY) | String | Genotype data for all the samples | |
| snpname (MANDATORY) | String | Marker data | |

ANCESTRYMAP PARAMETERS

| Parameter name | Data type | Description | Possible and default values |
|---|---|---|---|
| risk (MANDATORY) | Double array | Risks for the various models | Default: 2.0 |
| numiters | Int | Number of follow-on iterations | Integer >= 0; Default: 5 |
| numburn | Int | Number of burn-in iterations | Integer >= 0; Default: 1 |
| reestiter | Int | Controls the number of iterations inside ancestrymap for allele frequency sampling | Integer >= 1; Default: 1 |
| details | Boolean | If YES, generate additional output | NO, YES; Default: NO |
| tlreest | Int | Always set to YES; not needed | 0, 1 |
| noxdata | Boolean | Set to YES if you have no X chromosome data or want to ignore it | NO, YES; Default: NO |
| fakespacing | Double | The spacing between fake markers, in Morgans | > 0; Default: 0.01 (in Morgans) |
| seed | Int | Random number seed for the run | Positive integer |
| checkit | Boolean | If YES, runs lots of checks (mostly done initially) | NO, YES; Default: NO |
| thxpars | Double array of size 3 | Sets the initial parameters for the prior distribution for θX | Default: 40.0 1.0 10.0 |
| thpars | Double array of size 2 | Sets the initial parameters for the prior distribution for θ | Default: 1.0 5.0 |
| lampars | Double array of size 2 | Sets the initial parameters for the prior distribution for λ | Default: 1.0 0.1 |
| lamxpars | Double array of size 2 | Sets the initial parameters for the prior distribution for λX | Default: 1.0 0.1 |
| dotoysim | Boolean | If YES, run simulations | NO, YES; Default: NO |
| markersim | Int | The marker number of the disease allele; -1 means none | -1 or positive integer; Default: -1 |
| simnumindivs | Int | Generate toy data with simnumindivs individuals; half are cases and half are controls, half are female and half are male | Positive integer; Default: -1 |
| risksim | Double | In simulation mode, the risk used to generate data | Default: 1.0 |
| tauscal | Double array | Initial values of τ(African) and τ(European) | Default: 100 100 (Note: this is a lower value than we expect; however, we prefer to bias the initial value to be low) |
| wrisk | Double array | Allows the models to have weights, which are normalized to sum to 1 | Default: 1.0 |
| lrisk | Double | In checkit mode: each marker is left out in turn, and this is the risk that is used (in checkit mode only one model risk is used) | Default: -1.0 |
| controlrisk | Double array | Control risks for the various models | Default: 1.0 |
| risk2 | Double array | Risk for ethnic homozygotes for the various models. controlrisk and risk2 are optional; if specified, each should have the same number of values as risk | Default: -1.0 |
| taulsdev | Double | Prior standard deviation for the African and European τ values | Default: 0.5 |
| taulmean | Double | Prior mean for log10(τ) for both African and European | Default: 2.0 |
| allmale | Boolean | Used in simulation mode. If YES, all the simulated individuals are male. The parameter simnumindivs must be specified for this parameter to take effect | 0, 1; Default: NO |
| allcases | Boolean | If YES, all the samples are cases | NO, YES; Default: NO |
| usecontrols | Boolean | If NO, controls are ignored | NO, YES; Default: YES |
| pubfmodern | Boolean | Publish ancestral allele frequency estimates; if YES, allows publication of modern allele frequencies | NO, YES; Default: NO |

OUTPUT FILE NAMES

(Note that the directory in which the output files are to be generated should exist, else the program will fail)

| Parameter name | Data type | Description | Possible and default values |
|---|---|---|---|
| trashdir | String | Used only in checkit mode: directory to store HMM output | |
| thetafilename | String | Ancestry information for all individuals | |
| output | String | Parameter values at every iteration | |
| pubxname | String | Debug file for a particular marker | |
| ethnicfilename | String | Average ethnicity (/g) for each marker, averaged over all individuals and iterations | |
| snpoutfilename | String | Detailed marker information | |
| indoutfilename | String | Detailed individual information | |
| freqfilename | String | Allele frequency information for all markers | |
| lambdafilename | String | λ information for all individuals | |
| genotoyoutfilename | String | Genotype data generated in simulation mode | |
| indtoyoutfilename | String | Individual data generated in simulation mode | |
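
Because the program fails when an output directory does not exist, it can be convenient to create the directories before a run. The Python sketch below is a hypothetical helper, not part of the distribution: the function name ensure_output_dirs is an assumption, and it expects a dict that maps output parameter names to path strings.

```python
import os

# Output file parameters listed in the OUTPUT FILE NAMES description.
OUTPUT_KEYS = ("thetafilename", "output", "pubxname", "ethnicfilename",
               "snpoutfilename", "indoutfilename", "freqfilename",
               "lambdafilename", "genotoyoutfilename", "indtoyoutfilename")

def ensure_output_dirs(params):
    """Create any missing directories for output files; return them sorted."""
    dirs = set()
    for key in OUTPUT_KEYS:
        path = params.get(key)
        if path:
            parent = os.path.dirname(path)
            if parent:  # an empty parent means the current directory
                dirs.add(parent)
    if params.get("trashdir"):
        dirs.add(params["trashdir"])  # trashdir is itself a directory
    for d in dirs:
        os.makedirs(d, exist_ok=True)  # no error if it already exists
    return sorted(dirs)
```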

 

The software makes it possible to test several disease models simultaneously. If one is studying a disease for which there is an epidemiological reason to believe that there is higher genetic risk in population A, one might want to test several models for increased risk due to population A ancestry and, simultaneously, test one model where population B ancestry confers greater risk. This is implemented by setting the parameter risk to an array with values both greater and less than 1, for example:

>>risk: 0.8 1.2 1.3 1.4 1.5 1.6
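
As an illustration of how such a risk line might be inspected before a run, the Python sketch below parses the array and separates models with risk above 1 from those below 1. It also checks the rule that controlrisk and risk2, if given, have as many values as risk. The function names are hypothetical, and the sketch does not decide which population corresponds to which group of values.

```python
def parse_risk(line):
    """Parse a 'risk: v1 v2 ...' line into a list of floats."""
    name, sep, value = line.partition(":")
    if not sep or name != "risk":
        raise ValueError("expected a line of the form 'risk: v1 v2 ...'")
    return [float(v) for v in value.split()]

def split_models(risks):
    """Return (risks above 1, risks below 1)."""
    above = [r for r in risks if r > 1.0]
    below = [r for r in risks if r < 1.0]
    return above, below

def same_length_as_risk(risks, other):
    """True if an optional array (controlrisk or risk2) is absent or matches risk."""
    return other is None or len(other) == len(risks)

risks = parse_risk("risk: 0.8 1.2 1.3 1.4 1.5 1.6")
above, below = split_models(risks)
print(len(above), "models with risk > 1;", len(below), "with risk < 1")
```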