Fine Mapping Run Setup

8.2 Fine-Mapping Run SetUp

Once you have successfully run the coarse scan and obtain a peak local log factor > 4 consistently, one can then proceed to “bombard” the region with more SNPs for the same samples in order to find a causal allele. Once you have the genotype data for that you can proceed to use the fine-mapping part of the software, which will give the extra allelic risk on top of ancestry risk. In order to do this properly, one must make sure NOT to include the fine-mapping SNPs in the main coarse scan run. An easy way of doing this is to include the fine-mapping SNPs in the badsnp file. The reason for this is that usually one would have chosen a number of SNPs very close to each other on the same chromosome. These markers are usually in linkage disequilibrium with each other, and as already discussed in Section  , SNPs in LD can lead to false positive scores. After doing the coarse scan, identify the peak region and then start a few runs around the peak region using the fine-mapping parameter file. 

It is important that a fine mapping SNP is not in LD with a framework SNP in the parental populations. It is a little complicated to achieve this, we do it by making a series of runs in which we fine map over say a megabase region with no framework SNP in the region or very close by. We provide an annotated Perl script mkfine, and an accompanying template parameter file which the user should be able to modify for his/her requirements. The script mkfine has a number of parameters which the user will need to set depending on their individual scan. It is important to read the script carefully before starting to use it.

#!/usr/local/bin/perl  -w

## script to setup fine mapping runs

$D="/home/at55/ancestrymap-rel/exampletry" ; ##Directory location: PLEASE MODIFY

$ABIN = "/home/at55/ancestrymap-rel/bin"; ##Directory location: PLEASE MODIFY

$SCL = "$D/outfiles/local.out" ;  ## This file has the coarse scan scores

$chrom = 2 ;  ##The chromosome on which we are doing the fine mapping

$mlo = 112000000 ;  ## The lower physical position boundary

$mhi = 126000000 ; ## The upper physical position boundary

$OLDBADL = "$D/badsnps" ; ## The file with the bad SNPs that we discarded in the coarse scan run

$PARFILE = "parfine.templ" ; ## The parameter template file used to set up the fine mapping runs

$VERYBAD = "NULL" ; ##The list of VERY bad SNPs that we don’t want in our analysis at all

$SNPL = "$D/snpcnts" ; ## The snpcnts file that we used in the coarse scan run

$TAGBASE = 8000 ; ## The base number used to create the output files

##Output Files

$FRL = "framelist1" ; 

$BADL = "badlist1" ; 

$TT1= "/tmp/tt1.$$" ; 

$TT2= "/tmp/tt2.$$" ; 

open (FF, $SCL) || die "can't open $SCL\n" ;

open (YY, ">$FRL") || die "can't open $FRL\n" ;

foreach $line(<FF>) {

 ($a, $b, $c, $d) = split " ", $line ;

 next unless (defined $d) ;

 next if ($a =~ /\#/) ;

 next unless ($b == $chrom) ;

 next if ($c =~ /fake/) ;

 next if ($d<$mlo) ;

 next if ($d>$mhi) ;

 printf YY "%25s %12d\n", $c, $d ;

}

close FF ;

close YY ;

open (FF, $SNPL) || die "can't open $SNPL\n" ;

open (YY, ">$TT1") || die "can't open $TT1\n" ;

foreach $line(<FF>) {

 ($a, $b, $c, $d) = split " ", $line ;

 next unless (defined $d) ;

 next if ($a =~ /\#/) ;

 next unless ($b == $chrom) ;

 next if ($d<$mlo) ;

 next if ($d>$mhi) ;

 print YY "$a $d\n" ;

}

close FF ;

close YY ;

system "cp $TT1 $TT2" ;

system "$ABIN/uniqit -i $TT1 -o $TT2 -u $VERYBAD -v " if ($VERYBAD ne "NULL") ;;

system "$ABIN/uniqit -i $OLDBADL -o $BADL -u $TT2 -v" ;

$SNPF  = "$SNPL" ;          

$DELF  = "$TT2" ;

$FRAMEF  = "$FRL" ;

$BADF  = "$BADL" ;

## old mcmc had small bug

## lmchrom must be set correctly in parfine.templ

## strategy:

## for each framework marker set c to be its position

## a, e position of framework markers on either side.

## Set b = (a+c)/2

## Set d = (c+e)/2

## we now have a  b  c  d  e 

## we set markers in b:d as "real fakes"  they get 2D scored

## we set markers in a:e\b:d as ignored                         

$T0 = "/tmp/t0.$$" ;

$T1 = "/tmp/t1.$$" ;

$T2 = "/tmp/t2.$$" ;

$T3 = "/tmp/t3.$$" ;

$T4 = "/tmp/t4.$$" ;

$T5 = "/tmp/t5.$$" ;

$T6 = "/tmp/t6.$$" ;

$T7 = "/tmp/t7.$$" ;

## $T0 = "t0" ;

## $T1 = "t1" ;

## $T2 = "t2" ;

## $T3 = "t3" ;

## $T4 = "t4" ;

## $T5 = "t5" ;

## $T6 = "t6" ;

## $T7 = "t7" ;

### step 1 get list of markers in peak. 

system "$ABIN/uniqit -i $DELF -o $T0 -u $FRAMEF -v" ;

open (FF, $SNPF) || die "can't open $SNPF\n" ;

$a=$b=$c=$d = 0 ;

$x = 0 ;

foreach $line (<FF>) {

($a, $b, $c, $d) = split " ", $line ;

next unless (defined $d) ;

next if ($a =~ /\#/)  ;

next unless  ($b == $chrom) ;

next unless  ($d >= $mlo) ;

next unless  ($d <= $mhi) ;

##$M{$a} = $x ;

$NAME[$x] = $a ;

$POS[$x] = $d ;

++$x ;

}

$peaknum = $x ;

printf "## number of markers in peak: %d\n", $peaknum ;

### step 2 get framelist

close FF ;

open (FF, $FRAMEF) || die "can't open $FRAMEF\n" ;

$x = 0 ;

foreach $line (<FF>) {

($a, $b) = split " ", $line ;

next unless (defined $a) ;

next if ($a =~ /\#/) ;

@Z = split " ", $line ;

$nz = @Z ;

if ($nz>3) {

 $b = $Z[3] ;

}

## alternate format (snpcnts)

++$x ;

$FNAME[$x] = $a ;

$FPOS[$x] = $b;

}

$FPOS[0] = $mlo ;

$FPOS[$x+1] = $mhi ;

foreach $xx (1..$x) {

 $TAG = $TAGBASE+$xx ;

 $a = $FPOS[$xx-1] ;

 $e = $FPOS[$xx+1] ;

 $c = $FPOS[$xx]  ;

 $cname = $FNAME[$xx] ;  ## middle marker

 $lmlo = $b = int ( ($a+$c)/2) ;

## printf "zz $a :: $c :: $b\n" ;

 $lmhi = $d = int ( ($c+$e)/2) ;

 mklist($a, $e+1, "$T1") ;

 mklist($b, $d, "$T2") ;

 system "cat $T0 $T1 > $T4" ;

 system "$ABIN/uniqit -i $T4 -o $T3 -u $T2 -v" ;  ## (DLIST + aelist) \ cdlist

 system "$ABIN/addcol \"fake\" < $T2 > $T5" ;    ## add realfakes 

 system "cat $T3 $T5 >$T6" ;   ## basic list of bad markers

 system "$ABIN/uniqit -i $T6 -o $T7 -u $BADF -v " ;  ## but purge BADF markers

 system "cat $BADF $T7 > $D/badlist:$TAG" ;

 dosub ($PARFILE, "par:$TAG") ;

 if ( ($c > $lmlo) && ($c < $lmhi)) {

   ##This command runs the ANCESTRYMAP executable on our LSF queue..PLEASE MODIFY

   system "bsub -o xx:$TAG -q reich_12h  $ABIN/ancestrymap -p par:$TAG " ;

 }

 else {

  print STDERR "par:$TAG not submitted $c $lmlo $lmhi\n" ;

 }

}

unlink ($T0, $T1, $T2, $T3, $T4, $T5, $T6, $T7) ;

sub mklist {

## write to fname all markers between lo, hi

 my ($lo, $hi, $fname) = @_ ;

 local ($x, $num) ;

 open (YY, ">$fname") || die "can't open $fname\n" ;

 $num = 0 ;

 foreach $x (0..$peaknum-1) { 

  $pos = $POS[$x] ;

  next if ($pos<$lo) ;

  next if ($pos>=$hi) ;

## lack of symmetry deliberate.  Forces non-overlapping intervals

  print YY "$NAME[$x]\n" ;

  ++$num ;

 }

 close YY ;

## printf "##mklist  $lo $hi $num\n" ;

}

sub dosub {

 my ($IN, $OUT) = @_ ;

 $str = "" ; 

 $str = addx($str,"TAGVAL", $TAG) ;

 $str = addx($str,"LMLO",  $lmlo) ;

 $str = addx($str,"LMHI",  $lmhi) ;

 if ( ($c > $lmlo) && ($c < $lmhi)) {

  $str = addx($str,"MNAME", $cname) ;

 }

 $cmd = "m4 $str < $IN >$OUT" ;

 system "$cmd" ;

}

sub addx {

 my ($ss, $x, $y)  = @_ ; 

  $ss .= " -D \"$x=$y\" " ;

   return $ss ;

}

The accompanying template parameter file:

##Parameters from the coarse scan run

risk:     1.5

DIR:    /home/at55/ancestrymap-rel/exampletry

TAG:     TAGVAL

indivname:    DIR/indiv1.dat

snpname:      DIR/snpcnts

genotypename: DIR/geno.dat

badsnpname:   DIR/badlist:TAG

fakespacing:  .01 

tlreest:  YES

OUTD:    DIR/outfiles 

seed:  TAG  

splittau:  YES

fancyxtheta:  YES

output:         OUTD/outlm:TAG

trashdir:       /home/at55/trashdir

checkit:  NO

details:  YES

numburn:  50

numiters: 100

emiter:   30

dotoysim:  NO

cleaninit: YES

reestiter:  5

indoutfilename:    NULL

snpoutfilename:    OUTD/snps:TAG

localoutfilename:  OUTD/details:TAG

##The fine-mapping parameters

lmmodel:    YES

##This is the chromosome number for which we are doing fine-mapping, CHANGE

lmchrom:      2

##This is the mesh size

lmnumx:      100

##This is the maximum

lmmax:      6.0

lmthresh:   0.0 

lmdetails:  YES

lmlobase:   LMLO      

lmhibase:   LMHI     

oldlmmode:  NO  

markername: MNAME 

pubxname:   OUTD/gams:TAG

##This should be changed as per how high is the peak local score in the coarse scan

hiclip: 20

Once you have set the various parameters in the 2 files, you can just run the script mkfine by typing on the command line:

>> perl mkfine

This should start the various ANCESTRYMAP runs in the fine-mapping mode, once all the runs are over you can take a look at the output files to see if there is anything interesting.