This page provides files with derived allele frequency (DAF) data in the 4 HapMap populations. SNPs were ascertained as polymorphic in one of three ways: between an individual's two chromosomes, between single copies of two different individuals' chromosomes, or between two random, unrelated chromosomes from within a population group. This ascertainment was applied to 7 different individuals. For more, see the following two papers: (1) Keinan et al. "Accelerated genetic drift on chromosome X during the human dispersal out of Africa", Nature Genetics (2008). (2) Keinan et al. "Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans", Nature Genetics 39, 1521-5 (2007).

Individuals
-----------
Individual identifiers starting with Cor come from the Coriell repository, and the number is the sample identification number, e.g. Cor11321 is: http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM11321 and Cor7340 is: http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM07340 . Two further libraries originate from Celera Genomics (Science Vol 291, page 1304) and correspond to individuals A (HuAA) and F (HuFF) of Table 1 on page 1307.

All of the Cor libraries were derived from flow-sorted chromosomes, so each library covers only the autosomes and chrX listed below:

Cor10470: chr20 chr22

Cor7340:  chr1 chr6 chr9 chr10 chr11 chr12 chr13 chr20 chr22 chrX

Cor11321: chr1 chr6 chr9 chr10 chr11 chr12 chr13 chr20 chr22 chrX

Cor17119: chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr20 chr21 chr22 chrX

Cor17109: chr1 chr16 chr17 chr18 chr19

The Hu libraries were generated with a whole-genome shotgun method, so coverage should not be biased toward particular chromosomes. More details about these libraries can be found in Supplementary Table 5 of Keinan et al. (2007).
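
When working with a single chromosome, it can help to know which Cor libraries contribute SNPs to it. The following is a minimal sketch that copies the coverage list above into a lookup table. The names `COR_COVERAGE` and `libraries_covering` are illustrative and not part of the data release. The Hu libraries are left out because they are whole-genome shotgun libraries.

```python
# Chromosomes covered by each flow-sorted Cor library (copied from the list above).
_COVERAGE_TEXT = {
    "Cor10470": "chr20 chr22",
    "Cor7340": "chr1 chr6 chr9 chr10 chr11 chr12 chr13 chr20 chr22 chrX",
    "Cor11321": "chr1 chr6 chr9 chr10 chr11 chr12 chr13 chr20 chr22 chrX",
    "Cor17119": ("chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 "
                 "chr13 chr14 chr15 chr20 chr21 chr22 chrX"),
    "Cor17109": "chr1 chr16 chr17 chr18 chr19",
}

# Split each string into a list so that membership tests match whole names
# (so "chr1" does not accidentally match "chr10").
COR_COVERAGE = {lib: text.split() for lib, text in _COVERAGE_TEXT.items()}


def libraries_covering(chrom):
    """Return the sorted names of the Cor libraries that include `chrom`."""
    return sorted(lib for lib, chroms in COR_COVERAGE.items() if chrom in chroms)
```

For example, `libraries_covering("chr16")` returns `["Cor17109"]`, since that is the only Cor library listing chr16.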

Files
-----

For each individual, DAF files are provided for autosomal and chrX data, following all data corrections. 

Where available, files based on HapMap's chr2p pilot are also provided, in which only HapMap Phase 2 genotype information was considered (all these SNPs have code 002; see below).  
Population groups are denoted YRI for Yorubas, AsAm for Asian Americans, AfAm for African Americans, and EuAm for European Americans.  Between-individual sets are named with a hyphen between the two individuals, e.g. HuFF-HuAA contains the SNPs determined from the comparison of these two individuals.
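
The naming convention above can be turned into a small helper that reports which kind of ascertainment a set label refers to. This is only a sketch: the function name `describe_set` and the category strings are made up for illustration, and it assumes that any label that is neither a population group nor hyphenated refers to a single individual.

```python
# Population-group labels as listed in this README.
POPULATION_GROUPS = {"YRI", "AsAm", "AfAm", "EuAm"}


def describe_set(label):
    """Classify an ascertainment set label and return (kind, members)."""
    if label in POPULATION_GROUPS:
        return ("population", (label,))
    if "-" in label:
        # Hyphenated labels compare two individuals, e.g. "HuFF-HuAA".
        return ("between-individuals", tuple(label.split("-")))
    # Assumption: anything else names one individual, e.g. "Cor11321".
    return ("within-individual", (label,))
```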

Note: the file for YRI (DAF.chrX.b121.YRI) is based on genotyping performed in our laboratory (see Keinan et al. Nature Genetics 2008 for more). Hence, some of the fields are irrelevant and are indicated as NA.

File format
-----------
Each row in the files represents one SNP and consists of the following tab-delimited columns:

* chr                  Chromosome number
* pos                  Physical position on chromosome (NCBI Human reference build 35)
* individual
* rsID  
* AncBase              Ancestral allele (alleles are relative to the top strand of NCBI Human reference build 35)
* DerBase              Derived allele
* CEU.An               Ancestral allele count for the HapMap CEU sample
* CEU.Dn               Derived allele count for the HapMap CEU sample
* CEU.DAF              CEU.Dn/(CEU.An+CEU.Dn)
* same three columns for CHB, JPT and YRI **
* Code                 Accounts for the status of the SNP in the dataset. It has three digits: 
                               (1) 1 if a proxy SNP was considered for this SNP since the SNP was not genotyped based on information from Hinds et al. (2005); 0 otherwise. 
                               (2) If proxy, 1 denotes proxy due to the r^2=1 criterion, while 5 denotes a proxy due to being of minor allele frequency <5% in all Hinds et al. (2005) samples; 0 if not a proxy.
                               (3) 0 for genotype information from Hinds et al. (2005); 1 for HapMap Phase 1 genotype information; 2 for HapMap Phase 2.
* Substitute           For Code=111 or Code=112, this column provides the rsID of the HapMap SNP determined as proxy.
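
The column layout and the three-digit Code can be read with a few lines of Python. The sketch below rests on several assumptions that should be checked against the actual files: that there is no header line, that the per-population columns appear in the order CEU, CHB, JPT, YRI, and that missing values are written as the literal string NA. The names `parse_row`, `decode_code` and `daf` are illustrative only.

```python
POPS = ("CEU", "CHB", "JPT", "YRI")  # assumed column order

COLUMNS = ["chr", "pos", "individual", "rsID", "AncBase", "DerBase"]
for pop in POPS:
    COLUMNS += [f"{pop}.An", f"{pop}.Dn", f"{pop}.DAF"]
COLUMNS += ["Code", "Substitute"]

# Meaning of the second and third Code digits, as described above.
PROXY_REASON = {"0": None,
                "1": "r^2 = 1",
                "5": "MAF < 5% in all Hinds et al. (2005) samples"}
GENOTYPE_SOURCE = {"0": "Hinds et al. (2005)",
                   "1": "HapMap Phase 1",
                   "2": "HapMap Phase 2"}


def decode_code(code):
    """Split the three-digit Code into its parts; return None for NA."""
    code = str(code).strip()
    if code == "NA":
        return None
    digits = code.zfill(3)  # restore leading zeros if Code was read as a number
    return {"is_proxy": digits[0] == "1",
            "proxy_reason": PROXY_REASON[digits[1]],
            "genotype_source": GENOTYPE_SOURCE[digits[2]]}


def daf(an, dn):
    """Derived allele frequency Dn / (An + Dn); None if counts are NA or zero."""
    if "NA" in (an, dn):
        return None
    an, dn = int(an), int(dn)
    total = an + dn
    return dn / total if total else None


def parse_row(line):
    """Parse one tab-delimited line into a dict keyed by column name."""
    row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    row["pos"] = int(row["pos"])
    for pop in POPS:
        # Recompute DAF from the counts rather than trusting the stored text.
        row[f"{pop}.DAF"] = daf(row[f"{pop}.An"], row[f"{pop}.Dn"])
    row["Code"] = decode_code(row["Code"])
    return row
```

With this decoding, Code 002 means a SNP that is not a proxy and has HapMap Phase 2 genotype information, and Code 111 or 112 means a proxy chosen by the r^2=1 criterion, in which case the Substitute column holds the proxy's rsID.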

Filtering level
---------------
Each file is available at 3 filtering levels, under three separate directories:
level_1 : basic filtering (CpGs and filters to unbias data set; see Keinan et al. (2007) for more)
level_2 : basic filtering as in level_1, as well as filtering exons and conserved non-coding sequence (see Keinan et al. 2008 for more)
level_3 : same as level_2, but also filtering putative selective sweeps according to Sabeti et al. Nature (2007).

(Note: our own genotyping (DAF.chrX.b121.YRI) is available only under level_3 since all these filters were applied before choosing SNPs for genotyping)
