DNA sequence alignments used in Patterson et al. Nature 2006

“Genetic evidence for complex speciation of humans and chimpanzees”
Patterson N, Richter DJ, Gnerre S, Lander ES and Reich D; Nature 2006

Sequence obtained for this study: We sequenced 117,862 reads of DNA: 115,152 from a western lowland gorilla (Gorilla gorilla, individual NG05251 in the Coriell catalog: locus.umdnj.edu/primates/species_summ.html) and 2,710 from a black-handed spider monkey (Ateles geoffroyi, individual NG05352). All sequencing reads are publicly available at the NCBI trace archive (http://www.ncbi.nlm.nih.gov/Traces); to access them, carry out the following queries:

(1) Gorilla data (Gorilla gorilla):

CENTER_NAME='WIBR' and CENTER_PROJECT='G611'
CENTER_NAME='WIBR' and CENTER_PROJECT='G612'
CENTER_NAME='WIBR' and CENTER_PROJECT='G618'
CENTER_NAME='WIBR' and CENTER_PROJECT='G619'
CENTER_NAME='WIBR' and CENTER_PROJECT='G744'

(2) New World monkey data (Ateles geoffroyi):

CENTER_NAME='WIBR' and CENTER_PROJECT='G820'

We note that the NCBI trace archive contains slightly more reads than we report in our analyses, because not every read submitted to the Trace Archive passed standard pre-filtering steps.
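
For scripted access, the six query strings listed above can be generated rather than typed by hand. The following is a minimal sketch: the function name trace_queries is hypothetical, and it only prints the query text. How the strings are submitted to the Trace Archive is not covered here.

```shell
# Print one Trace Archive query per sequencing project.
# G611-G744 are the gorilla projects; G820 is the spider monkey project.
# (trace_queries is an illustrative name, not part of any NCBI tool.)
trace_queries() {
  for project in G611 G612 G618 G619 G744 G820; do
    printf "CENTER_NAME='WIBR' and CENTER_PROJECT='%s'\n" "$project"
  done
}

trace_queries
```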

Alignments: The alignments of humans, chimpanzees, gorillas, and more distantly related primates can be downloaded below or online at Nature. The first two data sets are packaged into “tar” files. When opened with the Unix command “tar -xvf name”, these expand into many files: one for each alignment. The third and fourth data sets, corresponding to alignments of contiguous sequence, are in Threaded Blockset Aligner (TBA) format, and are packaged into “gz” files. These can be opened with the Unix command “gunzip name”.

HCGOM shotgun data
hcgom_aligns.tar
33,016 alignments

HCGM shotgun data
hcgm_aligns.tar
51,966 alignments

HCGOM contiguous chr. 7
hcgom7_contig_aligns.tba.gz
1 contiguous alignment

HCGOM contiguous chr. X
hcgomX_contig_aligns.tba.gz
1 contiguous alignment
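
The unpacking steps for the four alignment files can be sketched as a small shell function. This assumes the files have been downloaded into a single directory. The function name unpack_alignments and the choice to extract each tar file into its own subdirectory are illustrative, not part of the original instructions. The subdirectory simply keeps the tens of thousands of per-alignment files from mixing with other downloads.

```shell
# Unpack whichever alignment files are present in the given directory
# (default: the current directory). Hypothetical helper for illustration.
unpack_alignments() {
  dir="${1:-.}"

  # Shotgun alignments: each tar file expands into one file per alignment,
  # so extract each archive into a subdirectory named after it.
  for archive in hcgom_aligns.tar hcgm_aligns.tar; do
    if [ -f "$dir/$archive" ]; then
      out="$dir/${archive%.tar}"
      mkdir -p "$out"
      tar -xvf "$dir/$archive" -C "$out"
    fi
  done

  # Contiguous alignments: gunzip replaces name.tba.gz with name.tba.
  for gz in hcgom7_contig_aligns.tba.gz hcgomX_contig_aligns.tba.gz; do
    if [ -f "$dir/$gz" ]; then
      gunzip "$dir/$gz"
    fi
  done
}

# Usage: unpack_alignments /path/to/downloads
```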

Data sets: The filtered data can be accessed below or online at Nature. Data are packaged into “gz” files, which can be opened with the Unix command “gunzip name”.

HCGOM shotgun data
hcgom_shotgun.gz
498,771 divergent sites

HCGM shotgun data
hcgm_shotgun.gz
858,941 divergent sites

HCGOM contiguous chr. 7
hcgom7_contig.gz
69,521 divergent sites

HCGOM contiguous chr. X
hcgomX_contig.gz
8,769 divergent sites
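
Because “gunzip name” replaces the compressed file with the uncompressed one, it can be convenient to inspect a data set first without changing anything on disk. The sketch below streams the first few lines with “gzip -dc”. The function name peek_dataset is hypothetical, and no assumption is made about the internal layout of the files, which is not described here.

```shell
# Show the first N lines (default 5) of a gzip-compressed data set
# without removing or modifying the .gz file. Illustrative helper only.
peek_dataset() {
  file="$1"
  lines="${2:-5}"
  gzip -dc "$file" | head -n "$lines"
}

# Usage: peek_dataset hcgom_shotgun.gz 10
```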

Please contact David Reich (reich at genetics.med.harvard.edu) for any further clarification about these data.