Assembled from components at Clker clipart.
The current chimpanzee genome assembly has problems that reduce its veracity as an authentic representation. First, it was assembled using the human genome as a reference scaffold and does not stand on its own merits. Second, given that significant levels of human DNA exist in non-primate databases due to laboratory and worker contamination, the presence of human DNA in the pre-assembled chimpanzee sequencing reads is highly probable. Therefore, 101 publicly available Sanger-style trace read data sets were downloaded, end-trimmed for low-quality bases, and purged of vector sequence. Then, 25,000 sequences were selected at random from each of the 101 data sets and queried against the human genome using BLASTN v2.2.31 with gap extension. Results from the BLASTN analysis indicated that two different groups of chimpanzee DNA sequences could be found. Those completed early in the chimpanzee genome project, which contributed to the initial 5-fold draft genome, were considerably more similar to human than those produced later in the project (a difference of about 7% in overall data set identity), and the later sequences produced 6% fewer hits on the human genome. Sequences (both alignable and non-alignable) from the seemingly less contaminated data sets indicate that the chimpanzee genome is approximately 85% identical overall to human. Chimpanzee DNA sequences that had no hits on the human genome were then queried with BLAST against the chimpanzee genome; their extensive poor alignment revealed regions of misassembly in the chimpanzee genome.

To read the rest (perhaps saving the link for reference), click on "Analysis of 101 Chimpanzee Trace Read Data Sets: Assessment of Their Overall Similarity to Human and Possible Contamination With Human DNA".
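For readers curious about what the sampling step in the abstract might look like in practice, here is a minimal Python sketch. It is not the author's code: the function names (`read_fasta`, `sample_records`, `fraction_with_hits`) are hypothetical, the use of reservoir sampling is an assumption (the abstract only says sequences were "selected at random"), and the hit-fraction helper assumes BLAST results were saved in the standard tab-separated format (`-outfmt 6`), which the abstract does not state.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_records(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Pick k records uniformly at random in a single pass (reservoir sampling).

    Assumption: the source does not say how its random selection was done.
    """
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def fraction_with_hits(sampled_ids: Iterable[str], blast_tab_lines: Iterable[str]) -> float:
    """Fraction of sampled reads that have at least one hit.

    Assumes tab-separated BLAST output whose first column is the query ID.
    """
    sampled = set(sampled_ids)
    hit_ids = {line.split("\t")[0] for line in blast_tab_lines if line.strip()}
    return len(hit_ids & sampled) / len(sampled) if sampled else 0.0
```

With a real data set, one would call `sample_records(read_fasta(open(path)), 25000)` for each of the 101 files, write the sampled reads back out as FASTA, run BLASTN on them, and pass the resulting table to `fraction_with_hits`. Fixing the seed keeps the selection reproducible.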