Posts Tagged ‘zoology’

This recent paper (An update on DNA barcoding: low species coverage and numerous unidentified sequences; published in Cladistics), an update on the global DNA barcoding effort, should be a real eye-opener for everyone who loves NCBI GenBank and the process and openness of science, and especially for taxonomists.

DNA sequence based identification of organisms started during the 1980s and is still an ongoing process. It is based on the idea that:

  1. If a previously identified specimen or organism has a portion of its DNA sequenced and the sequence is made publicly accessible,
  2. other researchers can sequence their own samples and check them against the database to identify them, even if they lack taxonomic expertise.

However, this requires that the first researcher knows how to identify the specimen unambiguously.
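The two steps above can be sketched in a few lines of Python. This is only a toy illustration, not how BLAST or any barcoding pipeline actually works: the species names, the sequences, the `identify` helper and the 97% cut-off are all invented for the example, and the sequences are assumed to be already aligned and of equal length.

```python
# Toy reference library: names and sequences are invented for illustration.
REFERENCE_LIBRARY = {
    "Species A": "ATGGCACTAAGCCTTCTAATTCGA",
    "Species B": "ATGGCGTTAAGTCTACTGATCCGG",
    "Species C": "ATGACCCTGAGCTTATTAATTCGT",
}


def percent_identity(seq_a, seq_b):
    """Share of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identify(query, library, min_identity=97.0):
    """Return (name, score) of the best hit, or (None, score) if too distant.

    The 97% default is an arbitrary cut-off chosen for this sketch.
    """
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = percent_identity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_identity:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    # A sample that differs from "Species A" at one position out of 24.
    sample = "ATGGCACTAAGCCTTCTAATTCGT"
    print(identify(sample, REFERENCE_LIBRARY, min_identity=95.0))
    print(identify(sample, REFERENCE_LIBRARY))  # default cut-off is stricter
```

Note that the function can only hand back whatever name is stored in the library. If the first researcher labelled the reference wrongly, or only as "sp.", the second researcher inherits that label.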

The idea is old, but the name is new!

In the early half of the last decade, an international effort to “barcode” all organisms on earth was started, based on the idea described above, which in turn rests on years of fine-tuning by biologists and computer scientists (who developed BLAST and similar applications).

These researchers propose that sequencing a 650 base pair region of the mitochondrial DNA is enough to identify all animals, owing to the peculiarities of the sequence. They claim to be the first to develop the idea, ignoring the efforts of earlier researchers, and their followers say that they have a “father of DNA barcoding”. I agree that they were the first to propose the NAME, but I wonder how it could be their NOVEL idea when the original BLAST algorithm (proposed in 1990) and the idea of sequence similarity already existed before this “barcoding” business.

Let’s come to the point

So the paper published in Cladistics looks at the claims of these “barcoders” and finds some problems. The authors ask two questions:

  1. Has the project lived up to its initial promises? (the species coverage problem)
  2. Is it progressing scientifically? (is it 100% right “taxonomy”-wise?)

Well, the answers are in the negative.

They find barcodes for ~60,000 “metazoa” species in the NCBI database, far below the estimated 10-20 million species on earth (some estimates are lower, but see the link). This is despite substantial government funding for the barcoding initiative. The paper says that the barcoding consortium received $80 million from the Canadian government, and we know of many other funding sources, where every small barcoder gets tens of millions.

The authors searched the GenBank records of COI sequences for the keyword “barcoding” and removed every COI record carrying it. Only 16,000 species dropped out of the list of 60,000 (these are species counts, not total COI records). This means that the rest were sequenced by general systematics projects, most probably not funded by any barcoding initiative.

According to the initial proposal, fishes and birds were to be completely barcoded by 2012. However, the fish-bol website says that barcoding has been completed for ~8,500 of the ~31,000 species in total. Only ~4,200 fish species are present in NCBI, so the barcoders have closed access to roughly 4,300 species.

The second distressing finding is that there are many “unidentified species” in the NCBI records. Out of 571,997 COI records in NCBI, only 26% had proper names, that is, were identified to the species level. The remaining 74% were not, so about three quarters of the barcodes produced are useless and squander public money, right*?
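To put the figures quoted so far side by side, here is a small back-of-the-envelope calculation. It only re-uses numbers already given in this post; the function names are mine, and the rounding is approximate because the percentages are themselves rounded.

```python
def split_records(total_records, identified_percent):
    """Split a record count into (identified, unidentified) to species level."""
    identified = round(total_records * identified_percent / 100)
    return identified, total_records - identified


def coverage_percent(barcoded_species, estimated_species):
    """Share of the estimated species that already have a barcode."""
    return 100 * barcoded_species / estimated_species


if __name__ == "__main__":
    # COI records in NCBI and the share identified to species level.
    identified, unidentified = split_records(571_997, 26)
    print("identified to species:", identified)
    print("not identified to species:", unidentified)

    # ~60,000 barcoded metazoan species against 10-20 million species on earth.
    print("coverage, 10 million estimate (%):", coverage_percent(60_000, 10_000_000))
    print("coverage, 20 million estimate (%):", coverage_percent(60_000, 20_000_000))

    # Species left after removing records tagged with the "barcoding" keyword.
    print("species without the barcoding keyword:", 60_000 - 16_000)
```

So roughly 423,000 of the COI records carry no species name, and the ~60,000 barcoded species amount to well under 1% of even the lower estimate of species on earth.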

The paper highlights a case where Diptera sp. has 1,000 sequences in NCBI with a genetic distance of 1% or less, all produced by barcoding projects. What a waste of public money.

Readers of zoospooks are also requested to read the blog by Roderic M. Page to understand the problem of sequences without proper scientific names in public databases, what these nameless sequences mean, and how they are found. He is one of the biggest scientists in my field and I am just a budding blogger/scientist, so you would benefit more from reading his blog.

In short, DNA barcoding has performed below par, and the quest to barcode all species has failed, at least until now. The main problem could be that the barcoders did not have trained taxonomists in their ranks. They are against taxonomy based on morphological identification, so taxonomists distance themselves from barcoding, and barcoders know too little taxonomy to identify a specimen correctly to the species level. If barcoders say that what they deposited as “sp.” in databases is cryptic diversity, then why are there 1,000 specimens with <1% genetic distance? I would also ask those people to read up on species delimitation methods.

To save itself, barcoding needs to:

  1. Include proper taxonomists (with proven credentials) in each and every project it initiates, however small.
  2. Deposit photographs of ALL the “barcoded” specimens on its website, on individual researchers’ websites and in public access.
  3. Put all its data in NCBI, or make BOLD open access.
  4. Avoid depositing unwanted sequences (those of unidentified species).
  5. Discourage the sequencing of unidentified specimens.

These are mere suggestions from me, but for barcoding to be useful to the public, barcoders need to clean up a lot: (1) use proper expertise and (2) open up their data. Let them try for another 5 years, and let’s see what changes from this initial 5 year phase of the project. Regarding the title of this post: barcoding unidentified specimens and introducing errors into a precious database like NCBI should be discouraged, and barcoders should understand that although it is a “people’s” choice technology, it carries certain responsibilities towards society and fellow scientists. I agree that it is very useful for cataloguing biodiversity, but I suggest that it be done in a better and more open manner, so that more people benefit and less human effort is lost. Also read my post on the new Pristolepis to see what happens when bad taxonomy and sequencing technology join forces.

(*This is my opinion and has nothing to do with the paper cited)


Today Zootaxa, the mega journal of zoological systematics, published the description of a new Pristolepis fish from the Western Ghats biodiversity hotspot. The new fish is named Pristolepis rubripinnis and, as the name suggests, has a “red-orange” shade on its fins. The authors have put a lot of detail into describing this species, and the paper is a must-read for naturalists, students and researchers with an interest in the ichthyo-diversity of the Western Ghats.

Pristolepis rubripinnis

In this age when science means just hunting for Impact Factors, scientists often resort to telling the “story behind their publications” elsewhere, as seen in the TreeOfLife blog. I think that scientists really enjoy the process of science (i.e., following the process after a hypothesis is formulated) and that it is a real motivation for many of them. However, this pleasure, and the process and its details, are not always evident when reading the majority of scientific publications.

By contrast, when you read a real TAXONOMIC work you actually read the hypothesis and the process, and it is a beauty. Here it started with finding a “marked colour variant of Pristolepis” and recalling the confusion in the taxonomic literature about the Pristolepis species of the Western Ghats. This helped the authors formulate a hypothesis seeking to answer the question “is it a species new to science?”, and the answer was YES!!!!!

Earlier, in 1849, Jerdon described the first ever Pristolepis species from North Kerala, “above the Palakkad Gap” as this paper says. Then Günther described Catopra malabarica in 1865. However, Jerdon himself found it to be a junior synonym of Pristolepis marginata the next year (see the present paper by Britz et al. for an interesting read on all this). So there was only one recognized Pristolepis from South India.

However, some authors listed both P. marginata and P. malabarica as present in India, others said P. marginata was the only species in India, and still others said that P. marginata and P. fasciata were both present. It is noteworthy that P. fasciata was described from South East Asia, its type locality is in Borneo, and it has stripes on its body. No Pristolepis in India has stripes (at least until now). Another funny fact is that Indian authors have “sequenced” P. fasciata from India, where this species is absent; just see NCBI GenBank.

So this study puts to rest a lot of confusion about Pristolepis in India. It highlights the importance of proper taxonomy before phylogeny or sequencing studies. It also takes readers back to real science, where an observation is made, a hypothesis is formulated, and it is proved right or wrong. The writing style illustrates the thought process behind the find, which should be educational for young researchers. Finally, we have a new species of fish that was unknown until yesterday.


