A brave but misguided attempt to solve a real problem

This is a long blog post. Sorry.

This is a tale of four taxonomic papers. Wait, hear me out! It’s not as boring as that sounds. This is not (just) a sordid squabble among taxonomists; this concerns you! For anyone interested in biodiversity, there are quite a few interesting ideas along the way.

First the very short version. Some taxonomists recently suggested that new species should be described quickly, based only on DNA barcodes and a fuzzy photo. Others feel that this doesn’t work: there’s too little information on the new species, and it slows things down for everyone else. Frank and forthright discussion continues. Well, forthright anyway, very much so.

What’s the difference?

When a new species is discovered, someone sits down and describes it. Normally this means they take a photo, say what it looks like, explain how to recognise it (i.e. how to tell it apart from existing species), and give it a name. Here’s an example.

Some people felt this takes too long. They suggested taking one single character, the DNA barcode, and giving only that and a photo. Here’s what it looks like.

Not quite as informative, is it? The diagnosis just says what DNA sequence to look for: if base pairs 72-75, 163 etc. are G, it’s this species.
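To make that concrete, here is a minimal sketch of what a barcode-only diagnosis boils down to. It assumes the barcodes are already aligned, so that "position 72" means the same thing in every sequence; the diagnostic positions, bases and species names are hypothetical, invented purely for illustration.

```python
# Minimal sketch of a barcode-only diagnosis. Assumes barcodes are already
# aligned, so position 72 in one sequence corresponds to position 72 in all.
# The diagnostic sites and species names below are hypothetical.
DIAGNOSES = {
    "Species A (hypothetical)": {72: "G", 73: "G", 74: "G", 75: "G", 163: "G"},
    "Species B (hypothetical)": {72: "A", 163: "T"},
}


def matches(barcode: str, sites: dict[int, str]) -> bool:
    """True if the barcode has the required base at every diagnostic site (1-based)."""
    return all(
        pos <= len(barcode) and barcode[pos - 1].upper() == base
        for pos, base in sites.items()
    )


def diagnose(barcode: str) -> list[str]:
    """Names of all species whose diagnosis this barcode satisfies (possibly none)."""
    return [name for name, sites in DIAGNOSES.items() if matches(barcode, sites)]


# Usage: a made-up 200-base sequence with G at positions 72-75 and 163.
seq = ["A"] * 200
for p in (72, 73, 74, 75, 163):
    seq[p - 1] = "G"
print(diagnose("".join(seq)))  # ['Species A (hypothetical)']
```

Notice what the check needs as input: a sequenced barcode. A specimen that has not been sequenced, or an older species described without one, cannot be run through it at all.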

The four papers

Before we get into what this means, and why you should be interested, let’s see the four papers I mentioned. These will introduce this new way of describing species, heavily criticise it, then ignore the criticism and do it again.

Meierotto (2018)

A few years ago, Sarah Meierotto published twenty new wasp species from Costa Rica in her master’s thesis. She described them in two different ways: both the normal way (with descriptions, diagnoses etc), and a fast way using only DNA barcodes.

This is it. This is the proposed new way of describing species. But because we’ve got into the bad habit of thinking theses are not “real”, it also had to be published as a paper.

Meierotto et al. (2019)

The “et al.” are Michael Sharkey (Sarah’s supervisor) and several scientists experienced in DNA barcoding and/or wasps. This is common practice: let the student do the thesis, then have experienced hands improve it for the paper. Although it has to be said, little improvement seems to have been needed! The thesis was already so polished that they seem to have basically published chapter 3 as this paper, and chapter 4 as another paper, with few major changes.

There are 18 new species. They can be told apart from each other based on the DNA barcode. That, and a single, mediocre photo per species, are all the information we’re given. Apparently they can also be told apart from the fifty or so already described species, but we are not told how.

The rest of the paper is mostly devoted to arguing on behalf of this new “revolutionary” way of describing species. We are told that most species have still not been discovered. That at the rate we are going, they never will be. That this method, although it has its minor faults, is fast. And that more complete descriptions can be written later.

If you’re wondering where two of the species went (there were twenty in the thesis), no DNA could be obtained from them, so they’ve been left out.

Taxonomists reading this may ask: “Didn’t they already describe these species?” The answer seems to be.. yes. They appear to have accidentally described the same species twice. At least I can’t find any mention that the thesis’ descriptions are not valid. But I may be missing something?

Zamani et al. (2020)

This is the point when I started to hear concerned talk in the museum’s coffee room. We’re very much into discovering the Earth’s hidden biodiversity. Or to put it otherwise, we’re the target audience for this method. New species? That’s pretty much all of us, Alireza Zamani very much so. DNA barcoding? That’s Varpu Vahtera. Tropical wasps? That’s Ilari Sääksjärvi (and me too, for that matter). Feedback came in writing, and it was not positive.

First, the obvious mistake. The new species can apparently be told apart from the fifty or so already described species. But we are not told how. Not one word on what differences there were between the new and old species. How do we even know that these species really are new?

As for the method itself, there’s nothing new about using DNA barcodes. What is new is only using barcodes and practically nothing else. What do you do if you can’t get barcodes, or can’t afford them? How can you then tell if your species has already been described? That’s going to happen a lot, especially to aspiring young taxonomists in the tropical countries where most undiscovered species reside. And what about all the described species that have not been barcoded?

Oh, and if you are so tight for time, why did you waste time on mediocre photographs which don’t even show the whole wasp? Is describing species even the bottleneck, when we spend most of our time collecting the wasps and writing descriptions barely takes any time at all? And others manage to give barcodes and other distinguishing characters without it taking a particularly long time.

There’s more, but the overall message is clear. This method leaves out so much information on the species, it will slow things down instead of speeding them up. DNA barcodes are very useful but they are not a magic bullet which solves all problems.

Sharkey et al. (2021)

There are two ways you can react to strong criticism like the above (three, if you count ignoring it). You can learn from it: see to what extent it is justified and improve what you’re doing. Or you can bat it away: state that you are right, the critics wrong.. and describe another 400 species with the same method.

It has to be said: this paper is sloppy. There’s no kinder way to express it. And that’s me writing this, I who have a bad habit of thinking the best of people. Less haste, more speed might have allowed the authors to at least read the criticism correctly. Which of course doesn’t mean that their method is wrong; that’s a separate question. More about the sloppiness later, there might be understandable reasons for it.

The method is the same, except the photos are worse and the barcodes a bit better. We’re given the whole barcode (not just those base pairs that differ).

We’re told that the mistake in Meierotto et al. (2019) was not a mistake. The critics obviously can’t be asking how the new species differ from the old, they must be asking for evidence that someone looked at the old species. We’re not going to prove we visited the museums involved, thank you very much.

And then a long and rather unstructured response. Who cares about old museum specimens? When they’ve been barcoded they may “regain their value”. Yes, simple COI barcodes can sometimes be misleading, but this shouldn’t be more of a problem than for other characters. Yes, not everyone can barcode, but this is not a problem because we’ve done the barcoding for them, and in any case a minimum of $1000 is not that much. We can’t give proper photographs because there’s no time. We can’t describe the species (beyond barcodes) because there’s no time – and anyway, they’ll just need redescribing when more species are discovered.

There’s quite a bit more, but it boils down to: there’s not enough time or money, this is the best we can do and at least it’s a start. This is the future, like it or not.

Why should I care?

So some taxonomists are squabbling, what’s new? The thing is, this is not just a discussion about the best way to write taxonomic papers, i.e. papers which no one ever reads. Decisions are being made that affect how we (humanity) find out about the biodiversity of our planet.

The problem that’s being discussed, the taxonomic impediment, is a real one. The millions of species that inhabit the Earth are quickly disappearing, obliterated by the excess growth of just one of them, the human. And we can’t do as much as we’d like to about it, because we don’t even know that most of the species exist. How can you decide which forest area to cut down and which to save, if you know nothing about the creatures who live there?

And we know nothing. We guess that there are eight million animal species on Earth (or several million fewer, or tens of millions more). We’ve discovered about a million of them. This is the taxonomic impediment: we are discovering who we share our planet with so slowly that the species go extinct faster than we describe them.

It’s a problem that desperately needs solving. We’re like a hospital that’s so overwhelmed that doctors are making decisions without time to do more than glance at the patient. And 85% of the patients aren’t even in the hospital records.

What Sarah Meierotto, Michael Sharkey and the others are proposing is one potential way of solving this problem, or at least making it milder. Save time by dropping everything from species descriptions except the DNA barcode. Use the time that’s been saved to describe more species. What Alireza and the others are concerned about is that this won’t save time: cutting corners like this will just create species “descriptions” that are useless, and make more work for everyone in the future.

I’ll get back to this soon. But first, there’s something that needs to be got out of the way.

That fourth paper

I’ve no problem with 3/4 of the papers. I really like the thesis: many theses are just for practice, but this one is an independent scientific work. The paper based on it is a perfectly valid presentation of a new idea, except for that silly beginner’s mistake of not telling how the new species differ from the old. The criticisms in the third paper are perfectly justified and deserve proper consideration.

But then there’s Sharkey et al. (2021). Some have even suggested that ZooKeys should have refused to publish it. And I can see why. It’s sloppy. And it’s arrogant.

The sloppiness, oh the sloppiness. If someone points out your beginner’s mistake, of not telling how the new species differ from the old, how can you misunderstand and think they want proof you saw the type specimens? It’s either intellectual dishonesty, or you barely glanced at the criticism.

Then there’s taking the end of a sentence, stating “The authors did not expand on this criticism”, and ignoring what the critics had to say. Let’s look at more than just the end of that sentence, shall we:

“[Purely DNA‐based descriptions] will impair this science in developing countries which house most of the undiscovered portion of biodiversity, due to high costs and lack of staff and technology. Considering the status of taxonomy as a fundamental science, this would drastically affect other related fields of study and, importantly, conservation.”

What this criticism means is that in those parts of the world where most of the undiscovered biodiversity resides, this approach will prevent aspiring young taxonomists from describing species (or even getting into taxonomy). Which will lead to species not being described. So we won’t be able to do anything else, such as justify conservation areas. Glad to clear that up for you. Something that should have been totally obvious if you’d properly read the paper.

As for the arrogance, let’s take the low-hanging fruit first. Naming a species after yourself? Not the done thing. Not even if one of the authors does it for another of the authors.

The rest of it is rather low-hanging fruit too. There are statements like:

Everyone should stop describing species without barcodes. (When it is pointed out that these “descriptions” are incompatible with the many species descriptions which lack barcodes)
Old museum specimens have little value. (When it is pointed out that these descriptions can’t be compared to millions of historical specimens)
$1000 or more isn’t that much, and anyone who can’t afford it can write descriptions for the species we named instead. (At my study site in Uganda, that’s close to a year’s salary..)
This is a “democratization of taxonomy” which is probably why some conservatives oppose it. (Yes, now any American professor in Costa Rica can take wasps that others have collected, navigate the legal obstacles to DNA sampling, and pay for postage to Canada and sequencing..)
Identification keys might perhaps have a point some day for, I don’t know, wasp-watching tourists or something, but by then everyone will have a pocket barcode scanner anyway.
“If there is any hope of overcoming the taxonomic impediment it will be achieved with the approach outlined here or something similar to it.”

To be fair, much of the arrogance probably stems from being on the defensive. They presented a new idea which was not liked. Not one little bit. It seems that the criticism was too much to handle.

And to be even more fair, I noticed that they submitted the paper a few months before the criticism was published. Did they end up having to tack on a hasty response after submission?

It won’t come as a surprise that I am rather.. unimpressed by this paper. But what about the idea it promotes? Should we reject it just because the way it’s presented raises warning flags? I’d say no. People can reach the right result for the wrong reasons. Unlikely, but possible. Let’s look beyond the faults to what the authors are trying (and failing) to clearly say.

Is this a good idea?

Is this the way to describe our planet’s species? To give the information we need to solve the biodiversity crisis and stop the sixth mass extinction?

No.

I’m not usually this forthright, but.. no.

Although there are mitigating factors. I’ll get back to them.

Zamani et al. (2020) have really given the basic reasons why not. To recap: only giving a DNA barcode for each species may save you time, but will make it hard if not impossible for others to find out if their species is new or has already been described. It’ll instantly rule out the majority of humanity from describing species, and probably the majority of specimens and species descriptions from being used. And it is doubtful how much time this saves, since the real bottleneck is collecting and processing the specimens.

Remember that overwhelmed hospital analogy? This is like solving the problem by giving every patient some painkiller and sending them home. Clears the queue, certainly. But doesn’t actually help. And soon the patients will be back and your hospital (and perhaps other hospitals) will be even more overwhelmed than before.

Let’s look at this from my viewpoint. I study tropical wasps: one of the planet’s most diverse taxa, in one of the most diverse and least-known environments. I’m the epitome of the target audience for this method. And I don’t get it.

I don’t get why this is needed. It took me a year to collect wasps in Uganda. It took me three years to get the first subfamily (Rhyssinae) processed so it could be studied. Three years of sorting, giving each rhyssine its own label and databasing it. It took me a few weeks to figure out how to tell the species apart and write the first draft of the species descriptions. Why cut everything worthwhile, everything that makes a description interesting instead of just being a catalogue code, to save a week or two?

I don’t get how this would work in the long run. I compared my rhyssine wasps to those species which had already been described from Africa. What would have happened if their descriptions had consisted solely of a DNA sequence and a poor photo? I’d never have described my species, that’s what. It took me years to find out whether Ugandan law even allows me to do a DNA test (long story, don’t get me started on this). I would have needed something like $3000-$4000 to sequence all 450 or so wasps. That’s more than I got in research funding the whole year that I started describing the species (admittedly, there was parental leave halfway through). Only sequence some of them? But then I need to sort them into preliminary species to find out which to sequence, which rather defeats the whole time-saving bit. And how can anyone in Uganda use the results?

I especially don’t get why anyone would want to do this. However much I may have focused on the biodiversity crisis earlier on, the main reason, the sufficient reason, to study these wasps is that they are interesting. In their own right. No one looks at the LBBs (little brown birds) of the insect world unless they are fascinated by them and want to learn more about them, and this method contributes nothing to that. To solve the biodiversity crisis? But fancy me going to the Uganda Wildlife Authority and stating “Your forest contains AGGCT and TAGATGC, please protect more of it.” It’s hard enough getting bureaucrats to accept that insects might be worth something; Uganda is focusing on economic development and the focus is (understandably) practical. To find out how many species there are in the forest and get a paper out of it? But you don’t need to name the species for that. Besides, how incredibly dreary.

What I can get is the worry that this would hinder taxonomy, not speed it up. Suppose you’ve just started getting interested in entomology. You’ve caught a couple of hundred wasps and are wondering if any are new species. Only to be told:

“Sorry no can do, because the experts, the people with all the resources and institutional backing, couldn’t be bothered to give you a description. You’ll have to do a DNA test.”

At your own expense. For 200 wasps. Which may all be some common species found everywhere.

What do you do? Say “stuff it” and get into birds instead.

And why am I concerned about eager amateurs? Partly out of principle: I don’t like the idea of taxonomy becoming a preserve of an elite. But also because the eager amateurs either are, or will inspire, the experts of tomorrow. Almost no one decides to become an expert on small tropical wasps. You don’t walk into a career advisor’s room and ask how to become a waspologist; you just end up becoming one because you got interested. It’s something that just happens. But it is not going to happen if you can’t even get started without DNA facilities and funding. This is true for Finland; it is certainly true for the tropical countries where taxonomy is still developing.

No, this makes no sense for me. Nor for most others either, judging by how people have reacted. Yet it must make some sense for Costa Rica, otherwise the people behind this idea would never have bothered to try it out. They’re not beginners, nor incompetents, there must be something about the Costa Rican sampling which made this seem like a worthwhile idea to explore.

I suspect it is how the Costa Rican sampling is carried out. Costa Rica is one of the centres of tropical parasitoid wasp research. I am hazy on the details, but at least in the past much of the collecting has been done by volunteers; Malaise traps in people’s back gardens, for example. Later, parataxonomists (non-experts whose wages are generally paid by NGOs or other outside funding) have done much of the actual hard work of inventorying the country’s biodiversity. Parataxonomists collect the specimens, process them, then send them to the experts to identify. It’s a massive undertaking that costs a lot but has also given a massive amount of data. It is perhaps natural that the expert taxonomists, faced with the unusual situation of more ready-processed specimens than they can handle, have started looking for ways to streamline things. And shifting the workload from the experts to non-taxonomists is a natural continuation of what has been done before.

I must say, though, that this comes perilously close to evading responsibility. (Assuming I’m not completely wrong about the above, which I may be) Large numbers of locals, volunteers and parataxonomists did all the hard work, then the experts with the institutional backing failed to do the final touches.

But let’s be fair..

..this is not merely a bad idea that should be summarily abandoned. I feel it was definitely an idea worth exploring, and there are quite a few good elements to it.

First, there are identification keys. They complain time and again about how cumbersome such keys are, how they always have to be updated whenever new species are discovered, and you know what? They’re right. It is very hard to figure out what characters are going to be useful to tell species apart, when most of the species are still to be discovered.

The solution is not, however, to give up and not bother with keys at all. They’re needed. Someone has to trawl through the different species, figure out on everyone else’s behalf how they can be recognised, and that someone is the expert on the creatures; the taxonomist. Specifically, whoever described the species that need to be identified. You can’t outsource the job to DNA barcodes; for one thing, a DNA barcode is that worst of identification key characters, one which can’t be seen easily at a glance.

The solution will have to be to make the keys relatively easy to update. I tried something like this when I described the African rhyssines: the key is based on a table with all the identifying characters. Find a new species? Simply add its details to the table, and add a character if need be to separate it from other species. Better than this is what Terry Berry and Simon van Noort did: upload interactive keys online, and give the underlying data so others can update the keys or make their own.
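To show why a table-based key is so much easier to keep up to date, here is a minimal sketch of the general idea (often called a multi-access key). It is not the actual rhyssine key: the species, characters and states are invented for illustration, and treating a missing entry as "unknown" is my own assumption.

```python
# Sketch of a table-based (multi-access) identification key.
# Species, characters and states are invented for illustration only.
TABLE = {
    "Species 1": {"body colour": "black", "hind leg colour": "red", "wing band": "absent"},
    "Species 2": {"body colour": "black", "hind leg colour": "black", "wing band": "present"},
    "Species 3": {"body colour": "orange", "hind leg colour": "red", "wing band": "absent"},
}


def candidates(table: dict[str, dict[str, str]], observed: dict[str, str]) -> list[str]:
    """Species whose recorded states agree with every observed character.

    A character missing from a species' row counts as unknown and does not exclude it.
    """
    return [
        species
        for species, states in table.items()
        if all(states.get(char, value) == value for char, value in observed.items())
    ]


def add_species(table: dict[str, dict[str, str]], name: str, states: dict[str, str]) -> None:
    """A newly described species is just one more row; nothing else needs rewriting."""
    table[name] = dict(states)


print(candidates(TABLE, {"hind leg colour": "red"}))  # ['Species 1', 'Species 3']
add_species(TABLE, "Species 4", {"body colour": "orange", "hind leg colour": "red", "wing band": "present"})
print(candidates(TABLE, {"hind leg colour": "red", "body colour": "orange"}))  # ['Species 3', 'Species 4']
print(candidates(TABLE, {"hind leg colour": "red", "body colour": "orange", "wing band": "absent"}))  # ['Species 3']
```

Adding Species 4 touched one row, and the existing characters still narrowed things down once the wing band was checked. If no character in the table separates two rows, the function simply returns both, which is an honest answer in itself.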

Second, they make a convincing case for DNA barcoding. (But not for relying on DNA barcoding alone.) Yes, there’s nothing new about giving barcodes for the species you describe, but they tend to be an afterthought, not used to full effect. I’m rather starting to feel that new species descriptions should include a barcode by default, unless there is a good reason not to.

What I might do in the future, if I can afford the money and time, is to take a whole subfamily of my Ugandan wasps and barcode the lot of them. Use that to split them into species. Then give the barcodes and some morphological data for each species. Basically what Sharkey et al. (2021) have done, except with data other than just barcodes, and without the pointless attempt to save a few days by not describing the species properly. And just perhaps, if two or three species are so similar in appearance that I can’t find anything except DNA to separate them, is it such a crime to say so in the key? All too often, we assume that we must desperately look for some tiny hair on the back of the hind claw to tell the species apart, when what we mean is “if you got this far in the key it is one of these two species, but you’ll need a DNA test to tell which”.

A third, tentative idea. Should we separate naming a species from describing it? What they have in effect done is to name their species, but not describe them. Traditionally, the idea has been that any species that is named must also be described, since otherwise the same species will keep on being renamed over and over again. But there is something to the argument that once you’ve used DNA barcodes to split your wasps into species, it would be good to immediately give them a proper name instead of “DNAsp22”. Just a thought, and my instinct says no, but I haven’t thought it through yet.

A verdict

All in all, it is probably a good thing this idea was proposed. We do have millions of species on our planet whom we know nothing about. We do desperately need to know about them. And currently, we are hopelessly slow at doing so. This is a discussion that needs to be held. We won’t do things better if we refuse to try new ideas.

Is it a good idea? No. Was it worth testing? Yes. Should we try out ideas like this in the future? Definitely, as long as we have the humility to abandon (or modify) them if they don’t work.

I’d be horrified if this method came into general use, but there’s a lot of food for thought in it too, which we can learn from if we don’t fixate on fighting either for or against it. If nothing else, we’ve been reminded that things can be done in a different way.. and also that the current way of discovering our planet’s biodiversity has its points.