Thousands of undiscovered mammal species may be hidden in plain sight, new research finds

Taxonomy, the study of how living organisms relate to one another as species, has been around since the 1700s. Though scientists and philosophers have long debated what makes a species a species, taxonomists treat each species as a group of organisms that share common biological characteristics.

Discovering and describing new species is essential to biology researchers and conservationists because they use species as a unit of analysis. Species are also economically important to agriculture, hunting and fishing, and have special legal status, such as under the U.S. Endangered Species Act.

Despite this, scientists have been able to formally name and describe only an estimated 10% of species on the planet, based on discovery trends over the years.

This gap in knowledge is known as the Linnean shortfall. It remains unclear whether poor research methodology, disagreements on how to define a species, or other factors are to blame for this gap.

We are scientists in evolutionary biology, and figuring out ways to better identify species is central to our research. Using genetic analysis and artificial intelligence, we were able to disentangle hidden species that have been lumped together in a single group and predict where they might be found and what types of animals they might be. Our findings also pinpoint a potential cause for this shortfall in species identification: an underinvestment in the science of taxonomy.

Determining what makes a species can get complicated.

Hidden species remain to be discovered

For this study, we chose to focus on mammals. Because of their relatively large size and importance to people as a source of food, companionship and entertainment, we predicted that a large proportion of mammalian species had already been identified.

Our first task was to identify known species that might actually contain two or more species. To do this, we analyzed 1 million gene sequences from 4,300 named species, identifying clusters of sequences that showed high genetic diversity and fitting the data to an evolutionary model.
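To give a rough sense of what "identifying clusters of sequences" can look like, here is a minimal Python sketch. It is not the method used in the study, which also fit the data to an evolutionary model. It assumes the sequences are already aligned to the same length, measures the fraction of sites at which two sequences differ, and groups sequences whose difference falls under a cutoff. The sample names, sequences and the 0.2 threshold are all invented for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_by_distance(seqs, threshold):
    """Single-linkage grouping: link any two sequences within the threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Invented, pre-aligned sequences from four individuals of one "named species".
samples = {
    "ind1": "ACGTACGTAC",
    "ind2": "ACGTACGTAT",
    "ind3": "TTGTACCTGA",
    "ind4": "TTGTACCTGC",
}

clusters = cluster_by_distance(samples, threshold=0.2)
print(clusters)
```

In this toy data set, two well-separated clusters inside a single named species would be a hint, and nothing more than a hint, that the name may cover more than one species.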

We found potentially hundreds of hidden species that were previously classified as a single group. This finding was expected, as it mirrors results from previous studies, albeit on a larger scale.

Where and what are these hidden species?

Once we identified the presence of these potentially hidden species, our second task was to determine what specific traits they have in common. To do this, we used a data science technique called random forest analysis, a form of machine learning that draws information from a large number of different variables in order to make a prediction about a particular outcome. It’s similar to the technique that Netflix uses to suggest shows you might be interested in watching.
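For readers curious about how a random forest works, the following is a toy sketch in plain Python, not the analysis from the study. Each tree is trained on a random resample of the data and considers a random subset of the variables at each split, and the forest predicts by majority vote. The two variables (body mass and range size), the labels and every number below are invented for illustration, and the function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features):
    """Return (impurity, feature, threshold) for the purest two-way split."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, rng, n_features, depth=0, max_depth=3):
    """Grow one tree; a leaf is a label, an inner node is a 4-tuple."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    # Each split only looks at a random subset of the variables.
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority(labels)
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, n_features, depth + 1, max_depth)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: resample the data with replacement for each tree.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_features))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Invented data: [body mass in kg, range size in thousands of square km].
rows = [[0.02, 900], [0.05, 1200], [0.1, 1500], [0.3, 2000],
        [20, 10], [50, 30], [80, 50], [200, 80]]
labels = ["candidate"] * 4 + ["no_candidate"] * 4

forest = train_forest(rows, labels)
small_wide = predict_forest(forest, [0.01, 3000])
large_narrow = predict_forest(forest, [500, 5])
print(small_wide, large_narrow)
```

The point of combining many slightly different trees is that their individual quirks tend to cancel out in the vote. In practice, researchers would typically reach for an established library rather than hand-rolled code like this.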

