Quantifying error in occurrence data: Comparing the data quality of iNaturalist and digitized herbarium specimen data in flowering plant families of the southeastern United States

PLoS One. 2023 Dec 7;18(12):e0295298. doi: 10.1371/journal.pone.0295298. eCollection 2023.

Abstract

iNaturalist has the potential to be an extremely rich source of organismal occurrence data. Launched in 2008, it contained over 150 million uploaded observations as of May 2023. Based on the findings of a limited number of past studies assessing the taxonomic accuracy of participatory science-driven sources of occurrence data such as iNaturalist, there has been concern that some portion of these records may be misidentified in certain taxonomic groups. In this case study, we compare Research Grade iNaturalist observations with digitized herbarium specimens. Both are currently available for combined download from large data aggregators and are therefore the primary sources of occurrence data for large-scale biodiversity and biogeography studies. Our comparisons were confined regionally to the southeastern United States (Florida, Georgia, North Carolina, South Carolina, Texas, Tennessee, Kentucky, and Virginia). Occurrence records from ten plant families (Gentianaceae, Ericaceae, Melanthiaceae, Ulmaceae, Fabaceae, Asteraceae, Fagaceae, Cyperaceae, Juglandaceae, Apocynaceae) were downloaded and scored for taxonomic accuracy. We found comparable and relatively low rates of misidentification among both digitized herbarium specimens and Research Grade iNaturalist observations within the study area. This finding illustrates the utility and high quality of iNaturalist data for future research in the region. It also points to key differences between the two data types that give each its own advantages, depending on how the data are applied.

MeSH terms

  • Data Accuracy
  • Magnoliopsida*
  • North Carolina
  • South Carolina
  • Virginia

Grants and funding

These studies were supported by a National Science Foundation grant awarded to Douglas Soltis (DS) and Pamela Soltis (PS), CIBR: Collaborative Research: Integrating data communities with BiotaPhy: a computational platform for data-intensive biodiversity research and training (Award #1930007). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.