This proof-of-concept was produced by David P. Shorthouse using biodiversity occurrence data from Canadensys and academic information from ORCID. Collector and determiner names were parsed and cleaned using the Namae gem along with custom regular expressions. Collector networks were produced with the Ruby Graph Language gem. Scientific authority information in scientific names was extracted using the Biodiversity gem. The MIT-licensed code is available on GitHub.
Collectors and determiners of specimens lack a measure of impact for their activities in the biological sciences that is comparable to the h-index, which measures the impact of their publications. Here's an attempt at a Collector's Index. It is easy to calculate (or estimate) and explicitly incorporates the effort required to identify specimens, organize collecting trips with colleagues, and deposit curated specimens in museums. The index rewards the identification of one's own specimens to species (the "naturalist" component) and one's network of fellow collectors along with deposition of specimens in multiple museums (the "sociability" component). All of the measures can be obtained from digitized specimen labels, but the index is best computed by aggregators of specimen data. The index is relatively immune to vagaries in the quality of collections data because it emphasizes aspects of collecting and determining that are not tied to taxonomic accuracy, georeferencing accuracy, or other measures of accuracy. The hope is that computing a Collector's Index will add a new and motivating dimension to the digitization of museum specimens. It is calculated as follows:
- CollectorIndex = Rounded to the nearest integer
- t = Number of unique species identified
- n = Number of specimens both collected and identified
- c = Number of colleagues you have collected with (including yourself)
- i = Number of museums where your collected specimens are deposited †
† The 2 ∗ i boost is not included if your collected specimens have not been deposited in a museum.
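The four inputs defined above can be tallied from specimen records. The following is a minimal sketch, not the project's actual code. The `Specimen` struct, its field names, the `collector_components` helper, and the sample records are all hypothetical and made up for illustration. It assumes that `species` is `nil` when a specimen has not been identified to species and that `institution` is `nil` when a specimen has not been deposited. It stops at the inputs and does not compute the index itself.

```ruby
# Hypothetical record layout (an assumption, not the project's schema):
# collectors is an array of names, determiner is a single name,
# species is nil when not identified to species,
# institution is nil when the specimen is not deposited in a museum.
Specimen = Struct.new(:collectors, :determiner, :species, :institution, keyword_init: true)

# Tally t, n, c, and i for one person from a list of Specimen records.
def collector_components(records, person)
  collected  = records.select { |r| r.collectors.include?(person) }
  determined = records.select { |r| r.determiner == person }

  {
    # t: unique species this person has identified
    t: determined.map(&:species).compact.uniq.size,
    # n: specimens this person both collected and identified
    n: determined.count { |r| r.collectors.include?(person) },
    # c: everyone named on this person's collected specimens, self included
    c: collected.flat_map(&:collectors).uniq.size,
    # i: distinct museums holding this person's collected specimens
    i: collected.map(&:institution).compact.uniq.size
  }
end

# Made-up sample data for illustration only.
records = [
  Specimen.new(collectors: ["A. Smith", "B. Jones"], determiner: "A. Smith",
               species: "Aus bus", institution: "Museum A"),
  Specimen.new(collectors: ["A. Smith"], determiner: "A. Smith",
               species: "Aus bus", institution: "Museum B"),
  Specimen.new(collectors: ["B. Jones"], determiner: "A. Smith",
               species: "Aus cus", institution: "Museum A"),
  Specimen.new(collectors: ["A. Smith", "C. Lee"], determiner: "C. Lee",
               species: nil, institution: nil)
]

components = collector_components(records, "A. Smith")
p components
```

For the sample data, "A. Smith" has t = 2, n = 2, c = 3, and i = 2. In practice, name strings would first need the kind of parsing and cleaning described at the top of this page, since the same person can appear under several spellings.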