Skip to content

ORCID mania

ORCID is an increasingly popular service to disambiguate authors of scientific publications. Many journals and funding bodies require authors to register their ORCID ID these days. Wikidata has a property for ORCID, however, only ~2400 items have an ORCID property at the moment of writing this blog post. That is not a lot, considering Wikidata contains 728,112 scientific articles.

Part of the problem is that it is not easy to get ORCIDs and its connections to publications in an automated fashion. It appears that several databases, public or partially public, contain parts of the puzzle that is required for determining the ORCID for a given Wikidata author.

So I had a quick look, and found that, on the ORCID web site, one can search for a publication DOI, and retrieve the list of authors in the ORCID system that “claim” that DOI. That author list contains variations on author names (“John”, “Doe”, “John Doe”, “John X. Doe” etc.) and their ORCID IDs. Likewise, I can query Wikidata for a DOI, and get an item about that publication; that item contains statements with authors that have an item (“P50”). Each of these authors has a name.

Now, we have two lists of authors (one from ORCID, one from Wikidata), both reasonably short (say, twenty entries each), that should overlap to some degree, and they are both lists of authors for the same publication. They can now be joined via name variations, excluding multiple hits (there may be two “John Doe”s in the author list of a publication; this happens a lot with Asian names), as well as excluding authors that already have an ORCID ID on Wikidata.

I have written a bot that will take random DOIs from Wikidata, query them in ORCID, and compare the author list. In a first run, 5.000 random DOIs yielded 123 new ORCID connections; manual sampling of the matches looked quite good, so I am adding them via QuickStatements (sample of edits).

Unless this meets with “social resistance”, I can have the bot perform these edits regularly, which would keep Wikidata up-to-date with ORCIDs.

Additionally, there is a “author name string” property, which stores just the author name for now, for authors that do not have an item yet. If the ORCID list matches one of these names, an item could automatically be created for that author, including ORDIC ID, and association to the publication item. Please let me know if this would be desirable.

13 Comments