Skip to content

The Game of Source

Wikidata has beautiful mechanisms to associate individual claims with sources for that claim. However, finding and adding such sources is surprisingly complex, and, between multiple open tabs and the somewhat sluggish interface, can strain the patience of the most well-meaning editor.

I had previously attempted to simplify adding sources to Wikidata statements; and while I believe this interface to be much easier to use than Wikidata proper, it is still clunky, and has issues on mobile.

Screen Shot 2015-06-01 at 22.18.35So, I went ahead and reduced the issue to its most basic form: Does a short text snippet support a specific claim? To achieve such a simplified interface, the following must happen:

  • A Wikidata item is picked (by random)
  • The associated Wikipedia articles are investigated
  • The external links of these articles are merged
  • The HTML for these URLs is retrieved, and HTML tags are stripped, leaving only the plain text
  • Claims from the item are prepared. This includes getting the label of “item statements” (Pxx => Qyy), and formatting the dates of “time statements” in various ways (2015-06-01, “June 1, 2015”, etc.)
  • The claim values (labels and dates, for now) are searched for in the HTML of the external URLs above
  • The hits, including some flanking text, are stored in a database

This process is repeated over and again. Finally, an interface presents the hits for a specific claim to the user. A single click can now add that URL as a source to the claim on Wikidata (via WiDaR), together with the original retrieval date (example). The entire set can be marked as “Done” (as in, don’t show this again to anyone), or skipped (claim goes back into the “pool”).

It is early days for this interface now. No doubt, many improvements are possible, and even though claims are added to the database in the background, there are only ~1,000 claims in there at the time of writing this. Patience.