
How to nip a budding project

Wikidata is booming. A huge number of editors, many of them new contributors to the Wonderful World of Wikimedia projects, are adding, fixing, and sourcing claims in the database. A ridiculously large number of language links have been moved from Wikipedia to Wikidata. Part of the promise of this new project is to unify factoids across the different language editions of Wikipedia; another is to make the key facts of Wikipedia (and other sources) accessible in machine-readable form to third-party users. For those users, who will probably outgrow the use of claims on Wikipedia in the long run, the amount of (reasonably reliable) data is paramount. Wikipedia really took off once it reached a critical mass of useful knowledge; the same is due to happen to Wikidata.

Or is it? When it comes to claims and statements, a huge amount of heavy lifting has been, and is being, done by bots. The idea is simple: there is a lot of simple, factual data that can be extracted from a Wikipedia article and inserted into the corresponding Wikidata item, using categories, templates, or specific matches in the article text itself. For proper provenance, bots usually add “imported from en.wikipedia” as a “source” to the imported claim.
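
For illustration, here is a minimal sketch of what such a bot edit looks like with the pywikibot framework; the item and the date are placeholders (Douglas Adams, Q42), and attaching P143 (“imported from Wikimedia project”) pointing at Q328 (English Wikipedia) is the usual way this kind of “source” is recorded:

```python
import pywikibot

# Connect to the Wikidata repository (assumes a configured bot account).
site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

# Placeholder target: Douglas Adams (Q42), date of birth (P569).
item = pywikibot.ItemPage(repo, 'Q42')
claim = pywikibot.Claim(repo, 'P569')
claim.setTarget(pywikibot.WbTime(year=1952, month=3, day=11))
item.addClaim(claim)

# "Source" the claim as imported from the English Wikipedia:
# P143 = "imported from Wikimedia project", Q328 = English Wikipedia.
source = pywikibot.Claim(repo, 'P143')
source.setTarget(pywikibot.ItemPage(repo, 'Q328'))
claim.addSources([source])
```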

But this well-functioning system is about to end, sabotaged by people obsessing over “proper sources”. Now, don’t get me wrong; I do think all statements in Wikidata should be sourced properly! It is a tremendous opportunity to link up the world’s knowledge in the most fine-grained form imaginable. It is, undoubtedly, the right thing to do. But the drive to enforce this now, for all bots, is premature. Mainly because it is impossible.

I know. I created a list of birth and death dates from Wikipedia articles. Over 300,000 Wikidata items about people could have that information, in a matter of hours. But this has been verboten by the bureaucracy, as the dates do not have proper sources. Non-technical people may ask: “So why not just add sources?” I’d love to; except it might be somewhat hard to have a bot scan the sources in the Wikipedia article, go to a library, get all the books, read them, find the mentioned dates, return the books, find or create the book items in Wikidata, and link them up accordingly. A bot couldn’t source web resources either (even if it somehow knew which one to use), since there is currently no URI data type on Wikidata.
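
The extraction itself is the easy part. Something like the following rough sketch (matching only the common {{Birth date}} / {{Death date}} infobox templates; the function name is made up) pulls the dates out of an article’s wikitext. Nothing in that wikitext, however, tells a bot which of the cited books or websites, if any, actually backs those dates up:

```python
import re

# Match {{Birth date|YYYY|M|D}} and {{Death date|YYYY|M|D}} templates
# (and their "... and age" variants) in an article's wikitext.
DATE_RE = re.compile(
    r'\{\{\s*(Birth|Death) date(?: and age)?\s*'
    r'\|\s*(?:df\s*=\s*\w+\s*\|\s*)?(\d{3,4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})',
    re.IGNORECASE,
)

def extract_dates(wikitext):
    """Return {'birth': (y, m, d), 'death': (y, m, d)} where found."""
    dates = {}
    for kind, year, month, day in DATE_RE.findall(wikitext):
        dates[kind.lower()] = (int(year), int(month), int(day))
    return dates

# Example on a typical infobox snippet:
print(extract_dates('{{Birth date|1952|3|11}} … {{Death date and age|2001|5|11|1952|3|11}}'))
# -> {'birth': (1952, 3, 11), 'death': (2001, 5, 11)}
```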

So all these dates will have to be entered and sourced manually, by a human. And this is why it’s the wrong approach: it doesn’t scale, even in the Wikimedia world, which is not exactly lacking volunteers. It could be done eventually, but the flame of hope that is Wikidata now will have faded by then. Facts on Wikipedia have been checked to a high degree, as has been demonstrated time and again. The important thing right now is to get these facts into Wikidata, to keep the momentum, to get others interested in using, and working with, Wikidata as a machine-readable knowledge repository. Sources will, and should, be added. In due time.
