The Games Continue

Two weeks after releasing the first version of The Wikidata Game, I feel a quick look at the progress is in order.

First, thank you, everyone, for trying, playing, and giving feedback! The response has been overwhelming, sometimes quite literally so; I ask your forgiveness if I can’t quite keep up with the many suggestions coming in through the issue tracker, email, Twitter, and various Wikidata talk pages. Also, my apologies to the Wikidata admins, especially those patrolling RfD, for the flood of deletion requests that the merge sub-game can cause at times.

Now, some numbers. There are now six sub-games you can play, and play you do. At the time of writing, 643 players have made an astonishing 352,710 decisions through the game, many of which result in improving Wikidata directly, or at least keep other players from having to make the same decision over again.

Let’s look at a single game as an example. The merge game has a total of ~200K candidate item pairs, selected by identical labels; one of these pairs, selected at random, is presented to the user to decide if the items describe the same “object” (and should thus be merged), or if they just happen to have the same name, and should not be shown in the game again. ~20% of item pairs have such a decision so far, which comes to ~3,000 item pairs per day in this game alone. At that speed, all candidates could be checked in two months’ time; realistically, a “core” of pairs having only articles in smaller languages is likely to linger much longer.
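For anyone who wants to redo that back-of-the-envelope estimate, here is a minimal sketch in Python. It only uses the rounded figures quoted above (~200K pairs, ~20% decided, two weeks elapsed); the function name `estimate_progress` is made up for illustration and is not part of the game's code.

```python
def estimate_progress(total_candidates, fraction_done, days_elapsed):
    """Return (decisions per day, days left at that rate).

    Assumes the daily rate stays constant, which the post itself
    notes is optimistic for the hard "core" of pairs.
    """
    done = total_candidates * fraction_done
    rate_per_day = done / days_elapsed
    remaining = total_candidates - done
    return rate_per_day, remaining / rate_per_day


# Rounded figures from the post: ~200K pairs, ~20% decided in 14 days.
rate, days_left = estimate_progress(200_000, 0.20, 14)
print(f"~{rate:,.0f} pairs per day, ~{days_left:.0f} days to go")
```

With these inputs the rate comes out a little under 3,000 pairs per day and the remaining time at roughly eight weeks, which matches the "two months" figure.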

Of the item pairs with decisions, ~30% were judged to be identical (and thus merged), while ~31% were found to be different. But wait, what about the other 39%? Well, there are automatic cleanup operations going on, while you play! ~26% of item pairs, when loaded for presentation to the user, were found to contain at least one deleted item; most likely, someone merged them “by hand” (these probably include a few thousand species items that I accidentally created earlier…). ~6% contained at least one item that was marked as a disambiguation page since the candidate list was created. And ~9% were automatically discarded because one of the items had a link to the other, which implies a relation and, therefore, that the items are not identical.
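The three automatic checks described above can be pictured as a small filter that runs before a pair is shown to a player. The following Python sketch is an illustration only, not the game's actual code: the `Item` class, its fields, the function `auto_discard_reason`, and the item IDs in the usage lines are all hypothetical, and the order of the checks is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Wikidata item."""
    qid: str
    deleted: bool = False
    is_disambiguation: bool = False
    linked_qids: Set[str] = field(default_factory=set)  # items this one links to


def auto_discard_reason(a: Item, b: Item) -> Optional[str]:
    """Return why a candidate pair can be dropped automatically, or None."""
    if a.deleted or b.deleted:
        return "deleted"          # probably merged by hand already
    if a.is_disambiguation or b.is_disambiguation:
        return "disambiguation"   # marked after the candidate list was built
    if b.qid in a.linked_qids or a.qid in b.linked_qids:
        return "linked"           # a link implies a relation, not identity
    return None                   # no filter applies: ask a human player


# Usage with made-up item IDs:
parent = Item("Q900001", linked_qids={"Q900002"})
child = Item("Q900002")
print(auto_discard_reason(parent, child))  # linked
```

Only pairs for which the function returns `None` would need a human decision; everything else is resolved without anyone having to look at it.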

As with the other sub-games, new candidates are automatically added every day. At the same time, users and automated filters resolve the candidate status. So far, resolving happens much quicker than addition of new candidates, which means there is light at the end of the tunnel.

Gender property assignments over time.

Merging is a complex and slow decision. Some “quicker” games look even better in terms of numbers: The “gender” game, assigning a male or female tag to person items, has completed 42% of its ~390K candidates, a rate of almost 12K per day. The “sex ratio” is ~80% male to ~18% female (plus ~2% items already tagged or deleted on Wikidata). This is slightly “better” for women than the Wikidata average (85% male vs. 15% female), maybe because the game does “solve” rare and ambiguous names as well, which are usually not tagged by bots, or because it has no selection bias when presenting candidates.

The disambiguation game is already running out of candidates (at 82% of ~23K candidates). Even the “birth/death date” game, barely a day old, has already seen over 10K decisions (with over 84% resulting in the addition of one or two dates to Wikidata).

In closing, I want to thank everyone involved again, and encourage you to keep playing, or to help this effort in other ways: by helping out on Wikidata RfD, by fixing potential problems on flagged items, by submitting code patches, or even by becoming a co-maintainer for The Game.