A quick description

So there is a lively discussion about using descriptions from Wikidata in places like Wikipedia search results, especially on mobile. While everyone seems to agree that this is a good idea, camps are forming with supporters of manual and automatically generated descriptions, respectively. Time for an entirely manual description of my POV.

At the time of writing this, there are about 14 million items on Wikidata. The Wikiverse deals with about 250 languages. That comes to ~3.5 billion possible descriptions of items, a number that will only increase with time. Right now, fewer than 4% of these descriptions are filled in, many of them generated by bots (e.g. “Wikimedia disambiguation page”), and not all of them correctly. And do not kid yourselves, the ones that exist will stay as they are. They will continue to say “American actor”. Even after we add statements about his/her nationality, gender, birth and death dates, spouses, parents, children, important awards, etc., the description will still say “American actor”. There are not nearly enough volunteers to fill in >3 billion descriptions, especially in the ~240 or so non-“main” languages; most of those have few enough labels for the items, never mind descriptions. Except for maybe English, there are no people to go around and improve existing descriptions, probably multiple times for the same item. For most people in the world, Wikidata manual item descriptions are a wasteland, and that wasteland is here to stay.

But there is an alternative. A bot can look at an item, see it’s about a person with nationality “U.S.”, and occupation “actor”. It can, from that, write “American actor”. It can, in fact, do much better than that, given the right statements. It will improve its description as more information becomes available. And it can do so in all 250 languages, given a little volunteer effort for each of them. It won’t win a literature contest any time soon, but it will get the basic message across, in most cases.
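To make the idea concrete, here is a minimal sketch of how a description could be derived from statements. It is not the code of any existing tool: the `item` object, the `vocabulary` tables and the `describe` function are all made up for illustration, and real Wikidata items store statements as property and item IDs rather than plain strings.

```javascript
// Hypothetical, simplified item; real items use property IDs and item IDs.
const item = {
  instanceOf: "human",
  nationality: "U.S.",
  occupation: "actor",
};

// Per-language lookup tables (assumed). This is the "little volunteer
// effort" each language would need.
const vocabulary = {
  en: {
    nationality: { "U.S.": "American" },
    occupation: { actor: "actor" },
    // Word order can differ per language, so each language gets its own rule.
    person: (nat, occ) => [nat, occ].filter(Boolean).join(" "),
  },
  de: {
    nationality: { "U.S.": "US-amerikanischer" },
    occupation: { actor: "Schauspieler" },
    person: (nat, occ) => [nat, occ].filter(Boolean).join(" "),
  },
};

function describe(item, lang) {
  const words = vocabulary[lang];
  // Give up quietly for unsupported languages or item types.
  if (!words || item.instanceOf !== "human") return "";
  const nat = words.nationality[item.nationality];
  const occ = words.occupation[item.occupation];
  let text = words.person(nat, occ);
  // More statements on the item lead to a richer description.
  if (item.born && item.died) text += ` (${item.born}-${item.died})`;
  return text;
}

console.log(describe(item, "en")); // American actor
console.log(describe(item, "de")); // US-amerikanischer Schauspieler
console.log(describe({ ...item, born: 1899, died: 1957 }, "en"));
// American actor (1899-1957)
```

The point of the sketch is the last call: nobody edited a description, yet it improved once the item gained birth and death dates.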

As a hands-on person, I wrote a little tool a while ago, which attempts to do just that. Limited by time and my 2-out-of-250 language abilities, it is far from perfect, or even working properly for many languages. But let me give an example.

There is an article about a specific model of “flying boat” in several languages, the Dornier Do J. On Wikidata, there is a (as in: one) manual description, in Italian, for the respective item, which reads “idrovolante Dornier-Werke”. I don’t speak Italian, but it looks … truncated? (Google translate agrees with this assessment.)

So I ran my automatic description on this, for a few languages:

English: Dornier Do J : Flying boat by Dornier
German: Dornier Wal : Flugboot von Dornier-Werke
French: Dornier Do J : Hydravion à coque par Dornier
Spanish: Dornier Do J : Hidrocanoa por Dornier Flugzeugwerke
Japanese: Do J : 飛行艇 by ドルニエ
Vietnamese: Dornier Do J : Tàu bay bởi Dornier Flugzeugwerke
Telugu: Q1245981 : ఎగిరే పడవ Dornier Flugzeugwerke చేత తయారు చేయబడినది

Perfect? Certainly not. Wrong? In some cases; “by” is not really a Japanese word, as far as I know. But I would think that most Japanese readers would know what the item is about, from that description.
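One plausible way such output comes about is a simple fallback rule: if a language has no translation for a connecting word, use the English one, and if an item has no label in that language, show its item ID. The sketch below only illustrates that guess. The `labels` and `connectors` tables and both functions are hypothetical, filled with values from the output above, and the fixed word order would not suit every language (Telugu, for one).

```javascript
// Hypothetical data, copied from the example output above.
const labels = {
  Q1245981: { en: "Dornier Do J", de: "Dornier Wal", ja: "Do J" },
};
const connectors = { en: "by", de: "von", fr: "par" }; // no "ja" entry yet

// Return the entry for a language, or the fallback if there is none.
function pick(table, lang, fallback) {
  return table[lang] !== undefined ? table[lang] : fallback;
}

function describeAircraft(id, lang, typeWord, maker) {
  const label = pick(labels[id], lang, id); // missing label -> item ID
  const by = pick(connectors, lang, connectors.en); // missing word -> English
  return `${label} : ${typeWord} ${by} ${maker}`;
}

console.log(describeAircraft("Q1245981", "de", "Flugboot", "Dornier-Werke"));
// Dornier Wal : Flugboot von Dornier-Werke
console.log(describeAircraft("Q1245981", "ja", "飛行艇", "ドルニエ"));
// Do J : 飛行艇 by ドルニエ
console.log(pick(labels.Q1245981, "te", "Q1245981"));
// Q1245981
```

Under this assumption, fixing the Japanese output takes one new entry in `connectors`, not an edit to every affected item.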

Note that there are no Wikipedia articles about this topic in Vietnamese or Telugu. These texts (as good or bad as they may be) could show up in a Telugu Wikidata search. Or in a Wikipedia one, even if no te.wikipedia results were found. The code exists, and is already in use (e.g. on Italian Wikipedia).

You can see for yourself the automatic description for the Wikipedia page you are on. Simply add

mw.loader.load("//en.wikipedia.org/w/index.php?title=User:Magnus Manske/autodesc.js&action=raw&ctype=text/javascript");

to your common.js User subpage, or to your global JavaScript page, which will activate it on all Wikipedias you work on. I found this a great way to see where the Wikidata item is lacking, and needs some more statements, or where items need a label in your language.

One Comment

  1. Ricordisamoa wrote:

    While “idrovolante *della* Dornier-Werke” may sound more natural in Italian, I think it’s correct as it is, just as “Audi car” in English.
    And by the way it is not “manual”, see https://www.wikidata.org/wiki/Special:Contributions/ValterVBot?offset=20130209170124

    Thursday, August 20, 2015 at 08:54