Red link lists on steroids

I have long been a fan of red link lists: collections of topics that ought to have a Wikipedia article. Often, these are compiled from other sources, such as the public domain 1911 Encyclopaedia Britannica, or the Dictionary of National Biography. Such lists often give a good overview of an area of knowledge, such as general encyclopedic topics or biographies, and, in the early days of Wikipedia, were also a crude marker of “how good are we” compared to established works. However, having worked on both compiling and working down such lists, I am also aware of the temptation they present to the OCD-inclined; seeing the list shrink and the percentage of checked-off topics grow is good, as it presents a clear goal, in contrast to the “more articles!” drive often encountered.

That said, the practicalities of link lists on Wikipedia can be quite painful. Tens of thousands of links need to be chopped into many subpages, and these have to be divided into sections for easy editing. In the beginning, we tended to remove “done” links, but later we switched to tagging each entry with a {{done}} template. Now, just because a link has turned “blue” doesn’t mean it actually points to the topic stated in the list; it may be a different person or concept, or a disambiguation page. It also becomes tedious to switch back and forth between rendered links and wikitext, even when editing sections.

Another point that always bothered me is that these lists are, by their very implementation, limited to a single-language Wikipedia. There may be a perfectly good German article about a “redlink” on the English site. These days, the latter issues could, in principle, be solved quite elegantly by Wikidata. Also, linking Wikidata items to external resources via ID properties is a big step forward. But Wikidata, in its current state, does not really lend itself to the basic problem, which is listing items we should have articles about, but currently do not. Technically, this could be solved by creating “empty” items that only have an external ID property; however, I feel this would be frowned upon. Furthermore, for this to work in an automated fashion, it needs to be clear that, at the time of item creation, there is no existing item that should be used instead. “Fuzzy” matching would lead to tens of thousands of empty, duplicated items. All this assumes that there actually is a property for the external resource in question.

The catalog list.

So, finally, I took a request by the master of the DNB articles as an excuse to write Yet Another Tool. Called mix’n’match, it manages entries in “catalogs” (that is, third-party resources) and their relation to Wikidata items. Initially, I have imported entries from ODNB, BBC’s Your Paintings, Appletons’, and the 1913 Catholic Encyclopedia; some of these we already have as redlink lists on English Wikipedia. I’ll be happy to add more catalogs on request. The entries, slightly over 100,000 at the time of writing, have individual links back to the respective resource, titles, and (partial) descriptions.

Individual entry editing.

An entry can be unmatched (the default state), matched to a Wikidata item manually by a user (which requires logging in via TUSC, as a vandalism precaution), matched automatically by some fuzzy name matching, or “not applicable” (N/A), if the entry is in the resource but should never have a Wikidata item.
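
To make the four states and the automatic matching a bit more concrete, here is a minimal Python sketch. It is not the tool's actual code: the `Status` and `Entry` names, the `normalize` helper, the use of `difflib` for similarity, and the 0.9 threshold are all assumptions for illustration, and the Q numbers are placeholders.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    UNMATCHED = "unmatched"    # default state
    AUTO = "auto"              # suggested by fuzzy name matching
    MANUAL = "manual"          # set or confirmed by a logged-in user
    NOT_APPLICABLE = "n/a"     # should never have a Wikidata item


@dataclass
class Entry:
    external_id: str           # ID in the third-party catalog (hypothetical field)
    name: str
    status: Status = Status.UNMATCHED
    q: Optional[str] = None    # Wikidata "Q number", if any


def normalize(name: str) -> str:
    """Lowercase and turn punctuation into spaces: 'Smith, John' -> 'smith john'."""
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(kept.split())


def auto_match(entry: Entry, labels: Dict[str, str], threshold: float = 0.9) -> Entry:
    """Attach the best-scoring item as an automatic suggestion, if it is close enough.

    `labels` maps Q numbers to item labels; the threshold is an assumed value.
    """
    if entry.status is not Status.UNMATCHED:
        return entry  # never overwrite a manual match or an N/A flag
    target = normalize(entry.name)
    best_q, best_score = None, 0.0
    for q, label in labels.items():
        score = SequenceMatcher(None, target, normalize(label)).ratio()
        if score > best_score:
            best_q, best_score = q, score
    if best_q is not None and best_score >= threshold:
        entry.status, entry.q = Status.AUTO, best_q
    return entry
```

Keeping automatic suggestions in their own state, separate from manual matches, is what lets a human confirm or reject them later instead of trusting the fuzzy match outright.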

Entry matches can be changed individually, by

  • specifying a Wikidata “Q number”, e.g. after searching Wikidata or Wikipedia
  • confirming an automatically suggested match
  • removing a match that was set automatically or by a user
  • flagging an entry as not applicable
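
The four actions above can be sketched as simple state transitions. This is only an illustration under assumptions: the function and field names are made up, and requiring a logged-in user for every change (rather than only for manual matching) is my guess at a sensible rule, not a description of the tool's internals.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNMATCHED, AUTO, MANUAL, NA = "unmatched", "auto", "manual", "n/a"


@dataclass
class Entry:
    name: str
    status: str = UNMATCHED
    q: Optional[str] = None      # Wikidata "Q number"
    user: Optional[str] = None   # who made the last change


def _require_user(user: Optional[str]) -> None:
    # Assumption: every change needs a logged-in user.
    if not user:
        raise PermissionError("login required to change a match")


def set_match(entry: Entry, q: str, user: str) -> None:
    """Specify a Q number by hand, e.g. after searching Wikidata."""
    _require_user(user)
    if not re.fullmatch(r"Q[1-9]\d*", q):
        raise ValueError(f"not a Q number: {q!r}")
    entry.status, entry.q, entry.user = MANUAL, q, user


def confirm_match(entry: Entry, user: str) -> None:
    """Turn an automatic suggestion into a user-confirmed match."""
    _require_user(user)
    if entry.status != AUTO:
        raise ValueError("no automatic suggestion to confirm")
    entry.status, entry.user = MANUAL, user


def remove_match(entry: Entry, user: str) -> None:
    """Drop a match, whether it was set automatically or by a user."""
    _require_user(user)
    entry.status, entry.q, entry.user = UNMATCHED, None, user


def flag_not_applicable(entry: Entry, user: str) -> None:
    """Mark the entry as never needing a Wikidata item."""
    _require_user(user)
    entry.status, entry.q, entry.user = NA, None, user
```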

Recent Changes.

Lists of entries and their status can be generated in chunks, and filtered by their status; for example, you can show only automatic suggestions for easy, one-click matching (name and description of the suggested Wikidata item are shown as well, where available), or only unmatched items to get the cases that are a little bit harder. All changes are tracked with edit time and user; there is even a Recent Changes page, as well as a rudimentary search function.
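
As a rough sketch of what "in chunks, filtered by status" could look like, here is a small Python generator. The dictionary layout of an entry, the `list_entries` name, and the default chunk size of 50 are assumptions for illustration, not the tool's real interface.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def list_entries(
    entries: Iterable[Dict[str, str]],
    status: Optional[str] = None,
    chunk_size: int = 50,
) -> Iterator[List[Dict[str, str]]]:
    """Yield entries in chunks, optionally keeping only one status."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Lazily filter, so a large catalog is never copied in full.
    filtered = (e for e in entries if status is None or e["status"] == status)
    while True:
        chunk = list(islice(filtered, chunk_size))
        if not chunk:
            return
        yield chunk
```

Calling `list_entries(catalog, status="auto")` would then walk through only the automatic suggestions, one page at a time, which is the easy one-click case described above.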

I hope this tool will reach a critical mass of fellow obsessive list-checkers! And while you’re at it, feel free to add some statements to the odd Wikidata item that shows up as a candidate…

One Comment

  1. Asaf Bartov wrote:

    This is wonderful! I’m a big redlink list fan, and I’ve been pondering a tool along very similar lines, as part of my broader scheming in The Aboutness Project — http://aboutness.org

    So thanks for providing this tool! I will both start using it, and soon offer a new dataset for it to ingest.

    Wednesday, November 13, 2013 at 07:01 | Permalink