

Wikipedians love lists. Thus, my list-generating bot is now active on over a dozen wikis, most of them at the request of users, who have set up quite a variety of lists to generate and update.

However, a few issues with this approach have emerged. Some of them are technical: lists get too long for wikitext, some desired functions are hard to implement, and using templates to set up parameters is awkward. Some issues are social: while several wikis have no problems with bot-generated lists in the article namespace, some (OK, one) communities have concerns, ranging from the data quality of Wikidata, to style issues, to the fact that the list can only be edited via Wikidata, and not directly on the respective wiki.

A proposed solution is to implement Wikidata lists as a tool. This solves the “social issues” by moving the lists outside the wikis, while freeing storage and display options from the limitations of MediaWiki. So, for the impatient: Dynamic Wikidata Lists.

In this tool, everyone can view lists, and change options like language, columns, and sections on-the-fly; to create your own lists, use your trusted WiDaR login. Lists consist of two parts: the items resulting from a Wikidata Query (there could be other data sources down the line), which are stored in the tool and updated every six hours, or on demand; and the data in the columns, which is loaded on-the-fly, directly from Wikidata. This is a trade-off: there is no need to store large amounts of data in the tool, and the latest data comes straight from the source, in exchange for a few seconds of waiting time for large lists. Once a list is loaded, the display can be changed with little or no need to load more data. Even my largest list, with over 4,300 items, loads the items in ~10 seconds, and the labels for the column values in another ~5 seconds, on my machine.

The “>” icon on top of the list opens the display options, and there are quite a few of those: seven column types, three section types, multiple (sub-)section levels, an option to display the top-level section as tabs, arbitrary precision when using dates as sections (e.g. force birth dates into decades), options to override column titles per language, etc.
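To make the date-precision option concrete, here is a minimal sketch of sectioning rows by a date reduced to a coarser precision (birth dates forced into decades). It assumes each row carries a label and a Wikidata-style time string such as "+1879-03-14T00:00:00Z"; the function names and row shape are hypothetical, not the tool's actual code.

```javascript
// Hypothetical sketch: group list rows into sections keyed by a date value
// reduced to a coarser precision ("year" or "decade").
function dateSectionKey(time, precision) {
  // Wikidata time strings start with a sign and the year, e.g. "+1879-03-14T00:00:00Z".
  const match = /^([+-]\d+)-/.exec(time || "");
  if (!match) return "unknown";
  const year = parseInt(match[1], 10);
  if (precision === "decade") return `${Math.floor(year / 10) * 10}s`;
  return String(year);
}

function groupIntoSections(rows, precision) {
  const sections = new Map(); // section key -> labels, in first-seen order
  for (const row of rows) {
    const key = dateSectionKey(row.time, precision);
    if (!sections.has(key)) sections.set(key, []);
    sections.get(key).push(row.label);
  }
  return sections;
}

const rows = [
  { label: "Albert Einstein", time: "+1879-03-14T00:00:00Z" },
  { label: "Marie Curie", time: "+1867-11-07T00:00:00Z" },
  { label: "Lise Meitner", time: "+1878-11-07T00:00:00Z" },
  { label: "Max Born", time: "+1882-12-11T00:00:00Z" },
];
console.log(groupIntoSections(rows, "decade"));
```

Rows whose date cannot be parsed end up in an "unknown" section; a fuller version would also respect the precision that Wikidata stores alongside each date.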

Links go to the Wikipedia article in the current language, or to Wikidata by default; columns with links to specific wikis are possible. Preferred statements are shown if present, normal-rank ones otherwise. Columns can be sorted by clicking on the column header (there is no default sort, as the labels to sort on are loaded after the table is created).
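The rank rule (preferred statements if present, otherwise normal ones) can be sketched against the statement layout of Wikidata's entity JSON, where each entry in `claims[property]` carries a `rank` of "preferred", "normal" or "deprecated". The function name is hypothetical, the sample data is trimmed down, and the item IDs in it are placeholders.

```javascript
// Hypothetical sketch: choose which statements of a property to display.
// `claims` follows the layout of Wikidata's entity JSON (entity.claims),
// where every statement carries a rank: "preferred", "normal" or "deprecated".
function statementsToShow(claims, property) {
  const statements = claims[property] || [];
  const preferred = statements.filter((s) => s.rank === "preferred");
  // Fall back to normal rank; deprecated statements are never shown.
  return preferred.length > 0 ? preferred : statements.filter((s) => s.rank === "normal");
}

// Trimmed-down example data (real statements carry many more fields).
const claims = {
  P17: [
    { rank: "normal", mainsnak: { datavalue: { value: { id: "Q183" } } } },
    { rank: "preferred", mainsnak: { datavalue: { value: { id: "Q55" } } } },
  ],
  P31: [
    { rank: "normal", mainsnak: { datavalue: { value: { id: "Q5" } } } },
    { rank: "deprecated", mainsnak: { datavalue: { value: { id: "Q1001" } } } },
  ],
};

const ids = (statements) => statements.map((s) => s.mainsnak.datavalue.value.id);
console.log(ids(statementsToShow(claims, "P17"))); // [ 'Q55' ]
console.log(ids(statementsToShow(claims, "P31"))); // [ 'Q5' ]
```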

A list, once created, can only be changed by the person who created it; but anyone with a WiDaR login can create a new list based on an existing one, and change it in any way desired. List and default language can be specified in the URL, so you can link to a list from Wikipedia in the local language.

There are, undoubtedly, things left to do; the next big one will be to allow Wikidata editing directly from within the tool. I hope this tool will, in addition to the bot, give everyone the power and flexibility to create and manage Wikidata-based lists, help improve Wikidata statements, and maybe even convert the occasional Wikidata nay-sayer :-)

Wikidata lists – Full Circle

So my Wikidata list-generating bot Listeria has become popular in certain circles, creating and updating lists of artworks, species, or ORCID ID holders. With the introduction of the Wikidata SPARQL service, Wikidata queries are becoming more mainstream, and lists are a logical next step.

At the same time, many Wikipedians lack an awareness of Wikidata, and hesitate to go there and edit. Micro-contributions are a way for people to improve Wikidata without much fuss, but are “hidden” in external tools.

So I added a little bit of code to Listeria. The output now contains a few minor extras, like class names for table cells. These are then used by JavaScript code to allow adding and editing of information in Listeria table cells, right on Wikipedia. Label, description (where unavoidable), item links, dates, coordinates, strings, and images are supported. Simply add

Dialog to add an item link

to your common.js page on Wikipedia, hover over a Listeria-generated table cell, and you will see add/edit options. Clicking on those will open a dialog to find/enter a value, validated through Wikidata itself. Clicking OK adds this information to Wikidata. Done! (Because the table is static wikitext, your addition will only show after the next Listeria update, but it is already on Wikidata proper.)

The JavaScript code is adaptable, meaning it could be used to let people edit Wikidata-based infoboxes etc. Of course, it would be much more effective to have this enabled for all Wikipedia users, and with more Listeria lists around. But for now, I am content with this being a demo, which may inspire “official” functionality by the WMF in a few years’ time.

A quick description

So there is a lively discussion about using descriptions from Wikidata in places like Wikipedia search results, especially on mobile. While everyone seems to agree that this is a good idea, camps are forming with supporters of manual and automatically generated descriptions, respectively. Time for an entirely manual description of my POV.

At the time of writing this, there are about 14 million items on Wikidata. The Wikiverse deals with about 250 languages. That comes to ~3.5 billion possible descriptions of items, a number that will only increase with time. Right now, less than 4% of these descriptions are filled in, many of them generated by bots (e.g. “Wikimedia disambiguation page”, not all of them correctly). And do not kid yourselves, those will stay. They will continue to say “American actor”. Even after we add statements about his/her nationality, gender, birth and death dates, spouses, parents, children, important awards, etc., the description will still say “American actor”. There are, by far, not enough volunteers to fill in >3 billion descriptions, especially in the 240 or so non-“main” languages, most of which have few enough labels for the items as it is. Except perhaps for English, there are no people to go around and improve existing descriptions, probably multiple times for the same item. For most people in the world, manual Wikidata item descriptions are a wasteland, and they will stay that way.

But there is an alternative. A bot can look at an item, see it’s about a person with nationality “U.S.”, and occupation “actor”. It can, from that, write “American actor”. It can, in fact, do much better than that, given the right statements. It will improve its description as more information becomes available. And it can do so in all 250 languages, given a little volunteer effort for each of them. It won’t win a literature contest any time soon, but it will get the basic message across, in most cases.
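A toy version of the "nationality + occupation" rule might look like the sketch below. It is English-only and assumes the statements have already been resolved to labels (P27 country of citizenship, P106 occupation, P569/P570 birth/death year); the `DEMONYMS` table and the function name are hypothetical and not taken from the actual tool.

```javascript
// Toy rule-based description generator for people (English only).
// Assumes the statements were already resolved to English labels:
// P27 = country of citizenship, P106 = occupation(s),
// P569 / P570 = birth / death year. DEMONYMS is a hypothetical lookup table.
const DEMONYMS = {
  "United States of America": "American",
  Germany: "German",
  France: "French",
};

function describePerson(item) {
  const parts = [];
  const demonym = DEMONYMS[item.P27];
  if (demonym) parts.push(demonym);
  const occupations = item.P106 || [];
  parts.push(occupations.length > 0 ? occupations.join(" and ") : "person");
  let text = parts.join(" ");
  // Add life dates only when at least one of them is known.
  if (item.P569 || item.P570) text += ` (${item.P569 || "?"}-${item.P570 || ""})`;
  return text;
}

console.log(describePerson({ P27: "United States of America", P106: ["actor"] }));
// American actor
console.log(describePerson({ P27: "Germany", P106: ["physicist", "writer"], P569: "1879", P570: "1955" }));
// German physicist and writer (1879-1955)
```

Each further statement the bot understands (and each language's word order and grammar rules) would extend these rules, which is where the per-language volunteer effort comes in.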


As a hands-on person, I wrote a little tool a while ago, which attempts to do just that. Limited by time and my 2-out-of-250 language abilities, it is far from perfect, or even working properly for many languages. But let me give an example.

There are articles in several languages about a specific model of “flying boat”, the Dornier Do J. On Wikidata, there is a (as in: one) manual description for the respective item, in Italian, which reads “idrovolante Dornier-Werke”. I don’t speak Italian, but it looks … truncated? (Google Translate agrees with this assessment.)

So I ran my automatic description on this, for a few languages:

English: Dornier Do J : Flying boat by Dornier
German: Dornier Wal : Flugboot von Dornier-Werke
French: Dornier Do J : Hydravion à coque par Dornier
Spanish: Dornier Do J : Hidrocanoa por Dornier Flugzeugwerke
Japanese: Do J : 飛行艇 by ドルニエ
Vietnamese: Dornier Do J : Tàu bay bởi Dornier Flugzeugwerke
Telugu: Q1245981 : ఎగిరే పడవ Dornier Flugzeugwerke చేత తయారు చేయబడినది

Perfect? Certainly not. Wrong? In some cases; “by” is not really a Japanese word, as far as I know. But I would think that most Japanese readers would know what the item is about, from that description.

Note that there are no Wikipedia articles about this topic in either Vietnamese or Telugu. These texts (as good or bad as they may be) could show up in a Telugu Wikidata search, or in a Wikipedia one, even if no te.wikipedia results were found. The code exists, and is already used (e.g. on Italian Wikipedia).

You can see the automatic descriptions for the Wikipedia page you are on yourself. Simply add

mw.loader.load("// Manske/autodesc.js&action=raw&ctype=text/javascript");

to your common.js User subpage, or to your global JavaScript page, which will activate it on all Wikipedias you work on. I found this a great way to see where the Wikidata item is lacking, and needs some more statements, or where items need a label in your language.

A suggestive tool

Do you know what “Stomatitis” is? Neither did I. But there is an article about it on German Wikipedia, and when I found it, it had a blank Wikidata item. Now, I happen to speak German, but I have run into plenty of other blank items with, say, a Russian Wikipedia article, which is not exactly my forte. I could go to Google Translate, but that is oh so inconvenient. And then I’ll have to figure out which properties and items I should link to. Easy enough for “human” (P31:Q5) to remember, but what was the item for “sex:female” again?

Then I thought: I might not know what the text of the article says, but I bet it is in one or more categories. And these categories have other articles in them, articles similar to the article I can’t read. And many of these articles should have Wikidata items. So I could look at these items, see what statements are common among those, and some of the top ones will probably apply to my blank item as well.
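The heart of such a "statement ranking" is a frequency count. A minimal sketch, assuming the statements of the similar items (found via shared categories) have already been fetched and flattened into "Pxx:Qyy" strings; the function name and input shape are illustrative, not the code running on Labs, and apart from Q12136 (disease) the value IDs are placeholders.

```javascript
// Hypothetical sketch: rank property/value pairs by how many of the
// similar items carry them, skipping pairs the blank item already has.
function rankStatements(similarItems, existing = []) {
  const skip = new Set(existing);
  const counts = new Map();
  for (const statements of similarItems) {
    // Count each pair at most once per item.
    for (const pair of new Set(statements)) {
      if (skip.has(pair)) continue;
      counts.set(pair, (counts.get(pair) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([pair, count]) => ({ pair, share: count / similarItems.length }))
    // Most common first; ties broken alphabetically for a stable order.
    .sort((a, b) => b.share - a.share || a.pair.localeCompare(b.pair));
}

// Statements of four items from the same categories.
const similar = [
  ["P31:Q12136", "P1995:Q100"],
  ["P31:Q12136", "P1995:Q100"],
  ["P31:Q12136", "P1995:Q200"],
  ["P31:Q12136"],
];
console.log(rankStatements(similar));
```

The top entries are only suggestions; as with the Stomatitis example, a human still has to decide whether each one applies.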

You all know what we need now: More tools! So I wrote some code on Labs, which will give me the “statement ranking” for a single language. I also wrote a JavaScript wrapper around it. It will add a link called “Suggestor” to your toolbar on the left. This is what it looks like:

Screenshot of the Suggestor output

It’s definitely a disease, and the “medical specialty” looks right too. Whether the source describes it, I do not know, but it might be worth finding out.

To add this handy function to your Wikidata experience, simply add

importScript( 'User:Magnus_Manske/suggestor.js' );

to your common.js user subpage. Enjoy!

Add it to the pile!

I have previously blogged about Wikipedia-related page lists, and how they relate to many tools and activities. I also lamented my previous, failed attempts at introducing a “tool pipeline system”.

Well, I am not one to give up easily! The latest, greatest iteration in this vein is PagePile. Essentially, this new tool manages piles (newspeak for “lists”) of pages from Wikipedia, Wikidata, Commons, and other projects from the WikiVerse.


Filtering a list.

New piles can be taken from various sources, including manual lists, WDQ, and the Gather extension. Several of my tools can also generate piles, including AutoList, CatScan, QuickIntersection, and Not-in-the-other-language. Either way, you end up with a numeric PagePile ID.

What can you do with that ID? First of all, you can look at the list (that example leads to the list of all humans on Wikidata, ~2.8M items long), and download it in various formats.

You can filter the list, creating a new list (with a new ID) by following language links, resolving redirects, merging and subsetting with other lists, etc.
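The merging and subsetting filters boil down to set operations, each producing a new list. A minimal sketch, modelling a pile as a plain array of page titles on one wiki (PagePile itself stores piles in sqlite files); the function names and the language-link lookup table are hypothetical stand-ins for real wiki data.

```javascript
// Hypothetical sketch: piles as arrays of page titles on a single wiki.
// PagePile itself stores piles in sqlite files; this only models the logic.
function union(a, b) {
  return [...new Set([...a, ...b])];
}

function intersection(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((title) => inB.has(title));
}

function difference(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((title) => !inB.has(title));
}

// "Following language links": map titles through a lookup table, dropping
// pages without a link. Real data would come from the wiki's language links.
function followLanguageLinks(pile, links) {
  return pile.filter((title) => links.has(title)).map((title) => links.get(title));
}

const a = ["Berlin", "Paris", "Rome"];
const b = ["Paris", "Madrid"];
console.log(union(a, b)); // [ 'Berlin', 'Paris', 'Rome', 'Madrid' ]
console.log(intersection(a, b)); // [ 'Paris' ]
console.log(difference(a, b)); // [ 'Berlin', 'Rome' ]
const enToDe = new Map([["Rome", "Rom"], ["Munich", "München"]]);
console.log(followLanguageLinks(["Rome", "Munich", "Atlantis"], enToDe)); // [ 'Rom', 'München' ]
```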

Finally, you can import them into several of my tools, including AutoList, FIST, WD-FIST, Not-in-the-other-language, and GetItemNames.

This list will likely grow; it is quite easy to add PagePiles as an input and/or output to a tool. Let me know if there is a tool you would like to see connected to the PagePile ecosystem; likewise for new filters.


If you are a tool author on Labs, you might want to consider linking up to the obvious possibilities of this system. I made a brief introduction for programmers, put the code on BitBucket, and I am working on some code documentation.

Basically, the tool manages a list of sqlite files, each of which represents a pile (=list) of pages on a wiki. You can get the file name of the sqlite3 file from the API or via the PHP class described in the intro. Via that class, or using sqlite3 directly, you can read and write that file, adding and changing lists. Please let me know if you have problems or comments, and if you start using PagePile in your tools, so I can add them to my consumer and/or generator lists.


While I do occasionally write Wikimedia tools “to order”, I wrote quite a few of them because I required (or just enjoyed) the functionality myself. One thing I like to do is adding images to Wikidata, using WD-FIST. Recently, I started to focus on a specific list, people with awards (of any kind). People with awards are, in general, more likely to have an image; also, it can be satisfying to see a “job list” shrink over time. So for this one, I logged some data points:

Over the last 2-3 weeks, even my sporadic use of the tool has reduced the list by 1/4 (note the plateau when Labs was offline!). Some thoughts along the way:

  • The list of item candidates is re-calculated on every page load, and is not stable. As awards are more likely to be added to than removed from items, the total list of people with awards is likely to be longer today than it was at the beginning of this exercise.
  • I cannot take credit for all of this reduction; images that were added to Wikidata independently, but to items on this list by chance, likewise reduce the number of items on the list.
  • Not all of the items I “dealt with” now have an image; many had their candidate images suppressed via a recently implemented function, for cases where none of the Wikipedia candidate images for a person actually depict the person, but rather a navbox icon, or something associated with the person (a sculpture made by the person, a house the person lived in, etc.)
  • Many items were “dealt with” by setting a “grave image”. These seem to be surprisingly (to me at least) popular on Wikipedia, especially for people from the former Soviet Union, for some reason.
  • I skipped many items where either the item label or the image name are in non-Latin characters. Oddly enough, I can match images to items quite well if both are in the same (non-Latin) script, by visual comparison 😉
  • I also skipped many items where a candidate image shows multiple people. I tried my hand at generating cropped images for specific people with the excellent CropTool, but that remains quite slow compared to the usual WD-FIST actions. Maybe that would change if I could find a way to pre-fill the CropTool values (e.g. “create new image with this name”).
  • Based on a gut feeling, the “low-hanging fruit” will probably run out at ~10-15K items.
  • A sore point for me are statues of people; sometimes, I use close-ups of statues as an image of the person, when no proper image is available. I’m not sure if that is the right thing to do; it often seems to capture the likeness of the person (at least, better than “no image”), but somehow it feels like cheating…
  • There should be a “pictures of people” project somewhere, making prioritized lists of people to get an image for, then systematically “hunt them down” (e.g. ask these people or their heirs for free images, check other free image sources in print and online, group them by “likely event” where they could show up in the future, etc.).
  • I could really use some help for the “Cyrillic people”, towards the end of the list.

Wikidata has passed German Wikipedia in terms of articles/items with an image of the subject, and is now only second to English Wikipedia in that regard. As if in celebration, I added several new features to my Wikidata FIST tool, which makes adding images to Wikidata as easy as a single click (or two, if it’s a plaque, coat of arms, map etc.).

The first feature is to suppress image suggestions, if that image is already used in a Wikidata item. This cuts down on already associated “grave pictures”, as well as “symbol pictures” from infoboxes.

The second is “JPEG only”, if you are looking for actual photographs, and not scans, maps etc.

Third, each item with image candidates now has a little yellow button, which will prevent the image candidates from being shown again for this item. While there are several media properties for items, some things (a painting by an artist, a building by an architect, etc.) will never be directly added to the item; instead, each painting/building should have its own item, and link to its creator. So if all image candidates are of that nature, spare yourself and others the trouble of going through them again next time.
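Taken together, the three new features act as filters on the candidate list. A minimal sketch, assuming candidates arrive as a map from item ID to file names; `usedOnWikidata` and `ignoredItems` are hypothetical stand-ins for "files already used by some item" and "items dismissed with the yellow button", and the item IDs and file names are placeholders.

```javascript
// Hypothetical sketch of the three filters. `candidates` maps an item ID to
// its candidate file names; `usedOnWikidata` holds files already used by some
// item; `ignoredItems` holds items dismissed with the yellow button.
function filterCandidates(candidates, { usedOnWikidata, ignoredItems, jpegOnly }) {
  const result = {};
  for (const [item, files] of Object.entries(candidates)) {
    if (ignoredItems.has(item)) continue; // dismissed: never show again
    const kept = files.filter((file) => {
      if (usedOnWikidata.has(file)) return false; // already on some item
      if (jpegOnly && !/\.jpe?g$/i.test(file)) return false; // photos only
      return true;
    });
    if (kept.length > 0) result[item] = kept;
  }
  return result;
}

const candidates = {
  Q1001: ["Grave of X.jpg", "X portrait.JPG", "Coat of arms of Y.svg"],
  Q1002: ["Map of Z.png"],
  Q1003: ["House of W.jpg"],
};
console.log(
  filterCandidates(candidates, {
    usedOnWikidata: new Set(["Grave of X.jpg"]),
    ignoredItems: new Set(["Q1003"]),
    jpegOnly: true,
  })
); // { Q1001: [ 'X portrait.JPG' ] }
```

Items left with no candidates after filtering drop out entirely, which is what keeps the "job list" short.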

Musing on lists

As some of you may know, I write the occasional tool to help support Wikipedia, Wikidata, Commons, and other projects in the WikiVerse. Most of my tools work on the same basic principle: Get some data to start with, think about it, and present a result. The input data is often a list of pages (or Wikidata items, which is similar), defined by some sort of query.

Now, the number of potential sources for such lists has been multiplying over the years. Off the top of my head, I can think of:

  • Manual lists (paste in a box)
  • Lists on Wikis (numbered, unnumbered, with or without links, with comments, in tables etc.)
  • Category trees
  • Category tree intersections (e.g. QuickIntersection)
  • More complex intersections of category trees, templates, etc. (e.g. CatScan2)
  • Wikidata Queries
  • Complex intersections of categories, WDQ, lists etc. (e.g. AutoList)
  • SQL queries (e.g. Quarry)
  • SPARQL queries (e.g. WDQS)
  • All the tools that use any combination of the above, and generate page lists in return, could be sources again

The problem, however, is more complicated than this:

  • Most tools that process lists could potentially use most of the above sources, or combinations thereof, even if this is not apparent at first glance; a Wikidata tool can still use lists of French Wikipedia articles, as they can be “converted” into corresponding Wikidata items, and vice versa
  • Any of these sources can be combined in several ways; e.g. only pages that are in list A and (list B or list C)
  • These can be combined with non-list properties (last edited less than a month ago, excluding bots; created over 5 years ago; edited by one of these users; use the matching talk/content page; no redirects)
  • This can be done recursively; the same source “types” can be used several times, in a complex query
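The combination rules in the list above (boolean combinations of sources, non-list properties, recursion) can be expressed as a small query tree. This is a minimal sketch under a hypothetical node format of my own choosing, not the format of any existing tool: leaves name a source list, inner nodes combine their children, and any node may carry a filter predicate standing in for a non-list property such as "no redirects".

```javascript
// Hypothetical query-tree format: a leaf names a source list, an inner node
// combines its (non-empty) children with "and", "or" or "minus", and any node
// may carry a `filter` predicate for a non-list property.
function evaluate(node, sources) {
  let result;
  if (node.source !== undefined) {
    result = new Set(sources[node.source] || []);
  } else {
    const parts = node.args.map((child) => evaluate(child, sources)); // recursion
    if (node.op === "and") {
      result = parts.reduce((acc, s) => new Set([...acc].filter((x) => s.has(x))));
    } else if (node.op === "or") {
      result = parts.reduce((acc, s) => new Set([...acc, ...s]));
    } else if (node.op === "minus") {
      result = parts.reduce((acc, s) => new Set([...acc].filter((x) => !s.has(x))));
    } else {
      throw new Error(`unknown operator: ${node.op}`);
    }
  }
  return node.filter ? new Set([...result].filter(node.filter)) : result;
}

// "Pages in list A and (list B or list C)", excluding redirects.
const sources = {
  A: ["Page 1", "Page 2", "Page 3", "Page 4"],
  B: ["Page 1", "Page 5"],
  C: ["Page 3"],
};
const redirects = new Set(["Page 3"]); // stand-in for page metadata
const query = {
  op: "and",
  args: [{ source: "A" }, { op: "or", args: [{ source: "B" }, { source: "C" }] }],
  filter: (title) => !redirects.has(title),
};
console.log([...evaluate(query, sources)]); // [ 'Page 1' ]
```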

I have previously tried to allow users to construct a query pipeline, combining the outputs of different tools, and processing (e.g. filtering) them through more tools in new and interesting ways. However, that attempt was not taken up, neither by users nor tool developers.

I tried again to solve the issue, this time by putting the “pipeline” into JavaScript, running right in the users’ web browser. However, usage numbers (except for a single, quite active user) show that again, there was no uptake by users in general.

Maybe I am the only one in the WikiVerse thinking about this? Maybe my attempts are still too clunky for the average user? Maybe there is just no demand, and all the tools run perfectly fine as they are?

There seems to be some general interest in lists; my list generating bot appears to be reasonably popular with users on Wikipedia, albeit not in the article namespace. And an experimental, manual list-generating feature called Gather on mobile Wikipedia seems to be popular. Maybe I am just missing the “killer application” for lists, though the point is that all tools and applications could benefit from list management.

The Game of Source

Wikidata has beautiful mechanisms to associate individual claims with sources for that claim. However, finding and adding such sources is surprisingly complex, and, between multiple open tabs and the somewhat sluggish interface, can strain the patience of the most well-meaning editor.

I had previously attempted to simplify adding sources to Wikidata statements; and while I believe this interface to be much easier to use than Wikidata proper, it is still clunky, and has issues on mobile.

So, I went ahead and reduced the issue to its most basic form: Does a short text snippet support a specific claim? To achieve such a simplified interface, the following must happen:

  • A Wikidata item is picked (at random)
  • The associated Wikipedia articles are investigated
  • The external links of these articles are merged
  • The HTML for these URLs is retrieved, and HTML tags are stripped, leaving only the plain text
  • Claims from the item are prepared. This includes getting the label of “item statements” (Pxx => Qyy), and formatting the dates of “time statements” in various ways (2015-06-01, “June 1, 2015”, etc.)
  • The claim values (labels and dates, for now) are searched for in the plain text of the external URLs above
  • The hits, including some flanking text, are stored in a database
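The tag-stripping, date-formatting and matching steps above can be sketched as follows. It is a simplification under stated assumptions: dates are passed in as "YYYY-MM-DD", tags are stripped with regular expressions rather than a real HTML parser, and all function names are hypothetical. Storing the hits in a database is left out.

```javascript
// Hypothetical sketch of the matching step; regex-based tag stripping is a
// simplification that a real crawler would replace with an HTML parser.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Remove scripts/styles and tags, then collapse whitespace.
function stripTags(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Render a "YYYY-MM-DD" date in a few common textual forms.
function dateVariants(isoDate) {
  const [year, month, day] = isoDate.split("-").map(Number);
  const name = MONTHS[month - 1];
  return [isoDate, `${name} ${day}, ${year}`, `${day} ${name} ${year}`];
}

// Find every occurrence of each search string, keeping some flanking text.
function findSnippets(text, needles, flank = 40) {
  const hits = [];
  for (const needle of needles) {
    for (let pos = text.indexOf(needle); pos !== -1; pos = text.indexOf(needle, pos + needle.length)) {
      const from = Math.max(0, pos - flank);
      hits.push({ needle, snippet: text.slice(from, pos + needle.length + flank) });
    }
  }
  return hits;
}

const text = stripTags("<p>The museum opened on <b>June 1, 2015</b> in Berlin.</p>");
console.log(findSnippets(text, dateVariants("2015-06-01")));
```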

This process is repeated over and over again. Finally, an interface presents the hits for a specific claim to the user. A single click can now add that URL as a source to the claim on Wikidata (via WiDaR), together with the original retrieval date (example). The entire set can be marked as “Done” (as in, don’t show this again to anyone), or skipped (claim goes back into the “pool”).

It is early days for this interface now. No doubt, many improvements are possible, and even though claims are added to the database in the background, there are only ~1,000 claims in there at the time of writing this. Patience.


One of the early promises of Wikidata was the improvement of lists on Wikipedia. These would be automatically generated and displayed, solving a number of problems:

  • Solve inconsistent lists on the same topic across Wikipedias
  • Keep all lists up-to-date
  • Track all possible members of the list via items, instead of per-Wikipedia red links
  • A single edit on Wikidata would propagate to all Wikipedias

Like many other features of Wikidata, this one has been delayed for some time now. With WDQ, and the upcoming SPARQL services, there are now several unofficial query services for Wikidata. It’s time to introduce a service for auto-generating lists.

Which brings me to the pun of the blog entry title: It’s the German word for “outwitted”, but it could also be read as “super-listed”. Sadly, umlauts can still cause problems with non-German speakers and keyboards, so I run this tool under a biology pun name: Listeria (actually, a genus of bacteria).

How does this work? On Wikipedia (currently, English and German are supported, but it would be easy to add more), one adds a pair of templates to a Wiki page. Once a day (or on manual request), a bot finds those pages, reads the template parameters, and generates a WDQ-based list of items. The list is implemented as a table, to allow for various properties, including images, to accompany the entry. Items are linked to the respective article on the wiki, or to the Wikidata item if no article exists. The list can be auto-sectioned on a Wikidata property (e.g. the administrative unit of an item). Once generated, the bot compares the list with the one already on the page (between the two templates); if different, the bot replaces the list on the page with the new, up-to-date list.
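The compare-and-replace step could be sketched as below. The template names "Wikidata list" / "Wikidata list end" are assumptions for illustration (the post does not name the two templates), as are the function name and the rule that the list begins on the line after the start template and that the start template always carries parameters.

```javascript
// Hypothetical sketch of the bot's update step: find the text between the
// start and end templates and replace it only if the new list differs.
function updateListSection(wikitext, newList, startName, endName) {
  const unchanged = { changed: false, text: wikitext };
  // The start template is assumed to carry parameters, so "{{Name|" is
  // searched for; this also avoids matching the end template by accident.
  const startIdx = wikitext.indexOf(`{{${startName}|`);
  if (startIdx === -1) return unchanged;
  // The list body is assumed to begin on the line after the start template.
  const lineEnd = wikitext.indexOf("\n", startIdx);
  if (lineEnd === -1) return unchanged;
  const bodyStart = lineEnd + 1;
  const endIdx = wikitext.indexOf(`{{${endName}}}`, bodyStart);
  if (endIdx === -1) return unchanged;
  const oldList = wikitext.slice(bodyStart, endIdx).trim();
  const fresh = newList.trim();
  // Only edit the page if the generated list actually differs.
  if (oldList === fresh) return unchanged;
  return {
    changed: true,
    text: wikitext.slice(0, bodyStart) + fresh + "\n" + wikitext.slice(endIdx),
  };
}

const page = [
  "Intro text.",
  "{{Wikidata list|wdq=QUERY}}",
  "* old entry",
  "{{Wikidata list end}}",
  "More text.",
].join("\n");

const first = updateListSection(page, "* new entry", "Wikidata list", "Wikidata list end");
console.log(first.changed); // true
const second = updateListSection(first.text, "* new entry", "Wikidata list", "Wikidata list end");
console.log(second.changed); // false
```

Skipping the edit when nothing changed keeps the page history free of no-op bot edits.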

My example page lists Dutch lighthouses, auto-sectioned by administrative unit. I made an English and a German version, using the same template code. They will both be updated at least once a day by the bot; the top template also generates a link to manually trigger the update for a specific page. Starting a new automatic list is as easy as inserting and filling the two templates into a page. So, Wikidata-based lists have arrived, after a fashion.

What’s that, you say? Your manual list contains more entries? Well, go to Wikidata, and create or link up items correctly so they all show on the automated list as well! Oh, your manual table contains more details? Add them to Wikidata! That way, any language edition of Wikipedia can enjoy the list and the information it contains. Also, comparing your list to the automatic one can highlight discrepancies, which may point to faulty information somewhere.

Don’t like lighthouses? How about 15th century composers instead, sectioned by nationality? Or 1980s video games, sectioned by company, ordered by date? Your imagination is the limit!

Now, if we only had numbers with units on Wikidata, so we could store the height of those lighthouses…