The Depicts

So Structured Data on Commons (SDC) has been going for a while. Time to reap some benefits!

Besides free-text image descriptions, the first, and likely most used, element one can add to a picture via SDC is “depicts”. This can be one or several Wikidata items which are visible (prominently or as background) on the image. Many people have done so, manually or via JavaScript- or Toolforge-based mass editing tools.

This is all well and good, but what to do with that data? It can be searched for, if you know the magic incantation for the search engine, but that’s pretty much it for now. A SPARQL query engine would be insanely useful for more complex queries, especially if it worked seamlessly with the Wikidata one, but no usable, up-to-date one is in sight so far.

Inspired by a tweet by Hay, and with some help from Maarten Dammers, I found a way to use SDC “depicts” information in my File Candidates tool. It suggests files that might be useful to add to specific Wikidata items.

Now, since proper SDC support is … let’s say incomplete at the moment, I had to go a bit off the beaten path. First, I use the “random” sort in the Commons API search for files with a “depicts” statement. That way, I get 50 such files with one query. Then, I use the Wikibase API on Commons to get the structured data for these files. The structured data lists which Wikidata item(s) each file depicts.
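As a rough illustration of these two API steps, here is a minimal Python sketch. It is not the tool's actual code. It assumes the search keyword haswbstatement:P180 with srsort=random for the first step, and wbgetentities with "M" + page ID as the media entity ID for the second. The function names and the User-Agent string are made up for this example, and the parser accepts both a "statements" and a "claims" key because I would not rely on which one the response uses.

```python
import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def random_depicts_search_params(limit=50):
    """Search parameters: files (namespace 6) with any 'depicts' (P180)
    statement, in random order. 50 is the per-request limit mentioned above."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": "haswbstatement:P180",
        "srnamespace": 6,
        "srsort": "random",
        "srlimit": limit,
        "format": "json",
    }


def media_ids(search_response):
    """Turn search hits into media entity IDs ('M' + page ID)."""
    return ["M%d" % hit["pageid"] for hit in search_response["query"]["search"]]


def entity_params(ids):
    """Parameters to fetch the structured data for a batch of media IDs."""
    return {"action": "wbgetentities", "ids": "|".join(ids), "format": "json"}


def depicted_items(entities_response):
    """Map each media ID to the Wikidata item IDs it depicts."""
    result = {}
    for mid, entity in entities_response.get("entities", {}).items():
        # An empty statement set may come back as a list, hence the 'or' chain.
        statements = entity.get("statements") or entity.get("claims") or {}
        qids = []
        for statement in statements.get("P180", []):
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
            if "id" in value:  # skips "unknown value" / "no value" snaks
                qids.append(value["id"])
        result[mid] = qids
    return result


def api_get(params):
    """Perform one GET request against the Commons API (needs network access)."""
    url = COMMONS_API + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with something that identifies your tool.
    request = urllib.request.Request(url, headers={"User-Agent": "depicts-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

One batch would then be depicted_items(api_get(entity_params(media_ids(api_get(random_depicts_search_params()))))).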

Armed with these Wikidata item IDs, I use the database replicas on Toolforge to retrieve the subset of items that (a) have no image (P18), (b) have P31 “instance of”, (c) have no P279 “subclass of”, and (d) do not link to any of a number of “unsuitable” items (e.g. templates or given names). For that subset, I look up the files the items already use, e.g. as a logo image, so that the tool does not suggest a file the item already uses. Then I add an entry to the database that says “this item might use this image”, based on the depicts statements of the respective image. (Code is here, in case you are interested.)
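The filter itself is simple enough to write down as a predicate. The sketch below is only an illustration of conditions (a) to (d) and the "already used" check. The real tool runs against the Toolforge database replicas, whereas this version works on plain Python dictionaries whose layout (a set of "properties" and a set of "links" per item) I made up for the example. The set of unsuitable item IDs is supplied by the caller.

```python
def is_candidate(item, unsuitable):
    """Apply conditions (a)-(d) to one item.

    `item` is a hypothetical simplified record:
    {"properties": set of property IDs, "links": set of linked item IDs}.
    """
    properties = item["properties"]
    return (
        "P18" not in properties                  # (a) has no image yet
        and "P31" in properties                  # (b) has "instance of"
        and "P279" not in properties             # (c) has no "subclass of"
        and not (item["links"] & unsuitable)     # (d) no link to unsuitable items
    )


def suggest(depicts, items, used_files, unsuitable):
    """Yield (item ID, file name) pairs meaning "this item might use this image".

    depicts:    file name -> list of depicted item IDs
    items:      item ID -> simplified item record (see is_candidate)
    used_files: item ID -> set of files the item already uses (e.g. a logo)
    """
    for file_name, qids in depicts.items():
        for qid in qids:
            item = items.get(qid)
            if item is None or not is_candidate(item, unsuitable):
                continue
            if file_name in used_files.get(qid, set()):
                continue  # the item already uses this file somewhere
            yield qid, file_name
```

Each pair that comes out of suggest corresponds to one candidate entry in the database.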

50 files (a restriction imposed by the Commons API) are not much, especially since many images with depicts statements are probably already used as the image on the respective Wikidata item. So I keep running such random requests in the background and collect the results for the File Candidates tool. At the time of writing, over 12k such candidates exist.

Happy image matching, and don’t forget to check out the other candidate image groups in the tool (including potentially useful free images from Flickr!).