
What else?

Structured Data on Commons is approaching. I have done a bit of work on converting Infoboxes into statements, that is, on generating structured data. But what about using it? What could that look like?

Taxon hierarchy for animals

Inspired by a recent WMF blog post, I wrote a simple demo on what you might call “auto-categorisation”. You can try it out by adding the line

importScript('User:Magnus Manske/whatelse.js') ;

to your common.js script.

It works for files on Commons that are used in a Wikidata item (so, ~2.8M files at the moment), though that could be expanded (e.g. by scanning for templates with Qids, or for “depicts” in Structured Data). The script then investigates the Wikidata item(s) and tries to find related Wikidata items that have images.
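A minimal sketch of that first step, assuming the two API modules named in the comments further down (`globalusage` on Commons, then `wbgetentities` on Wikidata) and the `formatversion=2` JSON layout. The function names are hypothetical and this is not the code in whatelse.js; it only builds request URLs and parses a response, leaving the network call out.

```javascript
// Hypothetical helpers, not taken from whatelse.js.
// Build a Commons API request listing where a file is used across wikis.
function globalUsageUrl(fileTitle) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'globalusage',
    titles: fileTitle,
    gunamespace: '0', // main namespace only; Wikidata items live there
    gulimit: 'max',
    format: 'json',
    formatversion: '2', // pages come back as an array
    origin: '*', // allow an anonymous cross-origin request
  });
  return 'https://commons.wikimedia.org/w/api.php?' + params.toString();
}

// Pull Wikidata item IDs out of a formatversion=2 globalusage response.
// Assumes the `wiki` field reports Wikidata as "www.wikidata.org".
function wikidataItemsFromGlobalUsage(response) {
  const items = new Set();
  const pages = (response.query && response.query.pages) || [];
  for (const page of pages) {
    for (const usage of page.globalusage || []) {
      if (usage.wiki === 'www.wikidata.org' && /^Q\d+$/.test(usage.title)) {
        items.add(usage.title); // Set drops duplicate usages
      }
    }
  }
  return [...items];
}

// Build the follow-up wbgetentities request (up to 50 IDs per call).
function entitiesUrl(qids) {
  const params = new URLSearchParams({
    action: 'wbgetentities',
    ids: qids.join('|'),
    format: 'json',
    origin: '*',
  });
  return 'https://www.wikidata.org/w/api.php?' + params.toString();
}
```

Fetching `globalUsageUrl(...)`, passing the JSON to `wikidataItemsFromGlobalUsage`, and then fetching `entitiesUrl(ids)` gives the item data that all the sections below are derived from.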

The Night Watch

That could be something as simple as “all items that have the same creator (and an image)”, but I also added a few bespoke ones.

If the item is a taxon (e.g. the picture is of an animal), it finds the “taxon tree” by following the “parent taxon” property. It even follows branches and constructs the longest path, to get as many taxon levels as possible (I stole that code from Reasonator).
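The “longest path” idea can be sketched as a memoised depth-first search over parent taxa. This is an illustration, not the Reasonator code: it assumes the P171 (“parent taxon”) values have already been fetched into a `Map`, and the function name is hypothetical.

```javascript
// Hypothetical helper, not the Reasonator implementation.
// parents: Map from a taxon to the array of its parent taxa (P171 values).
// Returns the longest chain [start, parent, grandparent, ..., root].
function longestTaxonPath(start, parents) {
  const memo = new Map(); // taxon -> longest chain starting at that taxon
  const onStack = new Set(); // taxa on the current DFS path (cycle guard)

  function walk(taxon) {
    if (memo.has(taxon)) return memo.get(taxon);
    if (onStack.has(taxon)) return []; // cycle in the data: stop extending
    onStack.add(taxon);
    let best = [];
    for (const parent of parents.get(taxon) || []) {
      const candidate = walk(parent);
      if (candidate.length > best.length) best = candidate; // keep the longest branch
    }
    onStack.delete(taxon);
    const chain = [taxon, ...best];
    memo.set(taxon, chain);
    return chain;
  }

  return walk(start);
}
```

With hypothetical data where “Felis catus” lists both “Felis” and “Felidae” as parents, and “Felis” leads to “Felidae” via “Felinae”, the function prefers the route through “Felis” and “Felinae” because it yields more taxon levels. Memoisation keeps shared ancestors from being walked more than once.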

A similar thing happens for all P31 (“instance of”) values, where it follows the subclass hierarchy: the London Eye is “instance of:Ferris wheel”, so you get “Ferris wheel”, its superclass “amusement ride”, etc.
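Following the subclass hierarchy amounts to a breadth-first walk along P279 (“subclass of”), starting from the P31 values. A sketch, assuming the P279 values are already in a `Map`; the function name and the `maxDepth` cap are hypothetical, the cap being one simple way to avoid climbing all the way up to very general classes.

```javascript
// Hypothetical helper, not taken from whatelse.js.
// instanceOf: array of the item's P31 values.
// subclassOf: Map from a class to the array of its P279 superclasses.
// Returns the classes in breadth-first order, nearest first.
function classHierarchy(instanceOf, subclassOf, maxDepth = 10) {
  const seen = new Set(instanceOf);
  const order = [...seen];
  let frontier = [...seen];
  // Each pass climbs one level; maxDepth bounds how far up we go.
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const cls of frontier) {
      for (const superclass of subclassOf.get(cls) || []) {
        if (!seen.has(superclass)) {
          seen.add(superclass);
          order.push(superclass);
          next.push(superclass);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

For example, `classHierarchy(['Ferris wheel'], new Map([['Ferris wheel', ['amusement ride']]]))` returns `['Ferris wheel', 'amusement ride']`. The `seen` set means a class reachable by two routes (a diamond in the hierarchy) is listed only once.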

The same, again, for locations, all the way up to country. If the item has a coordinate, there are also some location-based “nearby” results.
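For the “nearby” results, the coordinate can be read from P625 in the `wbgetentities` JSON. A sketch of turning it into a query, assuming the Wikidata Query Service `wikibase:around` service, which takes a WKT point in longitude-latitude order and a radius in kilometres. The function names, default radius and limit are hypothetical choices, not what the demo uses.

```javascript
// Hypothetical helpers, not taken from whatelse.js.
// Read the first usable P625 (coordinate location) value from an entity.
function coordinatesOf(entity) {
  const claims = (entity.claims && entity.claims.P625) || [];
  for (const claim of claims) {
    const snak = claim.mainsnak;
    if (snak && snak.snaktype === 'value' && snak.datavalue) {
      const { latitude, longitude } = snak.datavalue.value;
      return { latitude, longitude };
    }
  }
  return null; // no coordinate: no "nearby" section
}

// SPARQL for items with an image (P18) within radiusKm of the point.
function nearbyQuery({ latitude, longitude }, radiusKm = 1, limit = 50) {
  if (![latitude, longitude, radiusKm].every(Number.isFinite)) {
    throw new Error('expected numeric coordinates and radius');
  }
  // WKT points are written as "Point(longitude latitude)".
  return `SELECT ?item ?image ?dist WHERE {
  SERVICE wikibase:around {
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point(${longitude} ${latitude})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "${radiusKm}" .
    bd:serviceParam wikibase:distance ?dist .
  }
  ?item wdt:P18 ?image .
}
ORDER BY ?dist
LIMIT ${limit}`;
}
```

Ordering by `?dist` puts the closest items first, which is what a short “nearby” list on the file page would want.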

Finally, some date fields (birthdays, creation dates) are harvested for the years.
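Wikidata stores dates as strings like `+1642-00-00T00:00:00Z` together with a precision (9 = year, 8 = decade, 7 = century, higher values are finer). A sketch of harvesting years, assuming P569 (date of birth) and P571 (inception) as the date properties; the function names are hypothetical, and skipping values coarser than a year is a design choice, not necessarily what the demo does.

```javascript
// Hypothetical helpers, not taken from whatelse.js.
// Extract the signed year from a Wikidata time string.
function yearFromTime(time) {
  const m = /^([+-])(\d+)-/.exec(time);
  if (!m) return null;
  const year = parseInt(m[2], 10);
  return m[1] === '-' ? -year : year; // a leading '-' is kept as a negative year
}

// Collect { property, year } pairs from an entity's date claims.
function harvestYears(entity, properties = ['P569', 'P571']) {
  const result = [];
  for (const prop of properties) {
    for (const claim of (entity.claims && entity.claims[prop]) || []) {
      const dv = claim.mainsnak && claim.mainsnak.datavalue;
      if (!dv || dv.type !== 'time') continue;
      if (dv.value.precision < 9) continue; // decade/century: no single year
      const year = yearFromTime(dv.value.time);
      if (year !== null) result.push({ property: prop, year });
    }
  }
  return result;
}
```

An inception of `+1642-00-00T00:00:00Z` with precision 9 yields `{ property: 'P571', year: 1642 }`, which is what the “inception year 1642” link needs.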

The London Eye

Each of these, if applicable, gets its own section in a box floating on the right side of the image. The links in it open a gallery-type SPARQL query result page, showing all items that match a constraint and have an image. So, if you look at The Night Watch on Commons, the associated Wikidata item has “Creator:Rembrandt”. Therefore, you get a “Creator” section with a “Rembrandt” link that opens a page showing all Wikidata items with “Creator:Rembrandt” that have an image.
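One plausible way to build such a link, assuming the Wikidata Query Service and its `#defaultView:ImageGrid` result view; the demo itself may link to a different gallery page, and the function names are hypothetical. P170 is “creator” and P18 is “image”.

```javascript
// Hypothetical helpers, not taken from whatelse.js.
// SPARQL for all items with property = value that also have an image (P18).
function galleryQuery(property, valueQid, limit = 200) {
  // Only accept plain IDs, so nothing else can be spliced into the query.
  if (!/^P\d+$/.test(property) || !/^Q\d+$/.test(valueQid)) {
    throw new Error('expected a property ID (P...) and an item ID (Q...)');
  }
  return `#defaultView:ImageGrid
SELECT ?item ?itemLabel ?image WHERE {
  ?item wdt:${property} wd:${valueQid} ;
        wdt:P18 ?image .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT ${limit}`;
}

// Link that opens the query in the Wikidata Query Service UI.
function queryServiceLink(sparql) {
  return 'https://query.wikidata.org/#' + encodeURIComponent(sparql);
}
```

For the “Creator:Rembrandt” section, the link would be `queryServiceLink(galleryQuery('P170', 'Q5598'))`, with Q5598 being Rembrandt. The same two functions cover any “same property value” section, e.g. “movement”.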

In a similar fashion, there are links to “all items with inception year 1642”, or to items with “movement:baroque”. You get the idea.
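The year links can be built the same way, except that the value is a date, so the query filters on `YEAR()`. A sketch with the date property and year as parameters; the function name is hypothetical. Because the filter may have to be checked for every item that has the property and an image, a query like this on a widely used property such as P571 (inception) is the kind that can run into the query service timeout.

```javascript
// Hypothetical helper, not taken from whatelse.js.
// SPARQL for items whose date property falls in the given year and that
// also have an image (P18).
function yearQuery(property, year, limit = 200) {
  if (!/^P\d+$/.test(property) || !Number.isInteger(year)) {
    throw new Error('expected a property ID (P...) and an integer year');
  }
  return `#defaultView:ImageGrid
SELECT ?item ?image ?date WHERE {
  ?item wdt:${property} ?date ;
        wdt:P18 ?image .
  FILTER(YEAR(?date) = ${year})
}
LIMIT ${limit}`;
}
```

“All items with inception year 1642” then corresponds to `yearQuery('P571', 1642)`.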

Now, this is just a demo, and there are several issues with it. First, it uses Wikidata, as there is no Structured Data on Commons yet. That limits it to files used in Wikidata items, and to the property schema and tree structure used on Wikidata. Some of the links it offers lead to ridiculously large queries (all items that are an instance of a subclass of “entity”, anyone?), some just return the same file you came from (because it is the only item with an image created by Painter X), and some look useful but time out anyway. And, as it stands, the way I query the APIs would likely not be sustainable if the script were enabled for everyone by default.

But then, this is what a single guy can hack together in a few hours, using a “foreign” database that was never intended to make browsing files easy. Given what is possible even with these limitations, I am very hopeful about what the community can do with bespoke, purpose-built Structured Data and some well-designed code.

Note: Please feel free to work with the JS code; it also contains my attempt to show the results in a dialog box on the file page, but I couldn’t get it to look nice, so I stuck with external links.

2 Comments

  1. This is an exciting demo. Because I’m not a programmer, I wonder how you’ve got the coordinates for these two items: https://commons.wikimedia.org/wiki/File:The_Nightwatch_by_Rembrandt.jpg (I assume via the Wikidata link in “Current location”?) and https://commons.wikimedia.org/wiki/File:London_Eye_-_TQ04_26.jpg (I assume via the linked https://commons.wikimedia.org/wiki/London_Eye which has a link to London Eye (Q160659)?)?

    Friday, November 2, 2018 at 15:53 | Permalink
  2. Magnus wrote:

    The first step is to get Wikidata items that use that image, via:
    https://commons.wikimedia.org/w/api.php?action=help&modules=query%2Bglobalusage
    That’s roughly the same as the global usage section on the file page.

    Once I have the item, I use:
    https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities

    For the London Eye item, that would be:
    https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q160659

    If you search for “P625” on that result, you’ll see the coordinates for the item, which I then use in the demo.

    Monday, November 5, 2018 at 09:41 | Permalink
