A week of looking at women

Images and their use in the WikiVerse have always been a particular interest of mine, on Wikipedia, Commons, and of course, Wikidata. Commons holds the files and groups them by subject, author, or theme; Wikidata references images and files for key aspects of a subject; and Wikipedia uses them to enrich texts, and puts files into context.

Wikidata uses images for more subjects than any Wikipedia except English, and it is slowly closing in on the latter; the “break-even” should happen later this year. This is not just an end in itself, but will also massively benefit the many smaller Wikipedias, by holding such material at the ready in easily usable form.

Image candidates, ready to be added with a single click

So I did a small experiment to see how much one person can do “on the side” (besides work, other interests, and such luxuries as sleeping or eating) to improve Wikidata’s stock of images. I picked the German category for women, which currently holds >92K articles. I used my WD-FIST tool to find all potential images on all Wikipedias, for the Wikidata items corresponding to the German articles. This does not show items that already have an image, or items that have no possible candidate image anywhere; just the ones where a Wikipedia does have an image, and Wikidata does not.
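The filter described above boils down to a simple selection rule. Here is a minimal sketch of that rule in Python; it is not WD-FIST's actual code, and the function name, the dictionaries, and the item IDs and file names are all made up for illustration.

```python
def candidates_needing_review(items, wikidata_images, wikipedia_images):
    """Return {item: sorted candidate files} for items worth showing.

    An item is shown only if Wikidata has no image for it AND at least
    one Wikipedia article about it uses an image.
    All inputs are hypothetical stand-ins for the real data sources.
    """
    result = {}
    for item in items:
        if wikidata_images.get(item):
            continue  # Wikidata already has an image: nothing to do
        files = wikipedia_images.get(item, set())
        if files:
            result[item] = sorted(files)  # candidates found on some Wikipedia
        # no candidates anywhere: item is not shown either
    return result


# Hypothetical example data (IDs and file names are invented).
items = ["Q1", "Q2", "Q3"]
wikidata_images = {"Q1": "Already_there.jpg"}
wikipedia_images = {
    "Q1": {"Already_there.jpg"},
    "Q2": {"Portrait.jpg", "Group_photo.jpg"},
}

todo = candidates_needing_review(items, wikidata_images, wikipedia_images)
print(todo)
```

In this made-up example only Q2 would be listed: Q1 already has an image on Wikidata, and Q3 has no candidate image anywhere.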

A week ago, I started with 3,060 items for women that potentially had an image on Wikipedia, somewhere. A week later, I am down to ~290. Now, that does not mean I added ~2,700 images to Wikidata; a database query comes to ~1,100 added images, and ~200 other file properties (spoken text, commemorative plaque image, etc.). Some items just had no suitable image on Wikipedia; others had group photos, which I tagged to be cropped on Commons (those tagged images will not show in the tool while the crop template remains).

The image candidates for the remaining 290 or so items need to be investigated in more detail; some of them might not actually be images of the subject (hard to tell if the file name and description are in, e.g., Russian), or they are low-resolution group pictures, which do not warrant cropping, as the resulting individual image would be too grainy.

Adding the ~1,100 images is good, but only part of the point I am trying to make here. The other part is, no one will have to wade again through the ~90% of item/image suggestions I have resolved, one way or another. Ideally, the remaining 290 items should be resolved too, so if an image is added on any Wikipedia, for any of the >92K women in the category, just that new image would show in the tool, which would make updating Wikidata so much easier. Even just one volunteer could drop by every few weeks and keep Wikidata up-to-date with images, for that group of items, with a few clicks’ worth.

The next step is, of course, all women on Wikidata (caution: that one will take a few minutes to load). The count of items with potential images is at 15,986 at the time of writing. At my speed, it would take one person about a month of late evening clicking to reduce that by 90%, though I do hope some of you have been inspired to help me out a bit.
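For the curious, the “about a month” figure can be checked with a back-of-the-envelope calculation. This sketch assumes that the pace from the German category (3,060 items down to ~290 in seven days) carries over unchanged to the larger set; the function name and its parameters are made up for illustration.

```python
def days_to_reduce(start, remaining, days_elapsed, backlog, target_fraction=0.9):
    """Estimate the days needed to resolve target_fraction of a backlog.

    Assumes a constant pace, taken from an earlier run that went from
    `start` to `remaining` open items in `days_elapsed` days.
    """
    rate = (start - remaining) / days_elapsed  # items resolved per day
    return backlog * target_fraction / rate


# Figures from the text: 3,060 -> ~290 items in 7 days; 15,986 items in the new set.
estimate = days_to_reduce(3060, 290, 7, 15986)
print(round(estimate))  # roughly 36 days
```

That works out to a little over five weeks at a steady pace, which is in the same ballpark as the month mentioned above.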