
ORCID mania

ORCID is an increasingly popular service to disambiguate authors of scientific publications; many journals and funding bodies now require authors to register an ORCID ID. Wikidata has a property for ORCID; however, only ~2,400 items carry it at the time of writing this blog post. That is not a lot, considering Wikidata contains 728,112 scientific articles.

Part of the problem is that it is not easy to get ORCIDs and their connections to publications in an automated fashion. It appears that several databases, public or partially public, each contain pieces of the puzzle required to determine the ORCID for a given Wikidata author.

So I had a quick look, and found that, on the ORCID web site, one can search for a publication DOI, and retrieve the list of authors in the ORCID system that “claim” that DOI. That author list contains variations on author names (“John”, “Doe”, “John Doe”, “John X. Doe” etc.) and their ORCID IDs. Likewise, I can query Wikidata for a DOI, and get an item about that publication; that item contains statements with authors that have an item (“P50”). Each of these authors has a name.

Now we have two lists of authors for the same publication (one from ORCID, one from Wikidata), both reasonably short (say, twenty entries each), and they should overlap to some degree. They can be joined via name variations, excluding multiple hits (there may be two “John Doe”s in an author list; this happens a lot with Asian names), as well as excluding authors that already have an ORCID ID on Wikidata.
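To make the matching step concrete, here is a minimal Python sketch of the name-variant join described above (illustrative only, not the bot’s actual code; fetching the two author lists from ORCID and from Wikidata is assumed to have happened already):

```python
# A minimal sketch of the matching step described above (not the bot's actual code).
# Inputs are assumed to have been fetched already: the ORCID search result for a DOI,
# and the Wikidata authors (P50) of the item carrying that DOI (P356).
import re
from collections import defaultdict

def name_keys(name):
    """Crude name variants: full name, and 'first-initial last-name'."""
    parts = [p for p in re.split(r"[\s.]+", name.lower()) if p]
    if not parts:
        return set()
    keys = {" ".join(parts)}
    if len(parts) > 1:
        keys.add(f"{parts[0][0]} {parts[-1]}")   # "j doe"
        keys.add(f"{parts[0]} {parts[-1]}")      # "john doe"
    return keys

def match_authors(orcid_authors, wikidata_authors):
    """
    orcid_authors:    list of (orcid_id, [name variants from the ORCID record])
    wikidata_authors: list of (item_id, label), only items that lack P496 (ORCID iD)
    Returns item_id -> orcid_id for unambiguous matches only.
    """
    by_key = defaultdict(set)
    for orcid_id, names in orcid_authors:
        for name in names:
            for key in name_keys(name):
                by_key[key].add(orcid_id)

    matches = {}
    for item_id, label in wikidata_authors:
        candidates = set()
        for key in name_keys(label):
            candidates |= by_key.get(key, set())
        if len(candidates) == 1:          # skip ambiguous names ("two John Does")
            matches[item_id] = candidates.pop()
    return matches

# Example:
# match_authors([("0000-0001-2345-6789", ["John Doe", "John X. Doe"])],
#               [("Q12345", "John Doe")])
# -> {"Q12345": "0000-0001-2345-6789"}
```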

I have written a bot that will take random DOIs from Wikidata, query them in ORCID, and compare the author lists. In a first run, 5,000 random DOIs yielded 123 new ORCID connections; manual sampling of the matches looked quite good, so I am adding them via QuickStatements (sample of edits).

Unless this meets with “social resistance”, I can have the bot perform these edits regularly, which would keep Wikidata up-to-date with ORCIDs.

Additionally, there is an “author name string” property, which, for authors that do not have an item yet, stores just the author name. If the ORCID list matches one of these names, an item could automatically be created for that author, including the ORCID ID and the association to the publication item. Please let me know if this would be desirable.

Comprende!

tl;dr: I wrote a quiz interface on top of a MediaWiki/WikiBase installation. It ties together material from Wikidata, Commons, and Wikipedia, to form a new educational resource. I hope the code will eventually be taken up by a Wikimedia chapter, as part of an OER strategy.


The past

There have been many attempts in the WikiVerse to get a foothold in the education domain. Wikipedia is used extensively in this domain, but it is more useful for introductions to a topic, and as a reference, than as a learning tool. Wikiversity was an attempt to get into university-level education, but even I do not know anyone who actually uses it. Wikibooks has more and better content, but many wikibooks are mere sub-stub equivalents, rather than usable, fully-fledged textbooks. There has been much talk about OER, offline content for internet-challenged areas, etc. But the fabled “killer app” has so far failed to emerge.

Enter Charles Matthews, who, like myself, is situated in Cambridge. Among other things, he organises the Cambridge Wikipedia meetup, and we do meet occasionally for coffee between those. In 2014, he started talking to me about quizzes. At the time, he was designing teaching material for Wikimedia UK, using Moodle, as a component in Wikipedia-related courses. He quickly became aware of the limitations of that software, which include (but are not limited to) general software bloat, significant hardware requirements, and hurdles in re-using questions and quizzes in other contexts. Despite all this, Moodle is rather widely used, and the MediaWiki Quiz extension does not exactly present itself as a viable replacement.

A quiz can be a powerful tool for education. It can be used by teachers and mentors to check on the progress of their students, and by the students themselves, to check their own progress and readiness for an upcoming test.

As the benefits are obvious, and the technical requirements appeared rather low, I wrote (at least) two versions of a proof-of-concept tool named wikisoba. The interface looked somewhat appealing, but storage was a sore point. The latest version uses JSON stored as a wiki page, which needs to be edited manually. Clearly, not an ideal way to attract users these days.

Eventually, a new thought emerged. A quiz is a collection of “pages” or “slides”, each representing a question (of various types), or maybe a text to read beforehand. A question, in turn, consists of a title, a question text (usually), possible answers, etc. A question is therefore the main “unit”, and should be treated on its own, separate from other questions. Questions can then be bundled into quizzes; this allows for re-use of questions in multiple quizzes, possibly awarding different points (a question could yield high points in an entry-level quiz, but fewer points in an advanced quiz). The separation of question and quiz makes for a modular, scalable, reusable architecture. Treating each question as a separate unit is therefore a cornerstone of any successful system for (self-)teaching and (self-)evaluation.

It would, of course, be possible to set up a database for this, but then it would require an interface, constraint checking, and all the other things that make a project complicated and prone to fail. Luckily, there exists software that already offers adequate storage, querying, an interface, etc. I speak of WikiBase, the MediaWiki extension used to power Wikidata (and soon Commons as well). Each question could be an item, with the details encoded in statements. Likewise, a quiz would be an item, referencing question items. WikiBase offers a powerful API to manage, import, and export questions; it comes with built-in openness.
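As a rough illustration of that idea, a question item could be read through the standard Wikibase API. In the sketch below, the endpoint URL and the property IDs (P1, P2, P3) are invented for the example and do not correspond to the actual data model:

```python
# A hypothetical sketch: reading a "question" item from a WikiBase installation via
# the standard wbgetentities API. WIKIBASE_API and the property IDs (P1 = question
# type, P2 = possible answer, P3 = correct answer) are made up for illustration.
import requests

WIKIBASE_API = "https://example-wikibase.example/w/api.php"

def get_question(item_id, lang="en"):
    r = requests.get(WIKIBASE_API, params={
        "action": "wbgetentities",
        "ids": item_id,
        "format": "json",
    })
    entity = r.json()["entities"][item_id]
    claims = entity.get("claims", {})

    def values(prop):
        return [c["mainsnak"]["datavalue"]["value"]
                for c in claims.get(prop, [])
                if c["mainsnak"].get("snaktype") == "value"]

    return {
        "title": entity["labels"].get(lang, {}).get("value"),
        "type": values("P1"),      # e.g. "multiple choice"
        "answers": values("P2"),   # all offered answers
        "correct": values("P3"),   # subset marked as correct
    }
```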

The present

There is a small problem, however: the default WikiBase interface is not exactly appealing to non-geeks. Also, there is obviously no way to “play” a quiz in a reasonable manner. So I decided to use my recent experience with vue.js to write an alternative interface to MediaWiki/WikiBase, designed to generate questions and quizzes, and to play a quiz in a more pleasant way. The result has the working title Comprende!, and can be regarded as a fully functional, initial version of a WikiBase-driven question/quiz system. The underlying “vanilla” WikiBase installation is also accessible. To jump right in, you can test your biology knowledge!

There are currently three question types available:

  • Multiple-choice questions, the classic
  • “Label image” presents an image from Commons, letting you assign labels to marked points in the image
  • Info panels, presenting information to learn (to be interspersed with actual questions)

All aspects of the questions are stored in WikiBase; they can have a title, a short text, and an intro section; for the moment, the latter can be a specific section of a Wikipedia article (of a specific revision, by default), but other types (Commons images, for example) are possible. When used in “info panel” type questions (example), a lot of markup, including images, is preserved; for intro sections in other question types, it is simplified to mere text.

Live translating of interface text.

Wikidata is multi-lingual by design, and so is Comprende!. An answer or image label can be a text stored as multi-lingual (or “monolingual”, in WikiBase nomenclature) strings, or a Wikidata item reference, giving instant access to all the translations there. Also, all interface text is stored in an item, and translations can be done live within the interface.

Questions can be grouped and ordered into a quiz. Everyone can “play” and design a quiz (Chrome works best at the moment), but you need to be logged into the WikiBase setup to save the result. Answers can be added, dragged around to change the order, and each question can be assigned a number of points, which will be awarded based on the correct “sub-answers”. You can print the current quiz design (no need to save it), and most of the “chrome” will disappear, leaving only the questions; instant old-fashioned paper test!

While playing a quiz, you can see how many points you have, how many questions are left, etc. Some mobile optimisations, like reflow for portrait mode and a fixed “next question” button at the bottom, are in place. At the end of the quiz, there is a final screen, presenting the user with their quiz result.

To demonstrate the compatibility with existing question/quiz systems, I added a rudimentary Moodle XML import; an example quiz is available. Another obvious import format to add would be GIFT. Moodle XML export is also on the to-do list.
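For the curious, a bare-bones reader for that format might look like the sketch below (simplified to multiple-choice questions only; the actual importer has to deal with more of Moodle’s quirks):

```python
# A minimal sketch of reading multiple-choice questions from a Moodle XML export
# (simplified; the real format has more question types and options than shown here).
import xml.etree.ElementTree as ET

def read_moodle_xml(path):
    questions = []
    root = ET.parse(path).getroot()          # the <quiz> element
    for q in root.findall("question"):
        if q.get("type") != "multichoice":
            continue                          # only the simplest case here
        questions.append({
            "name": q.findtext("name/text", default="").strip(),
            "text": q.findtext("questiontext/text", default="").strip(),
            "answers": [
                {
                    "text": a.findtext("text", default="").strip(),
                    # Moodle stores correctness as a percentage ("fraction")
                    "correct": float(a.get("fraction", "0")) > 0,
                }
                for a in q.findall("answer")
            ],
        })
    return questions
```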

The future

All this is obviously just a start. A “killer feature” would be a SPARQL setup, federating Wikidata. Entry-level quizzes for molecular biology? Questions that use Wikidata answers that are chemicals? I can see educators flocking to this, especially if material is available in, or easily translated into, their language. More question types could emphasise the strength of this approach. Questions could even be mini-games, etc.
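To give a flavour of what such a setup could do, a quiz generator might pull answer candidates straight from the Wikidata query service. The following is a speculative sketch, not an existing feature:

```python
# Speculative sketch: pulling answer candidates (here, chemical compounds) from the
# Wikidata SPARQL endpoint, e.g. to auto-generate distractors for a chemistry quiz.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?compound ?compoundLabel WHERE {
  ?compound wdt:P31 wd:Q11173 .                      # instance of: chemical compound
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
"""

def answer_candidates():
    r = requests.get(SPARQL_ENDPOINT,
                     params={"query": QUERY, "format": "json"},
                     headers={"User-Agent": "quiz-sketch/0.1"})
    return [b["compoundLabel"]["value"]
            for b in r.json()["results"]["bindings"]]
```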

Another aspect I have not worked on yet is logging results. This could be done per user, where the user can add their result in a quiz to a dedicated tracking item for their user name. Likewise, a quiz could record user results (automatically or voluntarily).

One possibility would be for the questions, quizzes, etc. to live in a dedicated namespace on Wikidata (so as not to contaminate the default namespace). That would simplify the SPARQL setup, and get the existing community involved. The Wiktionary-related changes on Wikidata will cover all that is needed on the backend; the interface is all HTML/JS, not even an extension is required, so there are next to no security or integration issues. Ah, one can dream, right?

Mix’n’match interface update

I have been looking into a JavaScript library called vue.js lately. It is similar to React, but not encumbered by licensing issues (which might prevent React’s use on WMF servers in the future), faster (or so they claim), and, most of all, it works without any server-side component; all I need for my purposes is to include the vue.js file in the HTML.

So why would you care? Well, as usual, I learn new technology by working it into an actual project (rather than just vigorously nodding over a manual). This time, I decided to rewrite the slightly dusty interface of Mix’n’match using vue.js. This new version went “live” a few minutes ago, and I am surprised myself at how much more responsive it has become. This might be best exemplified by the single entry view (example), which (for unmatched entries) will search Wikidata, the respective language Wikipedia, and the Mix’n’match database for the entry title. It also queries Wikidata via SPARQL to check if the ID for the respective property is already in use. This is all nicely modular, so I can re-use a lot of code across different components.
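That “is this ID already in use” check boils down to a one-triple SPARQL query. A minimal sketch (the property, here P227/GND, and the ID value are just placeholders for whatever catalog and entry are being looked at):

```python
# A minimal sketch of the "is this external ID already used on Wikidata?" check,
# via the SPARQL endpoint. The property (P227 = GND ID) and the ID value are just
# placeholders for whatever catalog/entry is being looked at.
import requests

def items_using_external_id(prop="P227", external_id="118540238"):
    query = 'SELECT ?item WHERE { ?item wdt:%s "%s" }' % (prop, external_id)
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"},
                     headers={"User-Agent": "mixnmatch-sketch/0.1"})
    bindings = r.json()["results"]["bindings"]
    return [b["item"]["value"].split("/")[-1] for b in bindings]  # e.g. ["Q5879"]
```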

Most of the functions in the previous version have been implemented in the new one. Redirect code is in place, so if you have bookmarked a page on Mix’n’match, you should end up in the right place. One new function is the ability to sort and group the catalogs (almost 400 now!) on the main page (example).

As usual, feel free to browse the code (vue.js-based HTML and JavaScript, respectively). Issues (for the new interface, or Mix’n’match in general) go here.

Mix’n’match post-mortem

So this, as they say, happened.

On 2016-12-27, I received an update on a Mix’n’match catalog that someone had uploaded. That update had improved names and descriptions for the catalog. I try to avoid such updates, because I made the import function so that I do not have to deal with every catalog myself, and also because the update process is entirely manual, and therefore somewhat painful and error-prone, as we will see. Now, as I was on vacation, I was naturally in a hurry, and (as it turned out later) there were too many tabs in the tab-delimited update file.

Long story short, something went wrong with the update. For some reason, some of the SQL commands I generated from the update file did not specify some details about which entry to update. Like, its ID, or the catalog. So when I checked what was taking so long, just short of 100% of Mix’n’match entries had the label “Kelvinator stove fault codes”, and the description “0”.

Backups, you say? Well, of course, but, look over there! /me runs for the hills

Well, not all was lost. Some of the large catalogs were still around from my original import. Also, my scraping scripts for specific catalogs generate JSON files with the data to import, and those are still around as well. There was also a SQL dump from 2015. That was a start.

Of course, I did not keep the catalogs imported through my web tool. Because they were safely stored in the database, you know? What could possibly go wrong? Thankfully, some people still had their original files around and gave them to me for updating the labels.

I also wrote a “re-scraping” script, which uses the external URLs I store for each entry in Mix’n’match, together with the external ID. Essentially, I get the respective web page, and write a few lines of code to parse the <title> tag, which often includes the label. This works for most catalogs.
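In essence, the re-scraping looks something like the sketch below (illustrative only, not my actual script; each catalog needs its own clean-up of the extracted title):

```python
# A minimal sketch of the "re-scraping" approach: fetch the entry's external URL and
# recover the label from the <title> tag. Real catalogs need site-specific cleanup
# (suffixes like " - Some Database"), which is where most of the manual work goes.
import re
import requests

def title_from_url(url):
    html = requests.get(url, timeout=30).text
    m = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    title = re.sub(r"\s+", " ", m.group(1)).strip()
    return re.sub(r"\s*[-|–].*$", "", title)  # crude: drop a trailing site name
```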

So, at the time of writing, over 82% of labels in Mix’n’match have been successfully restored. That’s the good news.

The bad news is that the remaining ~17% are distributed across 133 catalogs. Some of these do not have URLs to scrape, some URLs don’t play nicely (session-based Java horrors, JS-only pages etc.), and the rest need site-specific <title> scraping code. Fixing those will take some time.

Apart from that, I fixed up a few things:

  • Database snapshots (SQL dump) will now be taken once a week
  • The snapshot from the previous week is preserved as well, in case damage went unnoticed
  • Catalogs that are uploaded through the import tool will be preserved as individual files

Other than the remaining entries that require fixing, Mix’n’match is open for business, and while my one-man-show is spread thin as usual, subsequent blunders should be easier to mitigate. Apologies for the inconvenience, and all that.

All your locations are belong to us

A recent push for a UK photography contest reminded me of an issue that has been bugging me for quite a while. On the talk page for that contest, I pointed to several tools of mine, dealing with images and locations. But they only show aspects of those, like “Wikidata items without images”. What about the others? WDQS can show maps of all Wikidata items in a region, but what about Wikipedia? The mobile app can show you things with Wikipedia articles nearby, but what about Commons? I don’t recall a way to see Commons images taken near a location (WD-FIST can find them, but without a map). The data exists, but is either hard to get to, or “siloed” in some tool/app.

Wouldn’t it be great to get a map with all this information on it? All of Wikidata? All of Wikipedia? All of Commons? At once?

What should that look like? The photography contest scenario, and changes in general web usage patterns, suggest a strong emphasis on mobile. Which in turn tends to mean “no frills”, as in, a focus on what is important: the map, and the objects on it.

So I decided (for the time being) to get rid of the query functions in WD-FIST, and the clutter in WikiShootMe, and start from scratch, with (essentially) just a big map, using the bleeding-edge versions of JS libraries like bootstrap, jQuery, and leaflet. So without further ado, I present WikiShootMe, version 3 (pre-alpha). As it is, the tool defaults to your coordinates, which may be your local hub (as in my case, in the screenshot). There are four layers, which can be individually toggled:

  1. Wikidata items with images (in green)
  2. Wikidata items without image (in red, the Wikipedia will change with your language selection)
  3. Commons images (in blue)
  4. Wikipedia articles (smaller, in yellow, mostly overlapping Wikidata items)

There is also a grey circle in the center, which is your (or your local hub’s) position. On mobile, this should move with you (but I haven’t tested that, as it would require leaving the house). Each of these markers has a pop-up when you click or touch it; it shows the linked title of the object, and, for Wikidata items with images and for Commons images, the respective image.
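For those curious about the plumbing, the layers boil down to a handful of API and SPARQL calls. The sketch below is illustrative, not the tool’s actual code (the coordinates in the usage comment are just an example):

```python
# A minimal sketch of the underlying data fetching (not the tool's actual code):
# nearby Commons files and Wikipedia articles via the GeoData API, and nearby
# Wikidata items (with an optional P18 image) via the SPARQL "around" service.
import requests

def geosearch(api, lat, lon, namespace=0, radius=1000, limit=50):
    r = requests.get(api, params={
        "action": "query", "list": "geosearch", "format": "json",
        "gscoord": f"{lat}|{lon}", "gsradius": radius,
        "gslimit": limit, "gsnamespace": namespace,
    })
    return [p["title"] for p in r.json()["query"]["geosearch"]]

def nearby_wikidata(lat, lon, radius_km=1):
    query = f"""
    SELECT ?item ?image WHERE {{
      SERVICE wikibase:around {{
        ?item wdt:P625 ?loc .
        bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
        bd:serviceParam wikibase:radius "{radius_km}" .
      }}
      OPTIONAL {{ ?item wdt:P18 ?image }}
    }}"""
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"},
                     headers={"User-Agent": "wikishootme-sketch/0.1"})
    return r.json()["results"]["bindings"]

# articles = geosearch("https://en.wikipedia.org/w/api.php", 52.2053, 0.1218)
# commons  = geosearch("https://commons.wikimedia.org/w/api.php",
#                      52.2053, 0.1218, namespace=6)   # File: namespace
# wd_items = nearby_wikidata(52.2053, 0.1218)          # green/red layers
```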

All these data sources update when you move or zoom the map, down to a certain zoom level. Below that, an “Update” button appears for manual updates, though those can take a long time, even with the number of objects limited.

I find it amazing how many geo-coded images there are already on Commons (even though the API will only give me 500 at a time). Maybe that is the Geograph effect here in the UK, which led to the import of hundreds of thousands of free images to Commons. But I also found a funny pattern in Cologne, Germany, which turned out to be a series of images taken by Wikimedia volunteers from a balloon!

Now, to be extra clever, I tried to add an upload function to the pop-up of Wikidata items without an image. You can select a file from disk, or use the camera as a source on mobile. It will pre-fill the title and the {{Information}} template with a link to the respective Wikidata object. However, several problems occur with that:

  • I could only get the “old” Commons upload page to work with the pre-filled data
  • I could find no documentation on <form> parameters for the Upload wizard
  • I haven’t actually tested if the upload works
  • There seems to be no way to automatically add the uploaded image to the Wikidata item

A way around all that would be to upload the image to the tool itself, then transfer it to Commons via OAuth. This would also allow me to add the new image as the P18 on the Wikidata item. This is an option to be explored, especially if the Upload Wizard remains opaque to me.

Update: I have added OAuth to the tool. Once authorised, you can upload a new image for a Wikidata item from both desktop and mobile (gallery or camera directly) with one click. It fills in file name, coordinates, default license etc. It even adds the image to the item after upload automatically. All this opens in a new tab, on the page for the uploaded image, to give you a chance to add more information.
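The last step of that flow, attaching the uploaded file to the item, is a single wbcreateclaim call. A minimal sketch (the OAuth-signed session is assumed to exist already and is not shown):

```python
# A minimal sketch of the final step: after the OAuth upload, attach the new file to
# the item as P18 via wbcreateclaim. Authentication (the OAuth-signed session) is
# assumed to exist already and is not shown here.
import json
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def add_p18(session, item_id, file_name):
    token = session.get(WIKIDATA_API, params={
        "action": "query", "meta": "tokens", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]

    r = session.post(WIKIDATA_API, data={
        "action": "wbcreateclaim",
        "entity": item_id,               # e.g. "Q42"
        "property": "P18",               # image
        "snaktype": "value",
        "value": json.dumps(file_name),  # e.g. '"Some photo.jpg"' (no "File:" prefix)
        "token": token,
        "format": "json",
    })
    return r.json()
```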

As usual, I am quite open to bug reports, feature requests (yes, it’s bare-bones at the moment), and technical support from volunteers/WMF.

Livin’ on the edge

A few days ago, Lydia posted about the first prototype of the new structured data system for Commons, based on Wikidata technology. While this is just a first step, structured data for Commons seems finally within reach.

And that brings home the reality of over 32 million files on Commons, all having unstructured data about them, in the shape of the file description pages. It would be an enormous task to manually transcribe all these descriptions, licenses, etc. into the appropriate data structures. And while we will have to do just that for many of the files, the ones that can be transcribed by a machine should be.

So I went ahead and re-wrote a prototype tool I had built for just this occasion a while ago. I call it CommonsEdge (a play on Common sedge). It is both an API, and an interface to that API. It will parse a file description page on Commons, and return a JSON object with the data elements corresponding to the description page. An important detail is that this parser does not just pick some elements it understands and ignore the rest; internally, it tries to “explain” all elements of the description (templates, links, categories, etc.) as data, and fails if it cannot explain one. That’s right, the API call will fail with an error unless 100% of the page can be represented in the returned JSON object. This prevents “half-parsed” pages; a file description page that is successfully parsed by the API can safely be replaced in its entirety by the resulting structured data. In case of failure, the error message is usually quite specific and detailed about the cause; this allows for incremental improvements of the parser.
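The principle, stripped of all the actual wikitext handling, looks roughly like this (an illustrative sketch, not the real parser; the list of known templates is made up):

```python
# A minimal sketch of the "all or nothing" principle (not the actual CommonsEdge
# parser): every recognised element contributes to the result, and a single
# unexplained element aborts the whole conversion instead of silently dropping it.
KNOWN_TEMPLATES = {"Information", "Self", "Cc-by-sa-4.0", "Location"}  # illustrative

def parse_description(elements):
    """
    `elements` is assumed to be a pre-tokenised list of (kind, payload) tuples,
    e.g. ("template", {...}), ("category", "..."), ("link", "...").
    """
    result = {"templates": [], "categories": [], "links": []}
    for kind, payload in elements:
        if kind == "template" and payload["name"] in KNOWN_TEMPLATES:
            result["templates"].append(payload)
        elif kind == "category":
            result["categories"].append(payload)
        elif kind == "link":
            result["links"].append(payload)
        else:
            # Refuse to return a half-parsed page: the caller gets a specific error.
            raise ValueError(f"Cannot explain element: {kind} {payload!r}")
    return result
```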

At the moment of writing, I find that ~50-60% of file descriptions (based on sets of 1,000 random files) produce a JSON object, that is, can be completely understood by the parser, and completely represented in the result. That’s 16-19 million file descriptions that can be converted to structured data automatically, today. Most of the failures appear to be due to bespoke templates; the more common ones can be added over time.

A word about the output: Since the structured data setup, including properties and foreign keys, is still in flux, I opted for a simple output format. It is not Wikibase format, but similar; most elements (except categories and coordinates, I think) are just lists of type-and-value tuples (example). I try to use URLs as much as possible, for example, when referencing users on Commons (or other Wikimedia projects) or flickr. Licenses are currently links to the Wikidata element corresponding to the used template (ideally, I would like to resolve that through Wikidata properties pointing to the appropriate license).

Source code is available. Pull requests are welcome.

WDQ, obsolete?

For a few years now, I have been running the WikiData Query tool (WDQ) to provide query functionality for Wikidata. Nowadays, the (confusingly similarly named) SPARQL-based WDQS is the “official” way to query Wikidata. WDQS has been improving a lot, and while some of my tools still support WDQ, I deliberately left that option out of new tools like PetScan. But before I shut down WDQ, and the tools that use it, for good, I wanted to know if it is still used, and if SPARQL could take over.

I therefore added a query logger to Autolist1 and Autolist2. The logs contain all WDQ queries run through those tools. I will monitor the results for a while, but here is what I have seen so far. For each query, I comment on translating it to SPARQL using WDQ2SPARQL, on whether such a query is feasible at all, and on the performance of WDQS. “OK” means the query could be converted automatically to SPARQL, runs, and produces a similar (as in, equal or more up-to-date) result.

WDQ query | Comment

CLAIM[279:13219666] | OK
BETWEEN[569,1016-1,1016-12] | BETWEEN not implemented in WDQS, but manual translation feasible. (Update: this has now been implemented by smalyshev, runs OK!)
(CLAIM[1435:10387684] OR CLAIM[1435:10387575]) AND NOCLAIM[380] AND NOCLAIM[481] | OK
BETWEEN[569,1359-1,1359-12] | BETWEEN not implemented in WDQS, but manual translation feasible. (Update: this has now been implemented by smalyshev, runs OK!)
CLAIM[31:5] | All humans (~3.2M on Wikidata). Not really a useful query in these tools.
Q22686 | Single item. Doesn’t really need a query?
Q22686 | Single item. Doesn’t really need a query?
CLAIM[106:170790] AND CLAIM[27:35] | OK
CLAIM[195:842858] | OK
Gustav III | What the hell?
claim[17] | All items with “country”. Not really a useful query in these tools.
claim[31] | All items with “instance of”. Not really a useful query in these tools.
claim[106:82955] and claim[509:(tree[12078][][279])] | OK
claim[31:5] | All humans (~3.2M on Wikidata). Not really a useful query in these tools.
claim[31:5] | All humans (~3.2M on Wikidata). Not really a useful query in these tools.
claim[21] | All items with gender. Not really a useful query in these tools.
LINK[lvwiki] AND CLAIM[31:5] | OK
LINK[lvwiki] AND CLAIM[31:5] | OK
claim[27] and noclaim[21] | OK
LINK[lvwiki] AND CLAIM[31:56061] | OK
LINK[lvwiki] AND tree[56061][150][17,279] | OK
claim[31:(tree[16521][][279])] | OK
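For reference, this is roughly what such translations look like. The sketch below hand-translates two of the logged queries into SPARQL and runs one against the public endpoint (mirroring what WDQ2SPARQL automates):

```python
# For illustration, hand-translated SPARQL equivalents of two of the logged WDQ
# queries, runnable against the public endpoint (this mirrors what WDQ2SPARQL does).
import requests

TRANSLATIONS = {
    # CLAIM[106:170790] AND CLAIM[27:35] (occupation: mathematician, country: Denmark)
    "CLAIM[106:170790] AND CLAIM[27:35]":
        "SELECT ?item WHERE { ?item wdt:P106 wd:Q170790 ; wdt:P27 wd:Q35 }",
    # LINK[lvwiki] AND CLAIM[31:5] (humans with a Latvian Wikipedia article)
    "LINK[lvwiki] AND CLAIM[31:5]":
        """SELECT ?item WHERE {
             ?item wdt:P31 wd:Q5 .
             ?sitelink schema:about ?item ;
                       schema:isPartOf <https://lv.wikipedia.org/> .
           }""",
}

def result_count(sparql):
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": sparql, "format": "json"},
                     headers={"User-Agent": "wdq-translation-sketch/0.1"})
    return len(r.json()["results"]["bindings"])

# result_count(TRANSLATIONS["CLAIM[106:170790] AND CLAIM[27:35]"])
# -> number of matching items (the smaller of the two example queries)
```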

As far as I can tell, SPARQL could take over for WDQ immediately.

i19n

Wikipedia has language editions; Wikidata has labels, aliases, descriptions, and some properties in multiple languages. This is a great resource for getting the world’s knowledge in your language! But looking at the technical side, things become a little dim. Wikimedia sites have their interface translated into many languages, but beyond that, English rules supreme. Despite many requests, only a few tools on Labs have a translatable (and translated) interface.

One exception is PetScan, which uses the i18n mechanism of its predecessor CatScan, namely a single wiki page on Meta that contains all interface translations. This works in principle, as the many translations there show, but it has several disadvantages, ranging from bespoke wikitext parsing and load/rendering times on Meta, to the fact that there is no easy way to answer the question “which of these keys have not been translated into Italian?”. New software features require new interface strings, so the situation gets worse over time.

The answer I got when asking about good ways to translate interfaces is usually “just use TranslateWiki”, which IIRC is used for the official Wikimedia sites. This is a great project, with powerful applications, but I was looking for something more light-weight, both on the “add a translation” side, and the “how to use this in my tool” side.

If you know me or my blog, then by this point you will already have guessed what happened next: I rolled my own (for more detailed information, see the manual page).

ToolTranslate is a tool that allows everyone (after the usual OAuth ceremony) to provide translations for interface texts, in almost 300 languages. I even made a video demonstrating how easy it is to add translations (ToolTranslate uses its own mechanism, so the demo edit shows up live in the interface). You can even add your own tool, without having to jump through bureaucratic hurdles, just with the press of a button!

On the tool-author side, you will have to change your HTML from <div>My text</div> to <div tt="mytext"></div>, and then add “My text” as a translation for the “mytext” key. Just use the language(s) you know; anyone can add translations in other languages later. I experienced this myself: after I uploaded the demo video, User:Geraki added Greek translations to the interface, before this blog post, or any other instructions, were available. Just, suddenly, as if by magic, Greek appeared as an interface option… You will also need to include a JavaScript file I provide, and add a single line of code (two, if you want to have a drop-down to switch languages live).

There is a simplistic demo page, mainly intended for tool authors, to see how it works in practice. Besides ToolTranslate itself, I also used it on my WikiLovesMonuments tool, to show that it is feasible to retrofit an existing tool. This took less than 10 minutes.

I do provide the necessary JavaScript code to convert HTML/JS-based tools. I will be working on a PHP class next, if there is demand. All translations are also provided as JSON files online, so you can, in turn, “roll your own” code if you want. And if you have existing translations for your tool and want to switch to ToolTranslate, let me know, and I can import your existing translations.

First image, good image?

For a while now, Wikimedia pages (usually, Wikipedia articles) have a “page image”, an image from that page used as a thumbnail in article previews, e.g. in the mobile app. While it is not entirely clear to me how this image is chosen, it appears to be the first image of the article in most cases, probably excluding some icons.

Wikidata is doing something similar with the “image” property (P18); however, this needs to be an image of the item’s subject, not “something related to the item”. Wikipedia’s “page image” often turns out to be a painting made by the article’s subject, or a map, or something related to an event. This discrepancy prevents an automated import of the “page image” into Wikidata. However, exceptions aside, the “page image” presents a highly specific resource for P18-suitable images.

So I added a new function to my WD-FIST tool, to facilitate the import of suitable images from that rich source into Wikidata. As a first step, a bot checks several large Wikipedias on a daily basis, and retrieves “page images” where the associated Wikidata item has none, and the “page image” is stored on Commons. It also skips “non-subject” pages like list articles. In a second stage, images (excluding PNG, GIF, and SVG) that are used as a “page image” on at least three Wikipedias for the same subject are put into a main candidate list. The image must also not be on the tool-internal “ignore” list. Even after all this filtering, >32K candidates remain in the current list.

Wiki | Page image candidates
dewiki | 346,204
enwiki | 700,832
frwiki | 255,527
itwiki | 148,041
nowiki | 73,508
plwiki | 181,323
svwiki | 109,349
Combined (used on ≥3 wikis) | 32,137
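A minimal sketch of the first stage, fetching page images and keeping only those whose Wikidata item lacks P18, might look like this (illustrative only, not the bot’s code; it also omits the Commons, list-article, and ignore-list filters described above):

```python
# A minimal sketch of the first stage (not the bot itself): ask a Wikipedia for its
# "page image" file names, and keep only pages whose Wikidata item has no P18 yet.
import requests

def page_image_candidates(titles, wiki_api="https://en.wikipedia.org/w/api.php"):
    # `titles` is a list of page titles (the API takes up to 50 per request).
    r = requests.get(wiki_api, params={
        "action": "query", "titles": "|".join(titles), "format": "json",
        "prop": "pageimages|pageprops", "piprop": "name", "ppprop": "wikibase_item",
    })
    candidates = {}
    for page in r.json()["query"]["pages"].values():
        image = page.get("pageimage")                     # page image file name
        qid = page.get("pageprops", {}).get("wikibase_item")
        if image and qid and not has_p18(qid):
            candidates[qid] = image
    return candidates

def has_p18(qid):
    r = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbgetclaims", "entity": qid, "property": "P18", "format": "json",
    })
    return bool(r.json().get("claims", {}).get("P18"))
```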

I will likely add more Wikipedias to this list (es and pt will show up tomorrow), and eventually lower the inclusion threshold, as candidates are added to Wikidata, or to the “ignore” list.

As the candidate list is already heavily filtered, I am not applying some of the usual WD-FIST filters. This also helps with retrieving a candidate set of 50 very quickly. In this mode, the tool also lends itself well to mobile usage.

A week of looking at women

Images and their use in the WikiVerse have always been a particular interest of mine, on Wikipedia, Commons, and of course, Wikidata. Commons holds the files and groups them by subject, author, or theme; Wikidata references images and files for key aspects of a subject; and Wikipedia uses them to enrich texts, and puts files into context.

Wikidata uses images for more subjects than any Wikipedia save English, and it is slowly encroaching on the latter; the “break-even” should happen later this year. This is not just an end in itself, but will also massively benefit the many smaller Wikipedias, by holding such material ready in an easily usable form.


Image candidates, ready to be added with a single click

So I did a small experiment, as to how much one person can do “on the side” (besides work, other interests, and such luxuries as sleeping or eating), to improve the Wikidata image fundus. I thus picked the German category for women, which currently holds >92K articles. I used my WD-FIST tool to find all potential images on all Wikipedias, for the Wikidata items corresponding to the German articles. This does not show items that already have an image, or items that have no possible candidate image anywhere; just the ones where a Wikipedia does have an image, and Wikidata does not.

A week ago, I started with 3,060 items of women that potentially had an image on some Wikipedia. A week later, I am down to ~290. Now, that does not mean I added ~2,700 images to Wikidata; a database query comes to ~1,100 added images, and ~200 other file properties (spoken text, commemorative plaque image, etc.). Some items just had no suitable image on Wikipedia; others had group photos, which I tagged to be cropped on Commons (those tagged images will not show in the tool while the crop template remains).

The image candidates for the remaining 290 or so items need to be investigated in more detail; some of them might not actually be images of the subject (hard to tell if the file name and description are in, say, Russian), or they are low-resolution group pictures, which do not warrant cropping, as the resulting individual image would be too grainy.

Adding the ~1,100 images is good, but only part of the point I am trying to make here. The other part is that no one will have to wade again through the ~90% of item/image suggestions I have resolved, one way or another. Ideally, the remaining 290 items should be resolved too; then, if an image were added on any Wikipedia for any of the >92K women in the category, just that new image would show in the tool, which would make keeping Wikidata up-to-date so much easier. Even a single volunteer could drop by every few weeks and keep Wikidata current with images for that group of items, with a few clicks’ worth of effort.

The next step is, of course, all women on Wikidata (caution: that one will take a few minutes to load). The count of items with potential images stands at 15,986 at the time of writing. At my speed, it would take one person about a month of late-evening clicking to reduce that by 90%, though I do hope some of you have been inspired to help me out a bit.