For a while now, Wikimedia pages (usually Wikipedia articles) have had a “page image”: an image from the page that is used as a thumbnail in article previews, e.g. in the mobile app. While it is not entirely clear to me how this image is chosen, in most cases it appears to be the first image of the article, probably excluding some icons.
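If you want to look up a page image yourself, the PageImages extension exposes it through the MediaWiki Action API as `prop=pageimages`, and `prop=pageprops` with `ppprop=wikibase_item` returns the linked Wikidata item. The following is a minimal sketch under those assumptions, not the code my bot actually runs; the function names are hypothetical, and it only builds the query URL and parses a JSON response (with `formatversion=2`, where `pages` is a list).

```python
from __future__ import annotations

from urllib.parse import urlencode


def page_image_query_url(lang: str, titles: list[str]) -> str:
    """Build an Action API URL asking for each page's image and Wikidata item."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",             # "pages" comes back as a list
        "prop": "pageimages|pageprops",
        "piprop": "name",                 # only the file name, no thumbnail
        "ppprop": "wikibase_item",        # the linked Wikidata item ID
        "titles": "|".join(titles),
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_page_images(response: dict) -> dict[str, tuple[str | None, str]]:
    """Map page title -> (Wikidata item ID or None, page image file name).

    Pages that have no page image are skipped.
    """
    result: dict[str, tuple[str | None, str]] = {}
    for page in response.get("query", {}).get("pages", []):
        image = page.get("pageimage")
        if image is None:
            continue
        item = page.get("pageprops", {}).get("wikibase_item")
        result[page["title"]] = (item, image)
    return result
```

The decoded JSON from fetching that URL (with any HTTP client) can be passed straight to `extract_page_images`.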
Wikidata does something similar with the “image” property (P18); however, that image needs to depict the item’s subject, not just “something related to the item”. Wikipedia’s “page image” often turns out to be a painting made by the article’s subject, a map, or something related to an event. This discrepancy prevents an automated import of “page images” into Wikidata. Exceptions aside, however, the “page image” is a highly specific source of P18-suitable images.
So I added a new function to my WD-FIST tool to facilitate the import of suitable images from that rich source into Wikidata. As a first step, a bot checks several large Wikipedias on a daily basis and retrieves “page images” where the associated Wikidata item has no image yet and the “page image” is stored on Commons. It also skips “non-subject” pages such as list articles. In a second stage, images (excluding PNG, GIF, and SVG files) that are used as the “page image” for the same subject on at least three Wikipedias are put into a main candidate list. The image must also not be on the tool-internal “ignore” list. Even after all this filtering, more than 32,000 candidates remain in the current list.
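The two filtering stages can be sketched roughly as follows. This is an illustration, not the tool's actual code: the `PageImage` record, its fields, and the `first_stage`/`candidates` functions are hypothetical, the per-page flags (hosted on Commons, item already has P18, list article) are assumed to have been looked up beforehand, and the ignore list is assumed to hold plain file names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageImage:
    wiki: str            # e.g. "enwiki"
    item: str            # Wikidata item ID of the article's subject
    image: str           # file name of the page image
    on_commons: bool     # file is hosted on Commons
    item_has_p18: bool   # item already has an "image" (P18) statement
    is_list_page: bool   # hypothetical "non-subject page" flag, e.g. list articles


# File types excluded from the candidate list.
EXCLUDED_EXTENSIONS = (".png", ".gif", ".svg")


def first_stage(records: list[PageImage]) -> list[PageImage]:
    """Stage 1: keep Commons page images for subject pages whose item lacks P18."""
    return [r for r in records
            if r.on_commons and not r.item_has_p18 and not r.is_list_page]


def candidates(records: list[PageImage], ignore: set[str],
               min_wikis: int = 3) -> dict[tuple[str, str], set[str]]:
    """Stage 2: map (item, image) -> wikis using that image as the item's page image.

    Only pairs seen on at least `min_wikis` distinct wikis, with an allowed
    file type and an image not on the ignore list, are returned.
    """
    wikis_by_pair: dict[tuple[str, str], set[str]] = defaultdict(set)
    for r in first_stage(records):
        if r.image.lower().endswith(EXCLUDED_EXTENSIONS):
            continue
        wikis_by_pair[(r.item, r.image)].add(r.wiki)
    return {pair: wikis for pair, wikis in wikis_by_pair.items()
            if len(wikis) >= min_wikis and pair[1] not in ignore}
```

Counting distinct wikis per (item, image) pair, rather than raw rows, means one wiki cannot push an image over the threshold on its own; lowering the threshold later would just mean passing a smaller `min_wikis`.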
I will likely add more Wikipedias to the set being checked (es and pt will show up tomorrow), and eventually lower the inclusion threshold as candidates are added to Wikidata or to the “ignore” list.
As the candidate list is already heavily filtered, I am not applying some of the usual WD-FIST filters. This also makes retrieving a set of 50 candidates very fast. In this mode, the tool also lends itself well to mobile usage.