
Of cats and pets

CatScan is one of those workhorse tools that are familiar to many Wikimedia users, dating all the way back to the Toolserver. Its popularity, however, has also caused reliability problems time and again. As Labs became usable, I added QuickIntersection to the mix, offering a quicker and more reliable service at the expense of some complex functionality. Alas, despite my best efforts, CatScan's reliability still fluctuates a lot. The reasons include the choice of PHP as the programming language and the shared nature of Labs tools where resources are concerned.

So I spent the last two weeks (as time allowed) on a complete rewrite of the tool, using C++ and a dedicated virtual machine on Labs. The result is one of the most complex tools I have developed to date. I call it PetScan, both to indicate that it does more than just cat(egorie)s, and as a pun on the more versatile PET scan (compared to the CAT scan).

Its basic interface is based on CatScan, and it is backwards-compatible in both URL parameters (if you have a CatScan URL, you just need to replace the server name) and output (the JSON output will be almost identical). It can also be switched to QuickIntersection-style output with a parameter, so it could replace that tool as well.

But PetScan is much more encompassing. Several times before, I tried to “connect” my (and other) tools, most recently via PagePile; however, uptake was rather low. Clearly, most users prefer a single tool that slices and dices. This is why PetScan can also process other data sources, such as Wikidata SPARQL queries, manual page or item lists, and yes, PagePile. Given more than one source, it builds the intersection of the respective results, even if they are on different wikis (matched via Wikidata).

You want a list of all cats known to Wikidata that are also in the category tree of the battleship “Bismarck” on English Wikipedia? No problem. You can choose which of the input wikis should be the output wiki, so you can also get the result as Wikidata items. For the latter, you might notice an additional box at the top of the results: this is the full functionality of AutoList 2, directly available on your resulting items.

Additional goodies include:

  • Interface language can be switched “live”. The translations were copied from the CatScan translations, so that effort is re-used.
  • Namespaces are updated live when you change the wiki for the categories
  • Both templates and (incoming) links can now be a primary source, instead of just being filters for categories
  • You can filter the results by a regular expression. This works on page titles, or Wikidata labels, respectively
  • For Wikidata results, you can specify the label language used (defaults to the interface language)
  • You can show only Wikipedia pages that have no Wikidata item
  • Only the first 10K results are shown in HTML mode, so as not to crash your browser. Other formats (e.g. JSON) return all results

I have tested PetScan on my own, but with a project of this complexity, bugs only become apparent with many users over time, so please help test it. Eventually, I believe this tool can (and will) replace CatScan, QuickIntersection, AutoList, and maybe others as well.