
Of cats and pets

CatScan is one of these workhorse tools that are familiar to many Wikimedia users, all the way back to the toolserver. Its popularity, however, has also caused problems with reliability time and again. As Labs became usable, I added QuickIntersection to the mix, allowing for a quicker and more reliable service at the expense of some complex functionality. Alas, despite my best efforts, CatScan reliability is fluctuating a lot. The reasons for that include the choice of PHP as a programming language, and the shared nature of Labs tools, where resources are concerned.

So I spent the last two weeks (as time allowed) on a complete rewrite of the tool, using C++ and a dedicated virtual machine on Labs. The result is one of the most complex tools I have developed to date. I call it PetScan, both to indicate that it does more than just cat(egorie)s, and as a pun on the more versatile PET scan (compared to the CAT scan).

Its basic interface is based on CatScan, and it is backwards-compatible for both URL parameters (so if you have a CatScan URL, you just need to replace the server name) and output (so the JSON output will be almost identical). It can also be switched to QuickIntersection output with a parameter, so it could replace that tool as well.
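As a rough illustration of the "replace the server name" step, here is a minimal Python sketch. The PetScan host is taken from the example URL in the comments below; the CatScan host in the usage note is a placeholder, and the helper name `to_petscan` is made up for this example. It leaves the path and query string untouched, which is an assumption based on the sentence above rather than a tested guarantee.

```python
from urllib.parse import urlsplit, urlunsplit

# Host as it appears in the example URL in the comment thread.
PETSCAN_HOST = "petscan.wmflabs.org"


def to_petscan(catscan_url: str, host: str = PETSCAN_HOST) -> str:
    """Return the same URL with only the server name swapped.

    Hypothetical helper: scheme, path, and all query parameters are
    passed through unchanged.
    """
    parts = urlsplit(catscan_url)
    return urlunsplit(parts._replace(netloc=host))
```

For instance, `to_petscan("https://catscan.example.org/?language=en&categories=Cats&doit=1")` keeps every parameter and only changes the host.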

But PetScan is much more encompassing. Several times before, I tried to “connect” my (and other) tools, the last time via PagePile; however, the uptake was rather low. It is clear that most users prefer a single tool that slices and dices. This is why PetScan can also process other data sources, like Wikidata SPARQL queries, manual page or item lists, and yes, PagePile. Given more than one source, it returns only the results common to all of them, even if the sources are on different wikis (pages are matched via Wikidata).
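Conceptually, combining sources across wikis could look like the Python sketch below. This is not PetScan's actual C++ code; the function name, the `sitelinks` lookup table, and the item identifiers are all placeholders. The idea it illustrates: translate each page to its Wikidata item, then keep only the items that every source produced.

```python
from functools import reduce


def intersect_sources(sources, sitelinks):
    """Intersect page lists from different wikis via their Wikidata items.

    sources:   list of sets of (wiki, title) pairs, one set per data source
    sitelinks: dict mapping (wiki, title) -> item id (hypothetical lookup;
               pages without an item are dropped)
    """
    as_items = [
        {sitelinks[page] for page in source if page in sitelinks}
        for source in sources
    ]
    if not as_items:
        return set()
    # Keep only the items present in every source.
    return reduce(set.intersection, as_items)


# Placeholder data, not real Wikidata identifiers.
sitelinks = {
    ("enwiki", "Alpha"): "ITEM_A",
    ("enwiki", "Beta"): "ITEM_B",
    ("dewiki", "Alpha (de)"): "ITEM_A",
}
category_pages = {("enwiki", "Alpha"), ("enwiki", "Beta")}
manual_list = {("dewiki", "Alpha (de)")}
common = intersect_sources([category_pages, manual_list], sitelinks)
```

Here `common` contains only `ITEM_A`, the one item both sources share.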

You want a list of all cats known to Wikidata that are also in the category tree of the battleship “Bismarck” on English Wikipedia? No problem. You can choose which of the input wikis should be the output wiki, so you can have the same result as Wikidata items. Now, for the latter, you might have seen an additional box at the top of the results; this is the full functionality from AutoList 2, directly available on your resulting items.

Additional goodies include:

  • Interface language can be switched “live”. The translations were copied from the CatScan translations, so that effort is re-used.
  • Namespaces are updated live when you change the wiki for the categories
  • Both templates and (incoming) links can now be a primary source, instead of just being filters for categories
  • You can filter the results by a regular expression. This works on page titles, or Wikidata labels, respectively
  • For Wikidata results, you can specify the label language used (will default to the interface language)
  • Show only Wikipedia pages without a Wikidata item
  • Only the first 10K results will be shown in HTML mode, so as not to crash your browser. Other formats (e.g. JSON) will get you all results
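Two of these items, the regular-expression filter and the HTML result cap, can be sketched in a few lines of Python. The names below are illustrative only, and the matching semantics (a search anywhere in the title, rather than a full-title match) are an assumption; the tool's own behaviour may differ.

```python
import re

HTML_LIMIT = 10_000  # "first 10K results" from the list above


def filter_titles(titles, pattern):
    """Keep the titles (or labels) that match the regular expression."""
    rx = re.compile(pattern)
    return [title for title in titles if rx.search(title)]


def rows_for_output(titles, output_format):
    """Cap the rows for HTML only; other formats get the full list."""
    if output_format == "html":
        return titles[:HTML_LIMIT]
    return titles
```

With this sketch, `filter_titles(["Bismarck", "Tirpitz"], r"^Bis")` keeps only `"Bismarck"`, and `rows_for_output(results, "json")` returns every result regardless of size.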

I have tested PetScan on my own, but with a project of this complexity, bugs will only become apparent with many users, over time, so please help test it. Eventually, I believe this tool can (and will) replace CatScan, QuickIntersection, AutoList, and maybe others as well.

6 Comments

  1. Edo de Roo wrote:

    Thanks for this tool … I use it already to generate a list of items on nl-wiki that still need a wikidata item (and are not on the delete lists) … grouped by type/infobox/category … and I hope to add a suggestion on already existing wikidata items from other languages.

    But one thing that is right now not available, but might be a valuable extra (if possible): find items that have fewer than x properties on Wikidata … in a category … or even lack a certain property like country in certain categories.

    But so far I’m already happy with this MEGA-tool that combines all the other tools

    Tuesday, March 29, 2016 at 20:46 | Permalink
  2. Flora wrote:

    little bug with wikisource links –

    https://fr.wikiso.org/wiki/Un_J%C3%A9r%C3%B4me_Bosch_inconnu instead of https://fr.wikisource.org/wiki/Un_J%C3%A9r%C3%B4me_Bosch_inconnu

    😉

    Wednesday, March 30, 2016 at 16:04 | Permalink
  3. Flora wrote:

    Seems to be a really beautiful tool… will test it thoroughly this evening, on wsfr x wd 🙂

    Wednesday, March 30, 2016 at 16:05 | Permalink
  4. Magnus wrote:

    @Flora what was your query? Seems to work for manual list:
    http://petscan.wmflabs.org/?language=en&project=wikipedia&ns%5B0%5D=1&manual_list=Un_J%C3%A9r%C3%B4me_Bosch_inconnu&manual_list_wiki=frwikisourcewiki&common_wiki=manual&interface_language=en&active_tab=tab_other_sources&doit=

    Wednesday, March 30, 2016 at 16:13 | Permalink
  5. Edo de Roo wrote:

    My tool to find nl-wiki articles without wikidata is now getting even better…

    But I’ve been told that petscan is not running on iOS/Safari. Is this a known issue? I run Win10 myself, so can’t test it….

    Wednesday, March 30, 2016 at 22:09 | Permalink
  6. Magnus wrote:

    Fixed the Safari issue, AFAIK.

    Thursday, March 31, 2016 at 09:29 | Permalink