One of my most-used WikiVerse tools is PetScan. It is a complete re-write of several other PHP-based tools, in C++ for performance reasons. PetScan has turned into the Swiss Army Knife of doing things with Wikipedia, Wikidata, and other projects.
But PetScan has also developed a few issues over time. It is suffering from the per-tool database connection limit of 10, enforced by the WMF. It also has some strange bugs, one of them creating weirdly named files on disk, which generally does not inspire confidence. Finally, from a development/support POV, it is the “odd man out”, as none of my other WikiVerse tools are written in C++.
So I went ahead and re-wrote PetScan, this time in Rust. If you read this blog, you’ll know that Rust is my recent go-to language. It is fast, safe, and comes with a nice collection of community-maintained libraries (called “crates”). The new PetScan:
- uses MediaWiki and Wikibase crates, which simplifies coding considerably
- automatically chunks database queries, which should improve reliability, and could be developed into multi-threaded queries
- pools replica database access from several of my other HTML/JS-only tools (which do not use the database, but still get allocated connections)
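To illustrate the query chunking mentioned in the list above, here is a minimal sketch. It is not PetScan's actual code: the function name `chunked_queries`, the chunk size, and the example SQL against the MediaWiki `page` table are assumptions chosen for illustration. The idea is to split one large `IN (...)` list into several smaller queries, which can then be run one after another (or, in principle, in parallel).

```rust
/// Hypothetical helper (not taken from PetScan): splits a list of page IDs
/// into batches of at most `chunk_size` and builds one SQL query per batch.
fn chunked_queries(page_ids: &[u64], chunk_size: usize) -> Vec<String> {
    // `slice::chunks` panics on a chunk size of zero, so fail early with a clear message.
    assert!(chunk_size > 0, "chunk_size must be greater than zero");

    page_ids
        .chunks(chunk_size)
        .map(|chunk| {
            // Page IDs are plain integers, so joining them into the query is safe here;
            // string values would need proper parameter binding instead.
            let id_list = chunk
                .iter()
                .map(|id| id.to_string())
                .collect::<Vec<_>>()
                .join(",");
            format!(
                "SELECT page_id, page_title FROM page WHERE page_id IN ({})",
                id_list
            )
        })
        .collect()
}
```

Calling `chunked_queries(&[1, 2, 3, 4, 5, 6, 7], 3)` yields three queries, covering the IDs `1,2,3`, then `4,5,6`, then `7`. Each query stays small, so a failure or timeout only affects one batch rather than the whole request.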
Long story short, I want to replace the C++ version with the Rust version. Most of the implementation is done, but I can’t think of all possible corner cases myself. So I ask interested community members to give the test instance of PetScan V2 a whirl. I re-used the web interface from PetScan V1, so it should look very familiar. If your query works, it should do so much more reliably than in V1. The code is on GitHub, and so is the new issue tracker, where you can file bug reports and feature requests.
One Comment
Wonderful!
PetScan has lately often been giving no results until you try several times … so hopefully that issue will be solved with the new version.
Another question … do you know of a Python generator that accepts a PagePile as input? For now I have to work around that with CSV output, but that is time-consuming.