
Batches of Rust

QuickStatements is a workhorse for Wikidata, but it has had a few problems of late.

One of these is poor performance with batches. Users can submit a batch of commands to the tool, and these commands are then run on the Labs server. This mechanism has been bogged down for several reasons:

  • Batch processing written in PHP
  • Each batch running in a separate process
  • Limit of 10 database connections per tool on Labs (web interface, batch processes, testing etc. together)
  • Limit of 16 simultaneous processes per tool on the Labs cloud (observed, but not validated)
  • No good way to auto-start a batch process when a batch is submitted (currently, a PHP process is auto-started every 5 minutes and exits if there is nothing to do)
  • A large backlog developing

Amid continued bombardment on wiki talk pages, Twitter, Telegram etc. that “my batch is not running (fast enough)”, I set out to mitigate the issue. My approach is to run all batches in a new processing engine, written in Rust. This has several advantages:

  • Faster and easier on the resources than PHP
  • A single process running on Labs cloud
  • Each batch is a thread within that process
  • Checking for new batches to start every second (if you submit a new batch, it should start almost immediately)
  • Use of a database connection pool (the individual thread might have to wait a few milliseconds to get a connection, but the system never runs out)
  • Limiting simultaneous batch processing for batches from the same user (currently: 2 batches max) to avoid the MediaWiki API “you-edit-too-fast” error
  • Automatic handling of maxlag, bot/OAuth login etc. by using my mediawiki crate
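The scheduling ideas in the list above (one thread per batch, a shared connection pool where a thread may briefly wait but never fails, and at most 2 running batches per user) can be illustrated with a small, std-only Rust sketch. This is not the actual QuickStatements code: `Batch`, `Pool`, `select_startable` and `MAX_BATCHES_PER_USER` are hypothetical names, the "connections" are plain integers standing in for real database connections, and the once-per-second polling loop is reduced to a single check.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical batch record; a real engine would load these from its database.
#[derive(Clone, Debug)]
struct Batch {
    id: u64,
    user: String,
}

/// Minimal blocking pool; the usize values stand in for real DB connections.
struct Pool {
    free: Mutex<Vec<usize>>,
    available: Condvar,
}

impl Pool {
    fn new(size: usize) -> Self {
        Pool { free: Mutex::new((0..size).collect()), available: Condvar::new() }
    }

    /// Blocks until a connection is free, then takes it (waits, never fails).
    fn get(&self) -> usize {
        let mut free = self.free.lock().unwrap();
        while free.is_empty() {
            free = self.available.wait(free).unwrap();
        }
        free.pop().unwrap()
    }

    /// Returns a connection and wakes one waiting thread.
    fn put(&self, conn: usize) {
        self.free.lock().unwrap().push(conn);
        self.available.notify_one();
    }
}

/// Assumed per-user cap, mirroring the "2 batches max" rule.
const MAX_BATCHES_PER_USER: usize = 2;

/// Picks the pending batches that may start now without exceeding the per-user cap.
fn select_startable(pending: &[Batch], running: &HashMap<String, usize>) -> Vec<Batch> {
    let mut counts = running.clone();
    let mut out = Vec::new();
    for b in pending {
        let n = counts.entry(b.user.clone()).or_insert(0);
        if *n < MAX_BATCHES_PER_USER {
            *n += 1;
            out.push(b.clone());
        }
    }
    out
}

fn main() {
    let pool = Arc::new(Pool::new(3));
    let pending = vec![
        Batch { id: 1, user: "alice".into() },
        Batch { id: 2, user: "alice".into() },
        Batch { id: 3, user: "alice".into() },
        Batch { id: 4, user: "bob".into() },
    ];
    // A real engine would repeat this selection every second.
    let to_start = select_startable(&pending, &HashMap::new());
    let handles: Vec<_> = to_start
        .into_iter()
        .map(|b| {
            let pool = Arc::clone(&pool);
            // One thread per batch.
            thread::spawn(move || {
                let conn = pool.get();
                // ... process the commands of batch `b` using `conn` ...
                pool.put(conn);
                b.id
            })
        })
        .collect();
    let mut done: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    done.sort();
    println!("{:?}", done);
}
```

Running it prints `[1, 2, 4]`: alice's third batch is held back until one of her first two finishes, while bob's batch starts right away. Using a `Condvar` lets a thread sleep until a connection is returned instead of failing when all connections are in use.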

This is now running on Labs, processing all open batches (~40 at the moment) simultaneously. Grafana shows the spikes in edits, but no increased lag so far. The process is given 4GB of RAM, but could probably make do with a lot less (for comparison, each individual PHP process used 2GB).

A few caveats:

  • This is a “first attempt”. It might break in new, fun, unpredicted ways
  • It will currently not process batches that deal with Lexemes. This is mostly a limitation of the wikibase crate I use, and will likely get solved soon. In the meantime, please run Lexeme batches only within the browser!
  • I am aware that I now have duplicated code (the PHP and the Rust processing). For me, the solution will be to implement QuickStatements command parsing in Rust as well, and to replace PHP completely. I am aware that this will affect third-party use of QuickStatements (e.g. the Wikibase Docker container), but the PHP and Rust sources are independent, so nothing will break. Of course, in the long run the Rust code will likely diverge from the PHP code, possibly causing incompatibilities
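As a rough idea of what moving command parsing to Rust involves, here is a sketch for the simplest tab-separated (V1-style) QuickStatements commands. It only handles `CREATE` and `subject<TAB>property<TAB>value` lines whose value is an item ID or a quoted string; `LAST`, labels, qualifiers, sources and other value types are left out. The `Command` and `Value` types are hypothetical and are not taken from the actual implementation.

```rust
/// Hypothetical parsed command (only two forms are covered).
#[derive(Debug, PartialEq)]
enum Command {
    Create,
    AddStatement { subject: String, property: String, value: Value },
}

/// Hypothetical value types: an item ID like Q5, or a quoted string.
#[derive(Debug, PartialEq)]
enum Value {
    Item(String),
    Str(String),
}

fn parse_value(s: &str) -> Option<Value> {
    if s.len() >= 2 && s.starts_with('"') && s.ends_with('"') {
        // Strip the surrounding quotes.
        Some(Value::Str(s[1..s.len() - 1].to_string()))
    } else if s.len() > 1 && s.starts_with('Q') && s[1..].chars().all(|c| c.is_ascii_digit()) {
        Some(Value::Item(s.to_string()))
    } else {
        None
    }
}

/// Parses one line; returns None for anything this sketch does not understand.
fn parse_line(line: &str) -> Option<Command> {
    let parts: Vec<&str> = line.trim_end().split('\t').collect();
    match parts.as_slice() {
        ["CREATE"] => Some(Command::Create),
        [subject, property, value] if property.starts_with('P') => Some(Command::AddStatement {
            subject: subject.to_string(),
            property: property.to_string(),
            value: parse_value(value)?,
        }),
        _ => None,
    }
}

fn main() {
    for line in ["CREATE", "Q42\tP31\tQ5", "Q42\tP373\t\"Douglas Adams\"", "not a command"] {
        println!("{:?}", parse_line(line));
    }
}
```

Returning `Option` keeps the sketch short; a real parser would more likely return a `Result` with an error message, so users can see why a line in their batch was rejected.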

So far, it seems to be running fine. Please let me know if you encounter any issues (unusual errors in your batch, weird edits etc.)!

One Comment

  1. Iván wrote:

    This really looks awesome! For some months now I have been reading about Rust and its potential. It has a steep learning curve, but it seems that once you reach the necessary knowledge, you can build awesome things!

    However, probably due to my ignorance about it, I have a question: why Rust and not Python? Python seems easier to code in and has good features for this kind of task, at least in my opinion.

    What made you decide on Rust rather than Python?

    Regards,
    Iván

    Wednesday, June 5, 2019 at 15:46