Trust in Rust

So Toolforge is switching from grid engine to Kubernetes. This also means that tool owners such as myself need to change their tool background jobs to the new system. Mix’n’match was my tool with the most diverse job setup. But resource constraints and the requirement to “name” jobs meant that I couldn’t just port things one-to-one.

Mix’n’match has its own system of jobs that run once or on a regular basis, depend on other jobs finishing before them, etc. On the grid engine, I could start a “generic” job every few minutes that would pick up the next job and run it, with plenty of RAM assigned. Kubernetes resource restrictions make this impossible. So I had to refactor/rewrite several jobs and make them usable as PHP classes, rather than individual scripts to run.
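To make the “pick up the next job” idea concrete, here is a minimal sketch of a dependency-aware picker. It is not the actual Mix’n’match code: the `Job` struct, the `Status` enum, and the `next_runnable` function are all hypothetical names, and a real implementation would presumably read this state from a database rather than a slice.

```rust
use std::collections::HashSet;

/// Hypothetical job status, reduced to the two states needed for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Todo,
    Done,
}

/// Hypothetical job record; the real tool likely stores more fields.
#[derive(Debug, Clone)]
pub struct Job {
    pub id: u32,
    pub status: Status,
    /// ID of a job that has to finish before this one may start.
    pub depends_on: Option<u32>,
}

/// Returns the ID of the first job that is still to do and whose
/// dependency (if it has one) is already done.
pub fn next_runnable(jobs: &[Job]) -> Option<u32> {
    // Collect the IDs of all finished jobs once, for cheap lookups.
    let done: HashSet<u32> = jobs
        .iter()
        .filter(|j| j.status == Status::Done)
        .map(|j| j.id)
        .collect();

    jobs.iter()
        .filter(|j| j.status == Status::Todo)
        .find(|j| j.depends_on.map_or(true, |dep| done.contains(&dep)))
        .map(|j| j.id)
}
```

A runner would call `next_runnable` in a loop, run the returned job, mark it `Done`, and stop when it gets `None`.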

The Mix’n’match classes have become rather significant in code size, with >10K lines of code. Unsurprisingly, despite my best efforts, jobs got “stuck” for no apparent reason, bringing the whole system to a halt. This made new Mix’n’match catalogs in particular rather unusable, with no automated matches etc.

Rather than fiddling with the intricacies of a hard-to-maintain codebase, I decided to replace the failing job types with new Rust code. This is already live for several job types, mainly preliminary match and person name/date match, and I am adding more. Thanks to the easy multi-threading and async/await capabilities of Rust, many jobs can run in parallel in a single process. One design feature of the new code is batched processing, so memory requirements are low (<200MB) even for multiple parallel jobs. Also, jobs now keep track of their position in the batch, and can resume if the process is stopped (e.g. to deploy new code).
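The batching and resume behaviour can be illustrated with a small sketch. This is an assumption-laden example, not the real Mix’n’match implementation: `JobState`, `run_job`, and `run_jobs_in_parallel` are hypothetical names, the “work” is a placeholder, the state lives in memory instead of a database, and plain `std::thread` stands in for whatever mix of threads and async/await the real code uses.

```rust
use std::thread;

/// Hypothetical job state; a real tool would persist this, e.g. in a database row.
#[derive(Debug, Clone, PartialEq)]
pub struct JobState {
    pub job_id: u32,
    /// Index of the next unprocessed entry, updated after every batch.
    pub offset: usize,
    pub done: bool,
}

/// Processes `entries` in batches of `batch_size`, starting at `state.offset`.
/// `max_batches` limits how many batches run, which simulates the process
/// being stopped early. Returns the number of entries processed in this call.
pub fn run_job(
    entries: &[String],
    state: &mut JobState,
    batch_size: usize,
    max_batches: usize,
) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut processed = 0;
    for _ in 0..max_batches {
        if state.offset >= entries.len() {
            break;
        }
        let end = (state.offset + batch_size).min(entries.len());
        // Only one batch is looked at per iteration, which keeps memory use flat.
        for entry in &entries[state.offset..end] {
            let _normalized = entry.to_lowercase(); // placeholder for real matching work
            processed += 1;
        }
        // Checkpoint: a restarted process would continue from here.
        state.offset = end;
    }
    state.done = state.offset >= entries.len();
    processed
}

/// Runs several jobs to completion, one thread per job, and returns
/// their final states in the order the jobs were given.
pub fn run_jobs_in_parallel(
    jobs: Vec<(Vec<String>, JobState)>,
    batch_size: usize,
) -> Vec<JobState> {
    let handles: Vec<_> = jobs
        .into_iter()
        .map(|(entries, mut state)| {
            thread::spawn(move || {
                run_job(&entries, &mut state, batch_size, usize::MAX);
                state
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("job thread panicked"))
        .collect()
}
```

Calling `run_job` with a small `max_batches` and then calling it again with the same `JobState` shows the resume: the second call picks up at the stored `offset` instead of starting over.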

I strongly doubt I will replace the entire code base, especially since much of the scraping code involves user-supplied PHP code that gets dynamically included. But safe, fast, and reliable Rust code serves its purpose in this complex tool.