Skip to content

Get into the Flow

Unix philosophy contains the notion that each program should perform an single function (and perform that function exceptionally well), and then be used together with other single-function programs to form a powerful “toolbox”, with tools connected via the geek-famous pipe (“|”).

A ToolFlow workflow, filtering and re-joining data rows

The Wikimedia ecosystem has lots of tools that perform a specific function, but there is little in terms of interconnection between them. Some tool prorammers have added other tools as (optional) inputs (eg PetScan) or outputs (eg PagePile), but that is the extend of it. There is PAWS, a Wikimedia-centric Jupyter notebook, but it does require a certain amount of coding, which excludes many volunteers.

So I finally got around to implementing the “missing pipe system” for Wikimedia tools, which I call ToolFlow. I also started a help page explaining some concepts. The basic idea is that a user can create a new workflow (or fork an existing one). A workflow would usually start with one or more adapters that represent the output of some tool (eg PetScan, SPARQL query, Quarry) for a specific query. The adapter queries the tool, and represents the tool output in a standardized internal format (JSONL) that is written into a file on the server. These files can then be filtered and combined to form sub- and supersets. Intermediate files will be cleaned up automatically, but the output of steps (ie nodes) that have no further steps after them is kept, and can only be cleared by re-running the entire workflow run. Output can also be exported via a generator; at the moment, the only generator is a Wikipage editor, which will create a Wikitext table on a Wiki page of your choice from an output file.

Only the owner (=creator) of a workflow can edit or run it, but the owner can also set a scheduler (think UNIX cronjob) for the workflow to be run regularly every day, week, or month. ToolFlow remembers your OAuth details, so it can edit wiki pages regularly, updating a wikitext table with the current results of your workflow.

I have created some demo workflows:

Now, this is a very complex software, spanning two code repoitories (one for the HTML/JS/PHP front-end, and one for the Rust back-end). Many things can go wrong here, and many tools, filters etc can be added. Please use the issue trackers for problems and suggestions. I would especially like some more suggestions for tools to use as input. And despite my best efforts, the interface is still somewhat complicated to understand, so please feel free to improve the help page.

2 Comments

  1. Magnus Tools wrote:

    A revival of your 2012 pipeline editor?
    Your 2014 Toolscript was a much more powerful and flexible iteration, though obviously only for tech savvy people.

    Saturday, September 30, 2023 at 09:51 | Permalink
  2. Magnus wrote:

    As I wrote in the post, there already is PAWS for the tech savvy people. This is more of a point-and-click solution for people who would otherwise run a workflow manually over and over.

    Monday, October 2, 2023 at 08:37 | Permalink