Orthogonal Recent Changes

Recent Changes is a core functionality of all wikis. It shows which articles or, in the case of Wikidata, items have changed in the last minutes, hours, or days. As useful as this is, for Wikidata it is like drinking from the proverbial fire hose; if you are looking for a specific type of change, it is a lot of data to process.

I found myself thinking that, for some of my tools like Mix’n’match, it would be useful to monitor Wikidata Recent Changes for edits of a specific property, one that is associated with a Mix’n’match catalog. If I could see that an item had, say, a statement with a specific property added, I could check the associated catalog and set that match there as well, to keep both systems in sync. Similarly, if an item had such a statement removed, the item should be unlinked in the catalog as well.

A significant part of the community is involved in bringing more languages to Wikidata. For them, it would be good to monitor label, alias, and description changes in a specific language. Changes in sitelinks (e.g. adding a Wikipedia page in a specific language to a Wikidata item) are also relevant here.

This view on Recent Changes is orthogonal to the standard one; it is primarily concerned with the type of change, rather than with the order of edits. Of course, the final result would be similar in nature, since it is still Recent Changes.

So without further ado, I present Wikidata Recent Changes (my apologies for the boring name). This is a simple front-end to the actual API. The data is updated in almost real time (<10 sec behind live Wikidata). You can query for either statement changes based on properties, or label/alias/description/sitelink changes. The default format is JSONL, where each row is one independent result in JSON format. That makes it both easier to generate the data (no intermediate storage required) and easier to read (line by line, no need to download and parse a giant JSON object). Traditional JSON and simple HTML are also available.

You can also specify any combination of added, changed, and removed, when it comes to event types; by default, all are returned.
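To illustrate why JSONL is convenient on the consuming side, here is a minimal Python sketch that reads results one line at a time and keeps only one event type. The field names (`item`, `property`, `event`) and the sample rows are made up for illustration; check the actual output of the tool for the real keys.

```python
import io
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed event per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between rows
            yield json.loads(line)


# Hypothetical sample rows; the real field names may differ.
sample = io.StringIO(
    '{"item": "Q42", "property": "P214", "event": "added"}\n'
    '{"item": "Q64", "property": "P214", "event": "removed"}\n'
)

# Each row is parsed on its own, so nothing larger than one line is held in memory.
removed_items = [e["item"] for e in iter_events(sample) if e["event"] == "removed"]
print(removed_items)
```

The same generator works on any iterable of text lines, such as an open file or a streamed HTTP response body.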

To make processing faster, several subsequent edits may be grouped into one “event”; the revision number and timestamp returned just say “this was the case as of this revision”, and do not necessarily identify the exact revision of the change. An event tells you that an item has changed in a way you requested, but it is up to you to make sense of that change, and to check the current status of the item. To save database storage, I also do not include the actual values of the labels/statements/etc.; again, you have to figure this out yourself.
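Since an event only says that something changed, a consumer has to look up the current state of the item itself. One way to do that, sketched below in Python, is the standard Wikidata `wbgetentities` API module. The helper names and the User-Agent string are illustrative assumptions, not part of the tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_url(item_ids):
    """Build a wbgetentities URL that fetches the claims of the given items."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),  # several IDs go into one request
        "props": "claims",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def has_property(api_response, item_id, prop):
    """Return True if the item currently has at least one statement for prop."""
    claims = api_response.get("entities", {}).get(item_id, {}).get("claims", {})
    if not isinstance(claims, dict):  # guard against an empty, non-dict value
        return False
    return bool(claims.get(prop))


def fetch_entities(item_ids):
    """Fetch current item data from Wikidata (performs a network request)."""
    # The User-Agent value is a placeholder; use one that identifies your tool.
    req = Request(entity_url(item_ids), headers={"User-Agent": "rc-example/0.1"})
    with urlopen(req) as resp:
        return json.load(resp)
```

For a "removed" event on a property, for example, `has_property(fetch_entities(["Q42"]), "Q42", "P214")` would tell you whether the item still carries a statement for that property right now, which is the fact a catalog sync would actually act on.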

I started the data collection yesterday, and I see ~1M rows/day added. If this gets too large, I will prune older data, say, 1 month? But until then, please give this a whirl!