
Category Archives: Wikidata

Merge and diff

Originally, I wanted to blog about adding new properties (taxon data specifically: NCBI, GBIF, and iNaturalist) to my AC2WD tool (originally described here). If you have the user script installed on Wikidata, AC2WD will automatically show up on relevant taxon items. But then I realized that the underlying tech might be useful to others, if […]

A quick comparison

Over the years, Mix’n’match has helped to connect many (millions?) of third-party entries to Wikidata. Some entries can be identified and matched in a fully automated fashion (e.g. people with birth and death dates), but the majority of entries require human oversight. For some entries that works nicely, but others are hard to disambiguate from […]
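The fully automated case mentioned in the excerpt can be sketched as follows. This is a minimal illustration, not Mix’n’match’s actual code: the `Entry` structure, the `auto_match` helper, and the exact-match rule are all assumptions. The idea is simply that a match is only made automatically when name, birth year, and death year all agree for exactly one candidate; anything ambiguous is left for a human.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    name: str
    born: Optional[int]  # birth year, if known
    died: Optional[int]  # death year, if known

def auto_match(entry, candidates):
    """Return the QID of the single candidate whose name, birth year,
    and death year all match the entry exactly; otherwise None
    (missing dates or multiple hits mean a human must decide)."""
    if entry.born is None or entry.died is None:
        return None
    hits = [qid for qid, c in candidates.items()
            if (c.name, c.born, c.died) == (entry.name, entry.born, entry.died)]
    return hits[0] if len(hits) == 1 else None
```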

Mix’n’match background sync

My Mix’n’match tool helps match third-party catalogs to Wikidata items. Now, things happen on Mix’n’match and Wikidata in parallel, amongst them: Wikidata items are deleted; Wikidata items are merged, leaving one to redirect to the other; external IDs are added to Wikidata. This leads to the states of Mix’n’match and Wikidata diverging over time. I […]
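The three divergence cases listed in the excerpt can be sketched as a simple diff pass. This is a hypothetical sketch, not the tool’s implementation: the data shapes (external ID → QID maps, a set of deleted QIDs, a map of merge redirects) and the action names are assumptions made for illustration.

```python
def diff_states(mnm, wikidata, deleted, redirects):
    """Compare Mix'n'match matches against live Wikidata state and
    report entries whose state has diverged.

    mnm:       dict of external ID -> QID as stored in Mix'n'match
    wikidata:  dict of external ID -> QID as currently on Wikidata
    deleted:   set of QIDs deleted on Wikidata
    redirects: dict of merged QID -> redirect target QID
    """
    actions = []
    for ext_id, qid in mnm.items():
        if qid in deleted:
            actions.append(("unmatch", ext_id))  # item gone: drop the match
        elif qid in redirects:
            # item was merged: follow the redirect to its target
            actions.append(("rematch", ext_id, redirects[qid]))
    for ext_id, qid in wikidata.items():
        if ext_id not in mnm:
            # external ID was added on Wikidata first: import the match
            actions.append(("import", ext_id, qid))
    return actions
```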

Cram as cram can

So I am trying to learn (modern) Greek, for reasons. I have books, and online classes, and the usual apps. But what I was missing was a simple way to rehearse common words. My thoughts went to Wikidata items, and then to lexemes. Lexemes are something I have not worked with a lot, so this […]

Musings on the backend

So the Wikidata query service (WDQS), currently powered by Blazegraph, does not perform well. Even simple queries time out, it lags behind the live site, and individual instances can (and do) silently go out of sync. The WMF is searching for a replacement. One of the proposed alternatives to Blazegraph is Virtuoso. It models the […]

Lists. The plague of managing things. But also surprisingly useful for many tasks, including Wikimedia-related issues. Mix’n’match is a list of third-party entries. PetScan generates lists from Wikipedia and Wikidata. And Listeria generates lists on-wiki. But there is a need for generic, Wikimedia-related, user-curated lists. In the past, I have tried to quell that demand […]

Turn the AC on

A large part of Wikidata is the collection of external identifiers for items. For some item types, such as items about people (Q5), some of this is what is known as Authority Control (AC) data, for example, VIAF (P214). One thing that distinguishes AC data from other external IDs is that AC data sources are […]

Trust in Rust

So Toolforge is switching from grid engine to Kubernetes. This also means that tool owners such as myself need to migrate their tools’ background jobs to the new system. Mix’n’match was my tool with the most diverse job setup. But resource constraints and the requirement to “name” jobs meant that I couldn’t just port things […]

Vue le vue

A while ago, Wikimedia sites, including Wikidata, started to use the Vue.js framework to ease the future development of user interface components. Vue is, to some degree, also available for user scripts. I have a few user scripts on Wikidata, and some of them have a seriously outdated interface. There was a modal (page-blocking) dialog […]

Join the Quest

I recently came across an interesting, semi-automated approach to create statements in Wikidata. A SPARQL query would directly generate commands for QuickStatements. These would then be manually checked before running them. The queries exist as links on Wikidata user pages, and have to be run by clicking on them. That seemed useful but tedious. I […]
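The approach in the excerpt hinges on turning query results into QuickStatements commands. A minimal sketch of that step: the `qs_command` helper and the sample rows are mine, but the tab-separated item/property/value line is QuickStatements’ V1 command format, with string values quoted.

```python
def qs_command(item: str, prop: str, value: str) -> str:
    """Build a QuickStatements V1 command line: item, property, and value
    separated by tabs. String values must be quoted by the caller;
    item values (QIDs) are used as-is."""
    return f"{item}\t{prop}\t{value}"

# Rows as they might come back from a SPARQL SELECT ?item ?viaf
# (sample data; Q42 is Douglas Adams, P214 is the VIAF ID property):
rows = [("Q42", "113230702")]
commands = [qs_command(item, "P214", f'"{viaf}"') for item, viaf in rows]
```

Each resulting line can then be pasted into QuickStatements after a manual check, which is exactly the human-in-the-loop step the excerpt describes.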