Skip to content

Qualified answers

WikiData Query (WDQ), one of my pet projects I have blogged about before, has gained a new functionality: search by qualifier. That is, a query for items with specific statements can now be fine-tuned by querying the qualifiers as well.

Too abstract? Let’s see: Who won the Royal Medal in 1853? Turns out it was Charles Darwin. Or how about this: Who received any award, medal, etc. in 1835? According to Wikidata, it’s just one other person besides Darwin; Patrice de Mac-Mahon was awarded a rank in the French Legion of Honour.

This might seem like a minor gimmick, but I think it’s huge. It is, to the best of my knowledge, the only way to query WikiData on qualifiers; and with qualifiers becoming more important (replacing groups of properties in lieu of simpler, “qualified” ones, for example in taxonomy) and more numerous, such queries will become more important to the “ecosystem” around Wikidata.

With the main point delivered, here are some points for those interested in using this system, or the development of WDQ:

  • Currently, you can only use qualifier queries in combination with the CLAIM command, though I’ll add it to string, time, location, and quantity commands next.
  • A subquery in {curly¬†brackets} directly after the CLAIM command is used on the qualifiers of the statements that match the “main” command
  • Within the qualifier subquery, you can use all other commands, ranges, trees etc.
  • Adding qualifiers to the in-memory database required some major retrofitting of WDQ. It started my using “proper” statement representations (C++ template classes, anyone?), followed by creating a qualifier data type (to re-use all the existing commands within the subquery, each qualifier set is basically its own Wikidata!), extending the dump parsing, binary storage and loading, and a zillion other small changes.
  • While I believe the code quality has improved significantly, some optimization remains to be done. For example, memory requirements have skyrocketed from <1GB to almost 4GB. The current machine has 16GB RAM, so that should hold for a while, but I believe I can do better.
  • There is an olde memory leak, which appears to have become worse by the increased RAM usage. WDQ will restart automatically if the process dies, but queries will be delayed for a minute or so every time that happens.
  • I have prepared the code to use ranks, but neither parsing nor querying are implemented yet.
  • I need to update the API documentation to reflect the qualifier queries. Also, I need to write a better query builder. In due time.