
Clusterf…amilies

Wikidata and its web of interconnected items lends itself to automated clustering. I have used my Wikidata query tool to quickly (as in, a few minutes) check all clusters of humans, that is, items about humans connected by properties such as mother, child, spouse, brother, etc.

At the time of writing, there are 11,784 clusters on Wikidata, each containing two or more humans. The largest one is the supercluster “European Royalty” with 20,543 members. 7,471 clusters contain only two humans, 1,955 contain three, and the numbers drop from there.
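
The clustering itself amounts to finding connected components in a graph whose nodes are humans and whose edges are family relations. The following is a minimal sketch of that idea in Python using union-find. It is not the script behind the numbers above, and the item IDs and edge list are made-up toy data.

```python
from collections import Counter


def find_clusters(edges):
    """Group items into connected components with union-find.

    `edges` is an iterable of (item_a, item_b) pairs, one pair per
    relation (mother, child, spouse, ...). The direction is ignored.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


def size_histogram(clusters):
    """Count how many clusters exist for each cluster size."""
    return Counter(len(c) for c in clusters)


# Hypothetical toy data: two families, of three and two members.
edges = [("Q1", "Q2"), ("Q2", "Q3"), ("Q4", "Q5")]
clusters = find_clusters(edges)
print(sorted((len(c) for c in clusters), reverse=True))  # [3, 2]
```

Items with no relations at all never appear in an edge, so they are not counted as clusters, which matches the "two or more humans" definition above.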

Beyond the royalty supercluster, the largest ones include:

Sadly, there is no good genealogy rendering software that is open source and JavaScript-only; and I don’t really have the bandwidth to develop one.

I have uploaded the cluster list here; each row has an item to start with and the size of the cluster (= number of humans). The members of a cluster can be retrieved with the Wikidata query web[start_item][22,25,40,7,9,26,45,1038]. If there is interest, I can re-calculate this cluster list later.
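
To build that query string for a given row of the cluster list, a small helper along these lines could be used. This is only a sketch: the function name is made up, and it assumes the query tool expects the numeric part of the item ID (without the leading "Q"). Sending the query to the tool is left out.

```python
# Property numbers taken from the query above; for example, 22 is
# father, 25 mother, 40 child, 26 spouse and 1038 relative.
FAMILY_PROPERTIES = (22, 25, 40, 7, 9, 26, 45, 1038)


def web_query(start_item, properties=FAMILY_PROPERTIES):
    """Return a web[...] query string for one start item.

    Accepts "Q7200", "7200" or 7200. Stripping the "Q" is an
    assumption about what the query tool expects.
    """
    item_id = str(start_item).lstrip("Qq")
    if not item_id.isdigit():
        raise ValueError(f"not a valid item ID: {start_item!r}")
    props = ",".join(str(p) for p in properties)
    return f"web[{item_id}][{props}]"


print(web_query("Q7200"))  # web[7200][22,25,40,7,9,26,45,1038]
```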

The missing origin of species

Now, instead of humans and their relations, what about taxa? We recently talked about taxonomy on Wikidata at WikiCon 2014, so I thought I’d modify the script to show that taxonomy on Wikidata is in a good state. Sadly, it is not.

Using “parent taxon”, as well as the deprecated “family” and “order” properties, I get a whopping 193,040 separate clusters; and that doesn’t even count completely “unconnected” items. The good news is, the main “supercluster” consists of 1,351,245 taxa that, presumably, can all be traced back to a common root “biota”.

But the next one is a cluster of 1,006 taxa unconnected to that root. Using a modified query, I can get the unconnected root of that cluster, Molophilus. I have uploaded the complete cluster list here; a list of items per cluster, as well as the unconnected root, can be retrieved using the respective start item, and the methods demonstrated above.
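
Conceptually, finding the unconnected root of a cluster means following "parent taxon" upwards until an item has no parent. The sketch below shows that walk on an in-memory mapping. It is not the modified query mentioned above, it assumes every taxon has at most one parent, and the taxon names are placeholders. It also stops when it runs into a cycle, which can happen with inconsistent data.

```python
def find_root(start, parent_of):
    """Follow child -> parent links from `start` to the top.

    `parent_of` maps each taxon to its parent taxon; a taxon that is
    missing from the mapping is treated as a root.
    """
    seen = set()
    current = start
    while current in parent_of:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        current = parent_of[current]
    return current


# Hypothetical toy data: a species under a subgenus under a genus.
parent_of = {"species_a": "subgenus_a", "subgenus_a": "genus_a"}
print(find_root("species_a", parent_of))  # genus_a
```

If the returned root is not the expected common root, the whole cluster is disconnected from the main tree.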

2 Comments

  1. Lockal wrote:

    Do you have any idea what goes wrong with https://tools.wmflabs.org/reasonator/geneawiki2/?q=Q7200 ? It stops with “13066 people loaded, 3 queries to go”. I guess this tree belongs to “European Royalty” supercluster. If there are cyclic structures, is it possible to detect them with WDQL?

    Tuesday, October 21, 2014 at 08:21
  2. Magnus wrote:

    @LOCKAL There is just no good, free code to render these huge graphs in JavaScript. It’s on my to-do-list, but waaay down 🙁

    Tuesday, October 21, 2014 at 10:18