Wikidata, with its web of interconnected items, lends itself to automated clustering. I have used my Wikidata query tool to check, in just a few minutes, all clusters of humans, that is, items about humans connected by properties such as mother, child, spouse, brother, etc.
At the time of writing, there are 11,784 clusters on Wikidata, each containing two or more humans. The largest one is the supercluster “European Royalty” with 20,543 members. 7,471 clusters contain only two humans, 1,955 contain three, and the numbers drop from there.
Beyond the royalty supercluster, the largest ones include:
- house of Gemmingen (216 humans)
- some Greeks (102 humans)
- assorted Indian emperors (98 humans)
- kings of Saudi Arabia (96 humans)
I have uploaded the cluster list here; each row has an item to start with, and the size of the cluster (= number of humans). The members of a cluster can be retrieved with the Wikidata query web[start_item][22,25,40,7,9,26,45,1038], where start_item is the item from that row and the numbers are the relation properties to follow. If there is interest, I can recalculate this cluster list later.
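For readers who want to reproduce the idea offline: the clustering boils down to finding connected components in a graph where humans are nodes and relation statements are edges. Below is a minimal sketch in Python, assuming the statements have already been exported as pairs of item IDs. The function name `find_clusters` and the toy item IDs are made up for illustration; this is not the code behind the query tool.

```python
from collections import defaultdict, deque


def find_clusters(edges):
    """Return (start_item, size) rows, one per connected component.

    Every relation is treated as undirected: it does not matter whether
    the statement says "mother" or "child", the two items are linked.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first walk over everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            item = queue.popleft()
            size += 1
            for other in neighbours[item]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        clusters.append((start, size))

    # Largest cluster first, like the uploaded list.
    clusters.sort(key=lambda row: -row[1])
    return clusters


# Toy data with hypothetical item IDs: one family of three, one couple.
edges = [("Q1", "Q2"), ("Q2", "Q3"), ("Q10", "Q11")]
clusters = find_clusters(edges)
print(clusters)
```

Items without any relation statement never show up in `edges`, so they form no cluster here, which matches the "two or more humans" definition used above.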
The missing origin of species
Now, instead of humans and their relations, what about taxa? We recently talked about taxonomy on Wikidata at WikiCon 2014, so I thought I’d modify the script to show that taxonomy on Wikidata is in a good state. Sadly, it is not.
Using “parent taxon”, as well as the deprecated “family” and “order” properties, I get a whopping 193,040 separate clusters, and that doesn’t even count completely “unconnected” items. The good news is that the main “supercluster” consists of 1,351,245 taxa that, presumably, can all be traced back to a common root, “biota”.
But the next one is a cluster of 1,006 taxa unconnected to that root. Using a modified query, I can get the unconnected root of that cluster, Molophilus. I have uploaded the complete cluster list here; a list of items per cluster, as well as the unconnected root, can be retrieved using the respective start item and the methods demonstrated above.
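Finding the unconnected root of a cluster can also be sketched offline: start at any taxon and keep following “parent taxon” until an item has no parent. The Python sketch below assumes each taxon has at most one parent and that the parent links are available as a plain dictionary. The function name `find_root` and the intermediate taxon names are hypothetical; only Molophilus and biota come from the text above.

```python
def find_root(start, parent_of):
    """Follow parent links upwards from `start` and return the last item.

    `parent_of` maps a taxon to its parent taxon. The walk stops at the
    first item without a parent, or when it revisits an item (a loop in
    the data), in which case that revisited item is returned.
    """
    seen = set()
    current = start
    while current in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current]
    return current


# Toy data: the two lower taxa are invented placeholders.
parent_of = {
    "example species": "example subgenus",
    "example subgenus": "Molophilus",
}

root = find_root("example species", parent_of)
connected = root == "biota"
print(root, connected)
```

A cluster counts as properly connected only if this walk ends at “biota”; any other end point is an unconnected root that needs a “parent taxon” statement.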