Deleted gender wars

After reading Andrew Gray's excellent analysis of AfD vs. gender, which looks at the articles that went through the “Articles for Deletion” (AfD) process and survived it, I couldn't help but wonder what happened to the articles that were not kept, that is, where the AfD was “successful”.

So I quickly pulled every English Wikipedia title in the article namespace that appears as “deleted” in the logging table on the database replica (this may include articles that were deleted and later re-created under the same name). That came to a total of 4,263,370 titles at the time of writing.
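The post does not show the query it used, so the following is only a sketch of how such a title list could be pulled. It assumes the standard MediaWiki `logging` columns (`log_type`, `log_action`, `log_namespace`, `log_title`) and uses a tiny in-memory SQLite table with made-up rows as a stand-in for the real replica. Whether the original count used `DISTINCT` is not stated; here each title is counted once, however often it was deleted.

```python
import sqlite3

# Stand-in for the replica: a hypothetical miniature of MediaWiki's `logging` table,
# with only the columns this query needs and made-up example rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_type TEXT, log_action TEXT, log_namespace INTEGER, log_title TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?)",
    [
        ("delete", "delete", 0, "John_Doe"),
        ("delete", "delete", 0, "John_Doe"),        # deleted twice, counted once
        ("delete", "delete", 2, "Some_user_page"),  # namespace 2 (User), skipped
        ("delete", "restore", 0, "Jane_Roe"),       # an undeletion, not a deletion
        ("delete", "delete", 0, "Mary_Major"),
    ],
)

# Distinct titles in the article namespace (0) that have a deletion log entry.
QUERY = """
SELECT DISTINCT log_title
FROM logging
WHERE log_type = 'delete' AND log_action = 'delete' AND log_namespace = 0
ORDER BY log_title
"""
deleted_titles = [row[0] for row in conn.execute(QUERY)]
print(deleted_titles)  # ['John_Doe', 'Mary_Major']
```

Note that MediaWiki stores titles with underscores instead of spaces, so any name matching later on has to undo that.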

I filtered out some obvious non-name article titles, but which of the remaining ones were about humans? And which of those were about men, and which about women? If only there were an easily accessible online database of male and female first names… oh wait…

So I took male and female first names, as well as last (family) names, from Wikidata. Each title that ends with a known family name is then classified as male, female, or unknown, depending on its first name.
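A minimal sketch of this classification, assuming the first word of a title is the first name and the last word is the family name. The name sets below are tiny hypothetical stand-ins for the Wikidata lists, and the non-name filter (digits, colons, single words) is an assumed heuristic, not the one used in the post. Stripping a trailing parenthetical such as "(singer)" is also an assumption.

```python
import re
from collections import Counter

# Hypothetical toy name lists; the post took its first and family names from Wikidata.
MALE_FIRST = {"John", "Peter"}
FEMALE_FIRST = {"Mary", "Anna"}
FAMILY_NAMES = {"Doe", "Smith", "Major"}


def normalize(title):
    """Turn 'Anna_Smith_(singer)' into 'Anna Smith' (underscores to spaces, drop a trailing qualifier)."""
    title = title.replace("_", " ")
    return re.sub(r"\s*\([^)]*\)$", "", title)


def classify(title):
    """Return 'male', 'female' or 'unknown' for a likely person title, else None."""
    name = normalize(title)
    # Assumed filter for obvious non-names: digits, namespace-like colons, single words.
    if any(ch.isdigit() for ch in name) or ":" in name:
        return None
    words = name.split()
    if len(words) < 2 or words[-1] not in FAMILY_NAMES:
        return None  # does not end with a known family name
    first = words[0]
    if first in MALE_FIRST and first not in FEMALE_FIRST:
        return "male"
    if first in FEMALE_FIRST and first not in MALE_FIRST:
        return "female"
    return "unknown"  # first name missing from both lists, or present in both


titles = ["John_Smith", "Mary_Major", "Alex_Doe", "List_of_rivers",
          "1999_in_film", "Anna_Smith_(singer)"]
counts = Counter(c for c in map(classify, titles) if c is not None)
print(counts)
```

Names that appear in both first-name lists (e.g. unisex names) land in "unknown" here, which is one plausible reason for the large unknown bucket, though the post does not say how such cases were handled.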

Gender     Articles   Share
Male        289,036   50.5%
Female       86,281   15.1%
Unknown     196,562   34.4%
Total       571,879  100.0%

Of the articles classified as male or female, 23% are about women, which almost exactly matches Andrew's ratio of women among BLPs (Biographies of Living Persons). That would suggest no significant gender bias among the articles that were actually deleted.
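The shares in the table and the 23% figure can be re-derived from the three raw counts. This is just the arithmetic, using the counts reported above: the female share is taken over all classified titles, while the 23% is taken over male plus female titles only, with "unknown" left out.

```python
# Raw counts from the table above.
counts = {"male": 289_036, "female": 86_281, "unknown": 196_562}

total = sum(counts.values())
# Share of each group among all name-like deleted titles, in percent.
shares = {k: 100 * v / total for k, v in counts.items()}
# Share of women among titles with a known gender (unknown excluded).
women_share = 100 * counts["female"] / (counts["male"] + counts["female"])

print(total)                                      # 571879
print({k: round(v, 1) for k, v in shares.items()})  # {'male': 50.5, 'female': 15.1, 'unknown': 34.4}
print(round(women_share, 1))                      # 23.0
```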

Of course, this was just a quick calculation. The analysis code and the raw data, plus instructions on how to re-create them, are available here, in case you want to play further.