The Hand-editor’s Tale

Disclaimer: I am the author of Listeria, and maintainer of ListeriaBot.

In January 2016, User:Emijrp had an idea: why not use that newfangled Listeria tool, a bot that generates lists from Wikidata and puts them on Wikipedia pages, to maintain a List of Women Linguists on English Wikipedia? It seemed that a noble cause had met (at the time) cutting-edge technology to provide useful information for both readers and editors (think: red links) on Wikipedia.

The bot thus began its work, and continued dutifully for almost a year (until January 2, 2017). At that time, a community decision was made to deactivate ListeriaBot edits on the page, but to keep the list it had last generated, for manual curation. No matter what motivated that decision, it is interesting to evaluate the progress of the page in its “manual mode”.

Since the bot was deactivated, 712 days have passed (at the time of writing, 2018-12-17). Edit frequency dropped from one edit every 1-2 days by the bot, to one edit every 18 days on average.

In that time, the number of entries increased from 663 (last bot edit) to 673 (adding one entry every 71 days on average). The query (women, linguists, but no translators) used to generate the Listeria list now yields 1,673 entries. This means the list on English Wikipedia is now missing exactly 1,000 entries (about 149% of its current length). A similar lag for images and birth/death dates is to be expected.
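To make "women, linguists, but no translators" concrete, here is a minimal sketch of how such a query could be written and sent to the Wikidata Query Service. The actual query used on the list page is not reproduced in this post, so the SPARQL below is an assumption, as are the helper names `build_query` and `run_query` and the restriction to humans. The item and property IDs are the usual Wikidata ones for female (Q6581072), linguist (Q14467526), translator (Q333634), human (Q5), sex or gender (P21), occupation (P106) and instance of (P31).

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata item IDs used in the query.
HUMAN = "Q5"
FEMALE = "Q6581072"
LINGUIST = "Q14467526"
TRANSLATOR = "Q333634"


def build_query():
    """Build an illustrative SPARQL query: women linguists, minus translators.

    This is an assumed reconstruction, not the query from the list page.
    """
    return f"""
SELECT DISTINCT ?item WHERE {{
  ?item wdt:P31 wd:{HUMAN} ;
        wdt:P21 wd:{FEMALE} ;
        wdt:P106 wd:{LINGUIST} .
  MINUS {{ ?item wdt:P106 wd:{TRANSLATOR} . }}
}}
""".strip()


def run_query(query):
    """Send the query to the endpoint and return the matching item URIs."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        # A descriptive User-Agent is good practice; this one is made up.
        headers={"User-Agent": "women-linguists-example/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["item"]["value"] for row in data["results"]["bindings"]]
```

Calling `len(run_query(build_query()))` would give the current number of matches; it will differ from the 1,673 quoted above, both because Wikidata keeps changing and because this sketch may not match the original query exactly.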

The manual editors kept the “Misc” section, which was used by ListeriaBot to group entries of unknown or “one-off” (not warranting their own section) nationalities. It appears that few, if any, have been moved into appropriate sections.

It is unknown if manual edits to the list (example), the protection of which was given as a main reason to deactivate the bot, were propagated to the Wikidata item, or to the articles on Wikipedia where such exist (here, es and ca), or if they are destined to wither on the list page.

The list on Wikipedia links to 462 biography pages (likely a slight overestimate) on English Wikipedia. However, there are 555 Wikidata items from the original query that have a sitelink to English Wikipedia. The list on Wikipedia thus fails to link to (at least) 93 women linguists that exist on the same site. One example of a missing entry would be Antonella Sorace, a Fellow of the Royal Society. She is, of course, linked from another, ListeriaBot-maintained page.
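The comparison behind that number is a plain set difference: take the article titles that the query results link to on English Wikipedia, and remove everything the list page links to. The sketch below illustrates this with made-up placeholder titles (only Antonella Sorace comes from this post); the function name `missing_from_list` is hypothetical. Because the 462 links on the list page likely include a few non-biography pages, 555 - 462 = 93 is a lower bound on the gap.

```python
def missing_from_list(sitelinked_titles, list_links):
    """Return titles that have an article but are not linked from the list page."""
    return sorted(set(sitelinked_titles) - set(list_links))


# Placeholder data for illustration only.
sitelinked = ["Antonella Sorace", "Example Linguist A", "Example Linguist B"]
linked_from_list = ["Example Linguist A", "Example Linguist B", "Unrelated Page"]

print(missing_from_list(sitelinked, linked_from_list))  # prints ['Antonella Sorace']

# With the counts from the post, the gap is at least:
print(555 - 462)  # prints 93
```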

Humans are vastly superior to machines in many respects. Curating lists is not necessarily one of them.

6 Comments

  1. Magnus wrote:

    Addendum: Wikidata has 34 items about Women Linguists from Estonia. English Wikipedia has none of them. Not a single one.

    Monday, December 17, 2018 at 17:02 | Permalink
  2. Egon wrote:

    Thanks for the write-up. It pushed me to finally try it out, and here is the first result (the list is supposed to stay empty 🙂): https://www.wikidata.org/wiki/User:Egon_Willighagen/MisclassifiedEurJOCArticles

    Tuesday, December 18, 2018 at 08:47 | Permalink
  3. Rexx wrote:

    The fly in the ointment at present, Magnus, is verifiability. What is to stop someone from adding occupation=linguist to Q3850003, Martha Kent? The result would be that the next bot update would include Superman’s adoptive mother in the list of women linguists. I am by no means convinced that such vandalism would be detected or corrected in any short time-frame. Take your pick of hundreds of thousands of reasonably obscure women on Wikidata who could provide a similarly ripe target.

    The corollary to that is when we ask the question “how certain are we that those 1,673 women on the latest bot list actually are linguists?”, I suspect that the answer has to be “I don’t know”. When somebody challenges a locally added statement on Wikipedia, either it is verified or eventually removed. The same is not true for a statement added by bot when derived from Wikidata because if it is removed, the bot will re-add it at the next run. Although the statement on Wikidata can be removed, it is reasonably likely that it will simply be replaced by whoever put it there previously.

    Until the majority of statements on Wikidata are reliably sourced (that excludes “imported from Xyz Wikipedia”), or filters are used by default to omit unreferenced statements, we will continue to fight an uphill battle to gain acceptance for Wikidata in the English Wikipedia.

    Thursday, December 20, 2018 at 19:38 | Permalink
  4. Peter Southwood wrote:

    Bots are good at compiling lists; people are currently better at verifying data. While I completely agree with REXX’s comment above, there may be a middle way. If the bot were to update a list not in mainspace, but on a talk page (or talk page sub-page), then most of the finding work could be done by the bot and stored in an easily accessible place out of sight of the casual user, so that anyone who wishes to update the article can find suggestions in a convenient format.
    To improve efficiency, the bot should check the article and not repeat duplicates, but flag any changes on the hidden list. To prevent relisting of unverifiable items, the editor could flag items on the hidden list as unverifiable, so they would only be updated in the event of a change on Wikidata.

    Wednesday, January 2, 2019 at 09:35 | Permalink
  5. Magnus wrote:

    @REXX (sorry, didn’t see the comment until now): As you can see on https://tools.wmflabs.org/wikidata-todo/stats.php, more than 2/3 of Wikidata statements are referenced, and to something other than Wikipedia.

    That aside, what is to stop someone from adding that to a Wikipedia article? Your argument is the same FUD Britannica used to spin about Wikipedia when it was new. The answer, of course, is “community” and their watchful eyes. It would help if people did not see “my Wikipedia” as “the” project, but rather realized that Wikipedia, Wikidata, Commons etc. are one big project, and applied themselves accordingly.

    @Peter: “out of sight of the common user” should be used to judge a list in general, no matter if it was created by Wikimedians working on Wikipedia or Wikimedians on Wikidata. Besides, all edits by the bot show up in the Wikipedia’s Recent Changes, so the Wikipedia community can check it for vandalism just as if it were a manually curated list. Reverting vandalism may require one more click (going to the Wikidata item), but given the shortcomings I listed above, that seems like a small price to pay.

    Wednesday, January 2, 2019 at 10:15 | Permalink
  6. GerardM wrote:

    When Martha Kent is made a linguist, something that may be verifiably true in the DC universe, it is no vandalism. A proper query to be used in a Wikipedia would limit results to humans. Mrs Martha Kent is not.

    Wednesday, January 2, 2019 at 10:40 | Permalink
