Add it to the pile!

I have previously blogged about Wikipedia-related page lists, and how they relate to many tools and activities. I also lamented my previous, failed attempts at introducing a “tool pipeline system”.

Well, I am not one to give up easily! The latest, greatest iteration in this vein is PagePile. Essentially, this new tool manages piles (newspeak for “lists”) of pages from Wikipedia, Wikidata, Commons, and other projects from the WikiVerse.

Manipulations

Filtering a list.

New piles can be taken from various sources, including manual lists, WDQ, and the Gather extension. Several of my tools can also generate piles, including AutoList, CatScan, QuickIntersection, and Not-in-the-other-language. Either way, you end up with a numeric PagePile ID.

What can you do with that ID? First of all, you can look at the list (that example leads to the list of all humans on Wikidata, ~2.8M items long), and download it in various formats.

You can filter the list, creating a new list (with a new ID) by following language links, resolving redirects, merging and subsetting with other lists, etc.
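
As a rough illustration of what merging and subsetting mean for piles, here is a minimal Python sketch that treats each pile as a plain list of page titles. The function names are made up for this example and this is not PagePile's actual code; `exclusive` mirrors the difference filter requested in the comments.

```python
from typing import Iterable, List


def merge(pile_a: Iterable[str], pile_b: Iterable[str]) -> List[str]:
    """Union: every page from either pile, duplicates dropped, order kept."""
    return list(dict.fromkeys([*pile_a, *pile_b]))


def subset(pile_a: Iterable[str], pile_b: Iterable[str]) -> List[str]:
    """Intersection: pages of pile_a that also appear in pile_b."""
    keep = set(pile_b)
    return [page for page in dict.fromkeys(pile_a) if page in keep]


def exclusive(pile_a: Iterable[str], pile_b: Iterable[str]) -> List[str]:
    """Difference: pages of pile_a that do not appear in pile_b."""
    drop = set(pile_b)
    return [page for page in dict.fromkeys(pile_a) if page not in drop]


# Hypothetical piles of Wikidata item IDs, for illustration only.
pile_1 = ["Q42", "Q76", "Q1339"]
pile_2 = ["Q76", "Q42", "Q5"]

merged = merge(pile_1, pile_2)
common = subset(pile_1, pile_2)
only_in_1 = exclusive(pile_1, pile_2)
```

Each call returns a new list and leaves its inputs untouched, which matches the idea that every filter produces a new pile with a new ID.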

Finally, you can import piles into several of my tools, including AutoList, FIST, WD-FIST, Not-in-the-other-language, and GetItemNames.

This list will likely grow; it is quite easy to add PagePiles as an input and/or output to a tool. Let me know if there is a tool you would like to see connected to the PagePile ecosystem; likewise for new filters.

Tech

If you are a tool author on Labs, you might want to consider linking up to the obvious possibilities of this system. I made a brief introduction for programmers, put the code on BitBucket, and I am working on some code documentation.

Basically, the tool manages a set of SQLite files, each of which represents a pile (=list) of pages on a wiki. You can get the name of a pile's SQLite file from the API or via the PHP class described in the intro. Via that class, or using sqlite3 directly, you can read and write that file, adding and changing lists. Please let me know if you have problems or comments, and if you start using PagePile in your tools, so I can add them to my consumer and/or generator lists.
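
The table layout inside a pile file is not described in this post, so the following Python sketch assumes nothing about it. It only lists the tables and columns it finds in a given SQLite file, which could be a first step before reading a pile directly. The function name `describe_pile` is hypothetical, and the path would be the file name you obtained from the API.

```python
import sqlite3
from typing import Dict, List


def describe_pile(path: str) -> Dict[str, List[str]]:
    """Map each table in the SQLite file at `path` to its column names."""
    con = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the name.
        return {
            table: [col[1] for col in con.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        con.close()
```

Once the real table and column names are known, an ordinary `SELECT` through the same `sqlite3` connection would read the page list.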

14 Comments

  1. Jan Ainali wrote:

    Awesome tool! I have used your linked items tool the last few days as a feeder for WD FIST. I use the feature to get a page on Wikipedia (with a lot of links to other articles on it that are interesting to me). If you add it as a generator you save me some copying and pasting. https://tools.wmflabs.org/wikidata-todo/linked_items.php

    Thursday, July 30, 2015 at 07:50 | Permalink
  2. Magnus wrote:

    @Jan Done. You can also use the “manual list” option here:
    https://tools.wmflabs.org/pagepile/?menu=new

    I have added one-click options to resolve redirects and switch to Wikidata, so you can just paste your list without having to turn it into links.

    Thursday, July 30, 2015 at 10:37 | Permalink
  3. André wrote:

    A Minus/Difference filter would be neat. That would make it possible, e.g., to find items with a certain claim but no (specific) qualifier for that claim (which requires two separate WDQ queries).

    Thursday, July 30, 2015 at 11:32 | Permalink
  4. Magnus wrote:

    So, remove pages in list 1 that are in list 2? Can do that…

    Thursday, July 30, 2015 at 11:49 | Permalink
  5. Magnus wrote:

    @Jan Done. Called it “exclusive”.

    Thursday, July 30, 2015 at 12:36 | Permalink
  6. André wrote:

    @Magnus. Perfect. Many thanks!

    Thursday, July 30, 2015 at 12:44 | Permalink
  7. Nemo wrote:

    I’d especially like to input a wikidata-terminator pagepile into not-in-the-other-language. At least for the wikis I know, both tools produce lists with a lot of “noise”, that would be easy to filter by combining them.

    Saturday, August 1, 2015 at 19:08 | Permalink
  8. Magnus wrote:

    @nemo Done.

    Monday, August 3, 2015 at 10:13 | Permalink
  9. Jan Ainali wrote:

    I just tried to get some stats on Swedish art, but the category tree on Swedish Wikipedia is badly maintained. Could treeviews accept a pagepile as input? http://tools.wmflabs.org/glamtools/treeviews/

    Tuesday, August 4, 2015 at 08:43 | Permalink
  10. Magnus wrote:

    @Jan: Done. Try 252 as an example. You can pre-fill the form from here:
    https://tools.wmflabs.org/pagepile/api.php?id=252&action=get_data&format=html&doit1

    Tuesday, August 4, 2015 at 11:05 | Permalink
  11. devafine wrote:

    thankyo forest

    Sunday, August 9, 2015 at 18:53 | Permalink
  12. Daniel wrote:

    How would one create a pile using the WDQ API? As in, sending an HTTP request to wdq.wmflabs.org/api?q={stuff}, what should go in the second set of brackets to get a PagePile ID?

    Tuesday, August 18, 2015 at 03:26 | Permalink
  13. Magnus wrote:

    @Daniel: Just go to
    https://tools.wmflabs.org/pagepile/?menu=new
    and fill in the WDQ query there.

    To do this “automatically”, request
    https://tools.wmflabs.org/pagepile/?doit=1&pagepile_format=json&wdq=claim%5B31:5%5D and noclaim[21]
    and you will get JSON with the PagePile ID back.

    Tuesday, August 18, 2015 at 10:25 | Permalink
  14. Daniel wrote:

    Thanks!

    Tuesday, August 18, 2015 at 13:00 | Permalink
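
For the “automatic” request described in comment 13, a small Python sketch can build the URL with proper encoding of the WDQ query. The parameter names `doit`, `pagepile_format` and `wdq` are taken from that comment; the helper name `wdq_pile_url` is made up here. The shape of the returned JSON is not documented in this post, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode

# Base URL of the PagePile tool, as given in the comments above.
PAGEPILE_URL = "https://tools.wmflabs.org/pagepile/"


def wdq_pile_url(wdq_query: str) -> str:
    """Return the URL that asks PagePile to create a pile from a WDQ query."""
    params = {"doit": "1", "pagepile_format": "json", "wdq": wdq_query}
    # urlencode percent-encodes brackets and colons and turns spaces into '+'.
    return PAGEPILE_URL + "?" + urlencode(params)


url = wdq_pile_url("claim[31:5] and noclaim[21]")
```

Fetching `url` with any HTTP client should then return the JSON containing the PagePile ID, as Magnus describes.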