I have previously blogged about Wikipedia-related page lists, and how they relate to many tools and activities. I also lamented my previous, failed attempts at introducing a “tool pipeline system”.
Well, I am not one to give up easily! The latest, greatest iteration in this vein is PagePile. Essentially, this new tool manages piles (newspeak for “lists”) of pages from Wikipedia, Wikidata, Commons, and other projects from the WikiVerse.
Manipulations
New piles can be taken from various sources, including manual lists, WDQ, and the Gather extension. Several of my tools can also generate piles, including AutoList, CatScan, QuickIntersection, and Not-in-the-other-language. Either way, you end up with a numeric PagePile ID.
What can you do with that ID? First of all, you can look at the list (that example leads to the list of all humans on Wikidata, ~2.8M items long), and download it in various formats.
You can filter the list, creating a new list (with a new ID) by following language links, resolving redirects, merging and subsetting with other lists, etc.
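To make the merge and subset filters concrete, here is a minimal sketch of what they amount to on plain lists of page titles. It only illustrates the list logic, not PagePile's actual code; the function names `merge` and `subset` and the sample item IDs are assumptions made for this example.

```python
def merge(pile_a, pile_b):
    """Union: every page from either pile, in first-seen order, no duplicates."""
    seen = set()
    merged = []
    for page in list(pile_a) + list(pile_b):
        if page not in seen:
            seen.add(page)
            merged.append(page)
    return merged


def subset(pile_a, pile_b):
    """Intersection: the pages of pile_a that also appear in pile_b."""
    keep = set(pile_b)
    return [page for page in pile_a if page in keep]


# Two hypothetical piles of Wikidata item IDs.
pile_a = ["Q42", "Q80", "Q5879"]
pile_b = ["Q42", "Q5879", "Q1339"]

merged = merge(pile_a, pile_b)
common = subset(pile_a, pile_b)
print(merged)
print(common)
```

In PagePile itself each such operation produces a new pile with its own ID, so the input piles stay untouched; the sketch mirrors that by returning new lists instead of modifying its arguments.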
Finally, you can import a pile into several of my tools, including AutoList, FIST, WD-FIST, Not-in-the-other-language, and GetItemNames.
This list will likely grow; it is quite easy to add PagePiles as an input and/or output to a tool. Let me know if there is a tool you would like to see connected to the PagePile ecosystem; likewise for new filters.
Tech
If you are a tool author on Labs, you might want to consider linking up to the obvious possibilities of this system. I made a brief introduction for programmers, put the code on BitBucket, and I am working on some code documentation.
Basically, the tool manages a list of sqlite files, each of which represents a pile (=list) of pages on a wiki. You can get the file name of the sqlite3 file from the API or via the PHP class described in the intro. Via that class, or using sqlite3 directly, you can read and write that file, adding and changing lists. Please let me know if you have problems or comments, and if you start using PagePile in your tools, so I can add them to my consumer and/or generator lists.
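As a rough illustration of the "use sqlite3 directly" route, the sketch below reads and extends a pile with Python's standard `sqlite3` module. The table layout (`pages` with columns `page` and `ns`) is purely an assumption for this example; the real schema is whatever the programmers' introduction and the PHP class define, so check there before touching an actual pile file. The demo uses an in-memory database instead of a file name obtained from the API.

```python
import sqlite3


def create_demo_pile(conn, titles):
    """Create a stand-in pile. The 'pages' table layout is hypothetical."""
    conn.execute("CREATE TABLE pages (page TEXT NOT NULL, ns INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO pages (page, ns) VALUES (?, ?)",
        [(title, 0) for title in titles],
    )
    conn.commit()


def read_pile(conn):
    """Return all page titles in insertion order."""
    rows = conn.execute("SELECT page FROM pages ORDER BY rowid")
    return [row[0] for row in rows]


def add_page(conn, title, ns=0):
    """Append one page to the pile."""
    conn.execute("INSERT INTO pages (page, ns) VALUES (?, ?)", (title, ns))
    conn.commit()


# In a real tool, connect to the file name returned by the API instead.
conn = sqlite3.connect(":memory:")
create_demo_pile(conn, ["Q42", "Q80"])
add_page(conn, "Q1339")
pile = read_pile(conn)
print(pile)
conn.close()
```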
14 Comments
Awesome tool! I have used your linked items tool the last few days as a feeder for WD-FIST. I use it to start from a Wikipedia page that links to a lot of articles that are interesting to me. If you add it as a generator, you save me some copying and pasting. https://tools.wmflabs.org/wikidata-todo/linked_items.php
@Jan Done. You can also use the “manual list” option here:
https://tools.wmflabs.org/pagepile/?menu=new
I have added one-click options to resolve redirects and switch to Wikidata, so you can just paste your list without having to turn it into links.
A Minus/Difference filter would be neat. That would make it possible to find, e.g., items with a certain claim but no (specific) qualifier for that claim (which requires two separate WDQ queries).
So, remove pages in list 1 that are in list 2? Can do that…
@Jan Done. Called it “exclusive”.
@Magnus. Perfect. Many thanks!
I’d especially like to input a wikidata-terminator pagepile into not-in-the-other-language. At least for the wikis I know, both tools produce lists with a lot of “noise”, that would be easy to filter by combining them.
@nemo Done.
I just tried to get some stats on Swedish art, but the category tree on Swedish Wikipedia is badly maintained. Could treeviews accept a pagepile as input? http://tools.wmflabs.org/glamtools/treeviews/
@Jan: Done. Try 252 as an example. You can pre-fill the form from here:
https://tools.wmflabs.org/pagepile/api.php?id=252&action=get_data&format=html&doit1
thankyo forest
How would one create a pile using the WDQ API? As in, sending an HTTP request to wdq.wmflabs.org/api?q={stuff}, what should go in the second set of brackets to get a PagePile ID?
@Daniel: Just go to
https://tools.wmflabs.org/pagepile/?menu=new
and fill in the WDQ query there.
To do this “automatically”, request
https://tools.wmflabs.org/pagepile/?doit=1&pagepile_format=json&wdq=claim%5B31:5%5D and noclaim[21]
and you will get JSON with the PagePile ID back.
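For scripted use, the request above could be assembled along these lines. This is only a sketch: the parameter names `doit`, `pagepile_format`, and `wdq` are taken from the URL above, while the helper names are made up for this example. Since the exact layout of the JSON reply is not spelled out here, the fetch helper simply returns the parsed response as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PAGEPILE_URL = "https://tools.wmflabs.org/pagepile/"


def build_wdq_pile_url(wdq_query):
    """Build the request URL; urlencode takes care of escaping the query."""
    params = {"doit": "1", "pagepile_format": "json", "wdq": wdq_query}
    return PAGEPILE_URL + "?" + urlencode(params)


def request_wdq_pile(wdq_query, timeout=30):
    """Send the request and return the parsed JSON reply (hypothetical helper)."""
    with urlopen(build_wdq_pile_url(wdq_query), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds and prints the URL; call request_wdq_pile() to actually create a pile.
url = build_wdq_pile_url("claim[31:5] and noclaim[21]")
print(url)
```

Letting `urlencode` escape the brackets and spaces avoids having to hand-encode the WDQ query.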
Thanks!