I have previously blogged about Wikipedia-related page lists, and how they relate to many tools and activities. I also lamented my previous, failed attempts at introducing a “tool pipeline system”.
Well, I am not one to give up easily! The latest, greatest iteration in this vein is PagePile. Essentially, this new tool manages piles (newspeak for “lists”) of pages from Wikipedia, Wikidata, Commons, and other projects from the WikiVerse.
New piles can be taken from various sources, including manual lists, WDQ, and the Gather extension. Several of my tools can also generate piles, including AutoList, CatScan, QuickIntersection, and Not-in-the-other-language. Either way, you end up with a numeric PagePile ID.
What can you do with that ID? First of all, you can look at the list (that example leads to the list of all humans on Wikidata, ~2.8M items long), and download it in various formats.
You can filter the list, creating a new list (with a new ID) by following language links, resolving redirects, merging and subsetting with other lists, etc.
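To make the merging and subsetting filters concrete, here is a minimal Python sketch of the idea. It is not PagePile's own code: the function names are made up for illustration, piles are modeled as plain lists of page titles, and "subsetting" is assumed to mean keeping only the pages that also occur in a second pile.

```python
def merge_piles(pile_a, pile_b):
    """Union of two piles: keeps first-seen order, drops duplicates."""
    # dict.fromkeys preserves insertion order and removes repeated entries
    return list(dict.fromkeys(list(pile_a) + list(pile_b)))


def subset_pile(pile_a, pile_b):
    """Pages of pile_a that also occur in pile_b (assumed meaning of 'subset')."""
    keep = set(pile_b)
    return [page for page in dict.fromkeys(pile_a) if page in keep]


if __name__ == "__main__":
    humans = ["Q42", "Q1339", "Q5879"]   # hypothetical pile contents
    writers = ["Q42", "Q5879", "Q692"]
    print(merge_piles(humans, writers))
    print(subset_pile(humans, writers))
```

In the real tool each such operation produces a new pile with its own ID, so the input piles stay untouched; the functions above mirror that by returning new lists.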
The list of tools and filters that work with piles will likely grow; it is quite easy to add PagePiles as an input and/or output to a tool. Let me know if there is a tool you would like to see connected to the PagePile ecosystem; likewise for new filters.
If you are a tool author on Labs, you might want to consider hooking your tools into this system. I made a brief introduction for programmers, put the code on Bitbucket, and I am working on some code documentation.
Basically, the tool manages a set of SQLite files, each of which represents a pile (=list) of pages on a wiki. You can get the file name of the sqlite3 file from the API or via the PHP class described in the intro. Via that class, or using sqlite3 directly, you can read and write that file, adding to and changing the list. Please let me know if you have problems or comments, and if you start using PagePile in your tools, so I can add them to my consumer and/or generator lists.
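For those who prefer to skip the PHP class and talk to sqlite3 directly, the following Python sketch shows the general shape of reading and writing such a file. Note that the table name `pages` and its columns `page` and `ns` are assumptions made for this example, not the actual PagePile schema; check the programmer introduction for the real layout. The demo uses an in-memory database, where a real tool would open the file name returned by the API.

```python
import sqlite3


def create_demo_pile(path=":memory:"):
    """Create a stand-in pile database. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE pages (page TEXT NOT NULL, ns INTEGER NOT NULL)"
    )
    return conn


def add_pages(conn, pages):
    """Append (title, namespace) pairs to the pile."""
    conn.executemany("INSERT INTO pages (page, ns) VALUES (?, ?)", pages)
    conn.commit()


def read_pages(conn):
    """Return all (title, namespace) pairs in insertion order."""
    return conn.execute(
        "SELECT page, ns FROM pages ORDER BY rowid"
    ).fetchall()


if __name__ == "__main__":
    conn = create_demo_pile()
    add_pages(conn, [("Douglas_Adams", 0), ("Talk:Douglas_Adams", 1)])
    for title, namespace in read_pages(conn):
        print(namespace, title)
    conn.close()
```

Parameterized queries (the `?` placeholders) are used so that page titles containing quotes or other special characters are stored safely.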