One webpage for every book ever published! (Python)
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. (Java)
The Internet Archive BookReader (JavaScript)
A Modal Manager WebComponent
Import workflows for the Wikipedia Citations Database
Trough: Big data, small databases.
Monorepo for Archive.org UX development and prototyping.
brozzler - distributed browser-based web crawler