Here are 33 public repositories matching this topic...
An Awesome List for getting started with web archiving
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
-
Updated
May 19, 2020
-
JavaScript
A list of things related to software, literature, and other content for 🕣 Memento
Parse And Create Web ARChive (WARC) files with node.js
-
Updated
Jul 16, 2020
-
JavaScript
A dockerized, queued high fidelity web archiver based on Squidwarc
-
Updated
Jul 19, 2020
-
Python
A social media open post web archiving tool
-
Updated
Jun 9, 2020
-
JavaScript
Seeder - Czech webarchive curating tool and public site
-
Updated
Jun 19, 2020
-
Python
Decentralized web archiving
-
Updated
Aug 7, 2018
-
Python
pywb recorder over Tor that anonymously records the web (Docker image).
Digital Preservation of HTTP in documentary heritage.
Modern wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication
record current active tab on webrecorder.io
-
Updated
May 9, 2017
-
JavaScript
Tika based link extractor for httpreserve
-
Updated
Jan 25, 2020
-
HTML
A helper package to tokenize textual content and retrieve hyperlinks
Client app for httpreserve pkg that generates CSV, JSON, HTTP, and BoltDB
-
Updated
Mar 23, 2019
-
JavaScript
Given four bytes, download a random file from web archives implementing the UKWA Shine interface
An archival thumbnail visualization server
-
Updated
Jun 9, 2020
-
JavaScript
metawarc: a command-line tool for metadata extraction from files from WARC (Web ARChive)
-
Updated
May 11, 2020
-
Python
A wrapper for phantom.js commands for headless screenshots.
-
Updated
Sep 20, 2017
-
JavaScript
Link crawler for a phpBB forum
-
Updated
Jul 17, 2017
-
Java
Class page for ODU CS 791 / 891 Web Archiving Seminar
-
Updated
Jul 20, 2017
-
JavaScript
A restricted API in Golang for the (semi-)exposed functions of the Internet Archive.
HTTPreserve Analysis of Million Dollar Web Page
Extracts links from DSpace repositories
-
Updated
Oct 29, 2019
-
Java
A tool for archiving web pages to the Wayback Machine
-
Updated
Dec 30, 2018
-
Kotlin
Offline storage of website data on Android
-
Updated
Jun 22, 2018
-
Kotlin
This repository contains work done to determine how much of www.guideline.gov and qualitymeasures.ahrq.gov were archived.
-
Updated
Jul 16, 2018
-
Jupyter Notebook