Software developer, researcher, and consultant with a PhD in Computer Science. Web Data Engineer at Internet Archive, working on better access to web archives.
- Internet Archive
- Hannover, Germany
- http://www.HelgeHolzmann.de
Popular repositories
-
ArchiveSpark Public
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.
-
Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive
-
HadoopConcatGz Public
A splittable Hadoop InputFormat for concatenated GZIP files and *.(w)arc.gz
-
HadoopWebGraph Public
A Hadoop input format to use graphs in WebGraph's BV format with Hadoop and Spark.


