Extract

A cross-platform command-line tool for parallelized, distributed content extraction. Built on top of Apache Tika, it is an essential part of the engineering behind the Panama Papers, Swiss Leaks and Luxembourg Leaks investigations.

It supports Redis-backed queueing for distributed, parallel extraction and can write extracted content to Solr, plain text files or standard output.
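
As a rough illustration of the queue-and-workers idea described above, here is a minimal Python sketch. It is not Extract's implementation or its command-line interface: an in-process `queue.Queue` stands in for the Redis-backed queue, `extract_text` is a hypothetical placeholder for a real parser such as Apache Tika, and `run_workers` is a name invented for this example.

```python
import queue
import threading


def extract_text(path: str) -> str:
    # Hypothetical placeholder: a real tool would hand the file to a parser.
    return f"text of {path}"


def run_workers(paths, num_workers=4):
    """Drain a shared queue of paths with several workers; return {path: text}."""
    tasks = queue.Queue()  # stand-in for a shared, Redis-backed queue
    for p in paths:
        tasks.put(p)

    results = {}
    lock = threading.Lock()  # guards writes to the shared results dict

    def worker():
        while True:
            try:
                path = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            text = extract_text(path)
            with lock:
                results[path] = text

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Write results to standard output, one of the sinks mentioned above.
    for path, text in sorted(run_workers(["a.pdf", "b.docx", "c.txt"]).items()):
        print(path, "->", text)
```

Because every worker pulls from the same queue, adding workers (or, with a networked queue, adding machines) spreads the load without any worker needing to know about the others. For the tool's actual options and usage, see the wiki.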

For guidance and instructions, please see the wiki.

Credits and Collaboration

Initially developed by Matthew Caruana Galizia at ICIJ.

We welcome contributions! Please submit pull requests or contact us directly.

License

Copyright (c) 2018 International Consortium of Investigative Journalists. See LICENSE.
