Generic framework for historical document processing
Updated Jul 9, 2021 - Python
An include filter for Pandoc
FormKiQ Core is a flexible Open Source Document Management Platform that can be used as headless software or run using our web-based client interface. FormKiQ runs in your Amazon Web Services (AWS) Cloud, and can be used for document workflows, records management, and other document storage and processing needs using an extendable Document API
Unofficial mirror of git://git.lyx.org/lyx.git (updated daily; not affiliated with lyx.org).
Semantic extraction from conference proceedings.
tokyo, a REST API, when given any type of document
A comprehensive list of annotated training datasets classified by use case.
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
A module for creating stopword lists for any language, based on a set of documents.
A Python command-line utility for automating some copyediting tasks in documents. It allows editing zipped, XML-based files (e.g. docx, odt, or epub) through XSLT stylesheets, and can be extended with your own custom XSLT stylesheets.
A document preprocessor that works in conjunction with tools like groff/troff & refer.
Data2Xml is a .NET 6.0 library that maps data to XML using a list of XPath expressions. Supports data sets from APIs and SQL.
Convert scans of handwritten notes to PDF.
This set of robots provides support for automatically obtaining information from invoices using the docDigitizer API and keeping track of the processed invoices in an Airtable repository.
School/College Stationery List OCR and Parsing
FileGazer - deep file analysis and categorisation
An implementation of basic IR techniques from scratch.
Reduce the time required for audit report analysis with a containerized file conversion and scraping system
Apply keyword procedures in a given Racket namespace using X-expressions.
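One of the entries above describes creating stopword lists for any language from a set of documents. As a rough illustration of that idea only, the sketch below treats a word as a stopword when it appears in at least a given fraction of the documents (a document-frequency threshold). This is an assumption made for the example, not the method of the listed module; the function name `build_stopword_list` and the parameter `min_doc_fraction` are hypothetical.

```python
import re
from collections import Counter


def build_stopword_list(documents, min_doc_fraction=0.8):
    """Return words found in at least `min_doc_fraction` of the documents.

    Hypothetical helper for illustration; not the API of any listed project.
    """
    if not documents:
        return []
    doc_freq = Counter()
    for text in documents:
        # Use a set so each word counts once per document (document frequency).
        # The pattern matches runs of Unicode letters, so it is not English-only.
        words = set(re.findall(r"[^\W\d_]+", text.lower()))
        doc_freq.update(words)
    threshold = min_doc_fraction * len(documents)
    return sorted(word for word, count in doc_freq.items() if count >= threshold)


docs = [
    "The archive holds the letters of the mayor.",
    "The scans of the ledger are stored in the archive.",
    "A ledger of the parish and the town.",
]
print(build_stopword_list(docs, min_doc_fraction=1.0))  # ['of', 'the']
```

Lowering `min_doc_fraction` widens the list; on a small corpus that quickly pulls in topical words such as "archive", so the threshold needs tuning against the corpus size.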