Here are 91 public repositories matching the text-extraction topic.
Module for automatic summarization of text documents and HTML pages.
Updated Sep 2, 2020 · Python
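The listing does not say which algorithms that summarization module uses. As an assumed, minimal illustration of extractive summarization, the sketch below uses a simple word-frequency approach: each sentence is scored by the average corpus frequency of its non-stopword words, and the top sentences are kept in their original order. The names `summarize` and `content_words` and the tiny stopword list are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real summarizers use language-specific lists
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "it", "of",
             "on", "that", "the", "to", "was", "with"}


def content_words(text):
    # Lowercased word tokens with stopwords removed
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Extractive summary: keep the sentences whose words are most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average rather than sum, so long sentences are not automatically favored
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Sentence splitting on terminal punctuation is deliberately naive here; abbreviations such as "e.g." would be split incorrectly.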
Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
Updated Jul 6, 2020 · Python
A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition.
Heuristic based boilerplate removal tool
Updated Jul 1, 2020 · Python
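The heuristics behind that tool are not described here. A common family of boilerplate-removal heuristics, in the spirit of Boilerpipe-style extractors, splits a page into text blocks and keeps only blocks that are long enough and not dominated by link text (navigation bars and share widgets tend to be short and link-heavy). The sketch below assumes that approach; the class and function names, the block-tag set, and the thresholds (`min_words=10`, `max_link_density=0.33`) are hypothetical choices, not any listed project's values.

```python
from html.parser import HTMLParser

# Assumed set of tags that start/end a text block; real extractors use richer rules
BLOCK_TAGS = {"p", "div", "li", "td", "article", "section", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are never visible text
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits HTML into text blocks and counts how many characters of each block are link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # list of (normalized_text, link_char_count)
        self._parts = []
        self._link_chars = 0
        self._link_depth = 0
        self._skip_depth = 0

    def _flush(self):
        text = " ".join("".join(self._parts).split())  # collapse whitespace
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth += 1
        elif tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1
        elif tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style contents
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()  # emit any trailing block


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keeps blocks that are long enough and not dominated by links."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    kept = []
    for text, link_chars in collector.blocks:
        link_density = link_chars / len(text)  # blocks are never empty
        if len(text.split()) >= min_words and link_density <= max_link_density:
            kept.append(text)
    return "\n".join(kept)
```

Link density and word count are cheap, layout-independent features, which is why this style of heuristic works across many news sites without per-site rules.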
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.
Text Extraction, Rendering and Converting of PDF Documents
Better analyze information, in all its forms
A simple library for parsing, modifying, and composing SRT files.
Updated Aug 31, 2020 · Python
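The SRT library's own API is not shown on this page. The sketch below illustrates what parsing and composing SRT involves, using only the standard library: numbered blocks separated by blank lines, each with a `start --> end` timestamp line in `HH:MM:SS,mmm` form followed by the subtitle text. `Cue`, `parse_srt`, `compose_srt` and the timestamp helpers are hypothetical names, and the parser assumes well-formed input (no position coordinates after the end time).

```python
import re
from dataclasses import dataclass
from datetime import timedelta

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds)
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: timedelta
    end: timedelta
    text: str


def parse_timestamp(value):
    hours, minutes, seconds, millis = map(int, TIMESTAMP.fullmatch(value.strip()).groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis)


def format_timestamp(td):
    total_ms = td // timedelta(milliseconds=1)  # integer milliseconds, no float rounding
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def parse_srt(data):
    """Parses SRT text into cues; assumes well-formed blocks separated by blank lines."""
    normalized = data.replace("\r\n", "\n").strip()
    if not normalized:
        return []
    cues = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        start, end = (parse_timestamp(part) for part in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


def compose_srt(cues):
    """Serializes cues back to SRT, renumbering them from 1."""
    return "\n".join(
        f"{i}\n{format_timestamp(c.start)} --> {format_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    )
```

Modifying a file then becomes ordinary data manipulation: for example, shifting every subtitle by two seconds means adding `timedelta(seconds=2)` to each cue's `start` and `end` before calling `compose_srt`.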
AWS Lambda functions to extract text from various binary formats.
Updated Feb 7, 2018 · Python
Simple app to extract text from pictures using Tesseract
Updated Dec 28, 2019 · HTML
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Web scraping library and command-line tool to download, extract (metadata, main text, comments), and convert the output
Updated Oct 2, 2020 · Python
📖 Labeled examples from wiki dumps in Python
Updated Aug 8, 2016 · Jupyter Notebook
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
Updated Sep 26, 2020 · Python
PDF Reader Library for Native Julia.
Updated Jul 29, 2020 · Julia
Extract text from plaintext, .docx, .odt, .pdf and .rtf files. Pure Go.
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Updated Sep 15, 2020 · HTML
Revive awka - Awk to C Compiler
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
Updated Aug 24, 2020 · Jupyter Notebook
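The listing does not say which metrics that benchmark uses. One simple, assumed way to score an extractor's body-text output against a ground-truth body text is bag-of-words precision, recall and F1: precision drops when headers, footers or captions leak into the output, and recall drops when body paragraphs are missed. `tokenize` and `token_f1` are hypothetical names for this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; punctuation and layout are ignored (an assumption)
    return re.findall(r"\w+", text.lower())


def token_f1(extracted, reference):
    """Bag-of-words precision, recall and F1 of extracted text against a ground-truth body text."""
    ext, ref = Counter(tokenize(extracted)), Counter(tokenize(reference))
    overlap = sum((ext & ref).values())  # multiset intersection counts shared tokens
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(ext.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For instance, `token_f1("Page 3 Deep learning improves parsing", "Deep learning improves parsing")` yields precision 4/6, recall 1.0 and F1 0.8 (up to floating-point rounding): the leaked page header lowers precision without affecting recall. Because a bag of words ignores reading order, an order-sensitive measure such as edit distance would be needed to penalize scrambled columns.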
Text extraction for Wagtail document search
Updated Oct 1, 2020 · Python
Python port of Boilerpipe library
Updated Dec 22, 2019 · Python
A PDF collection reader with built-in full-text search engine
Updated Jun 3, 2017 · JavaScript
tokyo, a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and extracts its text 💪.
Updated Jun 13, 2020 · Clojure
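How tokyo detects MIME types is not described here. A common way to identify a document's type independently of its filename is to check its leading "magic" bytes against known signatures and then map the type to a conventional extension. The sketch below assumes that approach; the signature table is a small illustrative subset, and `sniff` is a hypothetical name.

```python
from typing import Tuple

# Illustrative subset of file signatures: (leading bytes, MIME type, suggested extension)
SIGNATURES = [
    (b"%PDF-", "application/pdf", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png", ".png"),
    (b"\xff\xd8\xff", "image/jpeg", ".jpg"),
    (b"GIF87a", "image/gif", ".gif"),
    (b"GIF89a", "image/gif", ".gif"),
    (b"{\\rtf", "application/rtf", ".rtf"),
    # .docx and .odt files are ZIP containers, so they also match here
    (b"PK\x03\x04", "application/zip", ".zip"),
]


def sniff(data: bytes) -> Tuple[str, str]:
    """Guesses (MIME type, extension) from a document's first bytes."""
    for magic, mime, extension in SIGNATURES:
        if data.startswith(magic):
            return mime, extension
    try:
        data.decode("utf-8")  # valid UTF-8 with no known signature: treat as plain text
        return "text/plain", ".txt"
    except UnicodeDecodeError:
        return "application/octet-stream", ".bin"
```

Distinguishing .docx from .odt would require a second step that opens the ZIP container and inspects its entries, which is why signature tables alone are usually only a first pass.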
Bachelor Thesis | Text extraction from complex video scenes
Updated Mar 15, 2019 · Java
Heuristic text extraction from news sites in Python 3
Updated Dec 31, 2017 · Python
A simple component to extract just the text from any file that has an IFilter installed. Available as a C++ COM component and as a C# .NET library.
Is your feature request related to a problem? Please describe.
Processing is inefficient when you only need to find a single operand and could stop as soon as it is found.
For example, you may only be looking for a single colored pixel on a page.
Describe the solution you'd like
It would make sense to be able to set a stop flag on the processor and return out of the handler, which would cause the proc
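The request does not name the library, so the stop-flag idea it describes can only be sketched against a hypothetical processor. In the sketch below, `ContentStreamProcessor`, `add_handler`, `stop` and `first_colored_fill` are invented names, not any real library's API. A handler calls `stop()` and returns; the processor checks the flag right after each handler call and exits without dispatching the remaining operations. As a simplification, "a colored pixel" is approximated by the first PDF `rg` (set RGB fill color) operator whose components are not all equal; real pixel detection would require rendering.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    operator: str   # PDF content stream operator, e.g. "rg", "re", "f"
    operands: list


class ContentStreamProcessor:
    """Hypothetical processor that dispatches each operation to every registered handler."""

    def __init__(self, operations):
        self.operations = operations
        self.handlers = []
        self.processed = 0   # operations dispatched so far, to show the early exit
        self._stopped = False

    def add_handler(self, handler):
        self.handlers.append(handler)

    def stop(self):
        # Called from inside a handler; checked right after that handler returns
        self._stopped = True

    def process(self):
        self._stopped = False
        self.processed = 0
        for op in self.operations:
            self.processed += 1
            for handler in self.handlers:
                handler(op, self)
                if self._stopped:
                    return  # skip the remaining handlers and operations


def first_colored_fill(operations):
    """Returns (first non-gray 'rg' operands or None, number of operations dispatched)."""
    found = []

    def handler(op, proc):
        # Unequal R, G, B components mean the fill color is not a shade of gray
        if op.operator == "rg" and len(set(op.operands)) > 1:
            found.append(op.operands)
            proc.stop()

    proc = ContentStreamProcessor(operations)
    proc.add_handler(handler)
    proc.process()
    return (found[0] if found else None), proc.processed
```

Using a flag rather than raising an exception keeps "stop early" a normal outcome: the handler's return path stays ordinary, and the processor decides when it is safe to exit.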