A Unified Toolkit for Deep Learning Based Document Image Analysis
Updated Mar 17, 2023 (Python)
Read and extract text and other content from PDFs in C# (port of PDFBox)
OCR engine for all the languages
Document layout analysis resources and repositories for development with PdfPig.
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
The official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
A Large Dataset of Historical Japanese Documents with Complex Layouts
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block on a PDF document page as title, text, list, table, or image.
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil
A powerful CLI tool for visualization and encoding of PAGE-XML files
A more complete example of programming with PDFMiner, which continues where the default documentation stops
Chinese layout detection with java-yolov8: java-yolov8 is used to detect the layout of Chinese document images
A web application for PDF content and table extraction, featuring image-based visual layout analysis, indexed document search, batch processing and extraction result annotation.
An Open Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi (ICDAR 2021)
A Python + C implementation for image-based PDF page layout analysis and content extraction.
OCR-D wrapper for page-xml-draw