# pdf-document-processor

Here are 119 public repositories matching this topic...

wmjordan / PDFPatcher

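PDFPatcher (PDF补丁丁): a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more.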

pdf pdf-converter pdf-generation pdf-document-processor

Updated Apr 20, 2022
C#

unidoc / unipdf

[FEATURE] Early-termination while processing contentstream

gunnsth commented Oct 21, 2019

Is your feature request related to a problem? Please describe.
Processing is inefficient when you only need to find a single operand and could stop as soon as it is found, for example when looking for a single colored pixel on a page.

Describe the solution you'd like
It would make sense to be able to set a stop flag on the processor and return from the handler, which would cause the proc…

good first issue performance feature
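The stop-flag idea in this issue can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Operation`, `StoppableProcessor`, `stop()`, and `find_first_colored_fill` are not unipdf's API (unipdf is a Go library). A non-zero `rg` (non-stroking RGB) fill color stands in for "a colored pixel". The handler calls `stop()` once it has what it needs, and the processing loop checks the flag after every handler call and returns early instead of walking the rest of the stream.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Operation:
    """Hypothetical content-stream operation, e.g. operator 'rg' with operands [1, 0, 0]."""
    operator: str
    operands: List[float] = field(default_factory=list)


class StoppableProcessor:
    """Hypothetical processor that feeds operations to a handler until stop() is called."""

    def __init__(self, ops: List[Operation]) -> None:
        self.ops = ops
        self._stopped = False
        self.processed = 0  # number of operations passed to the handler so far

    def stop(self) -> None:
        # Called from inside the handler; the loop checks this after each operation.
        self._stopped = True

    def process(self, handler: Callable[[Operation, "StoppableProcessor"], None]) -> None:
        for op in self.ops:
            handler(op, self)
            self.processed += 1
            if self._stopped:
                break  # early termination: remaining operations are never visited


def find_first_colored_fill(ops: List[Operation]) -> Tuple[Optional[Operation], int]:
    """Return the first non-black 'rg' operation and how many operations were processed."""
    found: List[Operation] = []

    def handler(op: Operation, proc: StoppableProcessor) -> None:
        if op.operator == "rg" and any(c != 0 for c in op.operands):
            found.append(op)
            proc.stop()

    proc = StoppableProcessor(ops)
    proc.process(handler)
    return (found[0] if found else None), proc.processed


# Sample stream: a black square followed by a red one.
SAMPLE_OPS = [
    Operation("q"),
    Operation("rg", [0, 0, 0]),
    Operation("re", [0, 0, 10, 10]),
    Operation("f"),
    Operation("rg", [1, 0, 0]),
    Operation("re", [10, 10, 5, 5]),
    Operation("f"),
    Operation("Q"),
]

if __name__ == "__main__":
    print(find_first_colored_fill(SAMPLE_OPS))
```

On the sample stream, the handler stops at the fifth operation (`1 0 0 rg`), and the last three operations are never processed. Putting the flag on the processor rather than using an exception keeps the handler signature unchanged.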

pdf2htmlEX / pdf2htmlEX

Convert PDF to HTML without losing text or format.

html pdf pdf-viewer pdf-document-processor

Updated Dec 15, 2021
HTML

UglyToad / PdfPig

Image with FlateDecode filter and 1 bit per component issue

bunchofcoders commented Dec 28, 2021

It looks like the function below returns bytes with value 1 instead of 255, which produces a nearly black PNG. It works fine for all other filter types.

Filter: FlateDecode
ColorSpace: DeviceGray
BitsPerComponent: 1

`public static byte[] Convert(ColorSpaceDetails details, IReadOnlyList<byte> decoded, int bitsPerComponent, int imageWidth, int imageHeight);`

bug good first issue
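The symptom described above (1 where 255 is expected) is what happens when packed samples are unpacked but not rescaled to the 8-bit range. The Python function below is a minimal sketch of that conversion. It is not PdfPig's `Convert` implementation. It assumes the stream has already been decompressed (FlateDecode with no predictor), a DeviceGray image, and the default `Decode` array `[0 1]`, so 0 is black and the maximum sample value is white. Each row starts on a byte boundary, and each sample is scaled by 255 / (2^bpc - 1).

```python
def expand_gray_samples(data: bytes, bits_per_component: int, width: int, height: int) -> bytes:
    """Unpack DeviceGray samples (1, 2, 4 or 8 bits each) into one byte per pixel in 0..255.

    Assumes already-decompressed data and the default Decode array [0 1].
    """
    if bits_per_component not in (1, 2, 4, 8):
        raise ValueError(f"unsupported BitsPerComponent: {bits_per_component}")
    max_value = (1 << bits_per_component) - 1
    # Each image row is padded to a whole number of bytes.
    row_bytes = (width * bits_per_component + 7) // 8
    out = bytearray()
    for row in range(height):
        row_start = row * row_bytes
        for x in range(width):
            bit_offset = x * bits_per_component
            byte = data[row_start + bit_offset // 8]
            # Samples are packed most-significant bit first.
            shift = 8 - bits_per_component - (bit_offset % 8)
            sample = (byte >> shift) & max_value
            # Rescale: without this step a 1-bit "white" sample stays 1 (nearly black).
            out.append(sample * 255 // max_value)
    return bytes(out)
```

For example, a 3x2 one-bit image stored as `bytes([0b10100000, 0b01000000])` expands to `bytes([255, 0, 255, 0, 255, 0])`. A non-default `Decode` array such as `[1 0]` would invert the mapping and is not handled here.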

hellerbarde / stapler

A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk

python pdf pdf-converter pdf-document-processor

Updated Mar 30, 2022
Python

GowenGit / docnet

DocNET is a fast PDF editing and reading library for modern .NET applications

pdf csharp jpeg pdf-converter netcore netstandard pdf-files pdf-document pdf-conversion pdf-extractor pdf-document-processor

Updated Apr 14, 2022
C#

abarker / pdfCropMargins

pdfCropMargins -- a program to crop the margins of PDF files

python pdf pdf-converter crop cropper pdf-document-processor

Updated Mar 9, 2021
Python

houking-can / CCKS2019-Task5

CCKS2019 Evaluation Task 5: information extraction from public company announcements (3rd place)

flask web-api event-extraction ner table-extraction pdf2html pdf-document-processor ccks

Updated Sep 15, 2019
Python

IBM / science-result-extractor

nlp information-extraction ibm-research table-extraction scientific-papers pdf-document-processor ibm-research-ai

Updated May 20, 2021
Java

lovasoa / pagelabels-py

Python library to manipulate PDF page labels

pdf labels page pdf-document-processor

Updated Dec 22, 2021
Python

michaelrsweet / pdfio

PDFio is a simple C library for reading and writing PDF files.

c pdf pdf-document pdf-generation pdf-document-processor pdf-document-api

Updated Mar 2, 2022
C

pankajr141 / pdf2jpg

Utility to convert PDF into JPG files

pdf-converter pdf-document-processor

Updated Nov 10, 2021
Java

naiveHobo / pdfviewer

PDFViewer is a GUI tool, written using python3 and tkinter, which lets you view PDF documents.

pdf tkinter pdf-viewer pdf-files pdf-document tkinter-graphic-interface tkinter-gui pdf-document-processor tkinter-python tkinter-library

Updated Jul 4, 2021
Python

Dtronix / PDFiumCore

.NET Standard P/Invoke bindings for PDFium.

pdf csharp dotnet pdf-document pdf-generation pinvoke-wrapper pdfium pdf-document-processor

Updated Apr 13, 2022
C#

taseikyo / backup-utils

✨ A batch of useful code/scripts: run commands automatically, finish repetitive stupid operations, perform format conversions, etc.

bash backups python3 zhihu bilibili scripts-collection srt-subtitles pdf-document-processor backup-utils

Updated Mar 22, 2021
Python

sfneal / pdfconduit

Prepare documents for distribution

python pdf encryption pdfkit pdf-generation pypdf2 watermark pdfrw pdf-document-processor

Updated Jun 17, 2021
Python

ViliusSutkus89 / pdf2htmlEX-Android

Android port of pdf2htmlEX - Convert PDF to HTML without losing text or format.

android html pdf library pdf-document-processor pdf-conversion-library

Updated Apr 20, 2022
Java

IBM / generate-insights-from-data-formats-with-watson

How do we process data in different formats, such as DOCX and PDF, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.

nlp data-science text-mining watson natural-language jupyter-notebook artificial-intelligence cloud-computing recommender-system self-learning ibm-cloud watson-nlu watson-natural-language unstructured-data pdf-document-processor watson-studio

Updated May 27, 2020
Jupyter Notebook

uroesch / pdftools

A collection of PDF command line tools and wrappers for Linux

pdf pdf-converter pdf-generation pdf-document-processor

Updated Jan 26, 2022
Shell

BobLd / PdfPigMLNetBlockClassifier

Proof of concept of training a simple region classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block on a PDF page as one of title, text, list, table, or image.

classifier pdf machine-learning csharp lightgbm pdf-document document-layout layout-analysis pdf-document-processor document-layout-analysis ml-net pdfpig publaynet

Updated Mar 16, 2020
C#

Academic-Hammer / PDFConverter

Converts PDF files to other formats (XML, HTML, TXT, XLS, Word) for easier analysis

pdf2xml table-extraction pdf2html pdf-document-processor pdf2txt pdf2xls pdf2word

Updated Aug 29, 2019
Python

akoweb / tcpdf

Persian and Arabic fonts for TCPDF (PHP)

php-library utf-8 persian arabic tcpdf php-lib pdf-document-processor tcpdf-library

Updated Apr 5, 2021

simonwongwong / PDF_Merge_and_Edit

Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google

pdf pdf-document-processor pdf-editor

Updated Feb 17, 2019
Python

eiceblue / Spire.PDF-for-Java

Spire.PDF for Java is a PDF component that enables developers to read, write, print, and convert PDF documents in Java applications without using Adobe Acrobat.

java java-library pdf-document-processor

Updated Mar 12, 2019

umer7 / Python-for-PDF

Code used in my Medium Story https://medium.com/@umerfarooq_26378/python-for-pdf-ef0fac2808b0

python text-mining pdf-converter python3 pypdf2 tabula-py pdf-document-processor

Updated Sep 24, 2019
Jupyter Notebook

houking-can / PDFSDK

A Python interface based on Foxit Quick PDF Library

pdf-merge pdf-split pdf-document-processor pdf-sdk pdf-text-extraction

Updated Apr 4, 2020
Python

ksharindam / pdfcook

Prepress tool and PDF editor

pdf pdf-document-processor pdf-editor prepress

Updated Apr 13, 2022
C++

bestsuperweb / Family

Family helper websites.

jquery php authentication codeigniter pdf-converter dropzonejs pdf-generation pdf-document-processor

Updated Nov 28, 2017
HTML

Phreak87 / LeptonicaSharp

Full-featured wrapper for Leptonica 1.77.0

wrapper library cmake computer-vision csharp dll libraries computer-graphics image-processing bytes tesseract clang leptonica image-manipulation image-classification image-recognition pdf-files image-segmentation image-analysis pdf-generation marshaller pix cmake-gui pdf-document-processor uinteger

Updated Sep 12, 2019
Visual Basic

armiro / cv-data-extractor

Extracts essential data (e.g., GPA, skills, education, age) from resume files in PDF format (under development)

python data-extraction resume-parser pdf-document-processor

Updated Jul 31, 2018
Python
