Updated Apr 20, 2022 - C#
#pdf-document-processor
Here are 119 public repositories matching this topic...
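PDF补丁丁 (PDF Patcher): a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more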
Convert PDF to HTML without losing text or format.
Updated Dec 15, 2021 - HTML
bunchofcoders commented on Dec 28, 2021
It looks like the function below returns bytes with value 1 instead of 255, which produces a near-black PNG. For all other types of filters it works fine.
Filter: FlateDecode
ColorSpace: DeviceGray
BitsPerComponent: 1
public static byte[] Convert(ColorSpaceDetails details, IReadOnlyList<byte> decoded, int bitsPerComponent, int imageWidth, int imageHeight);
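To illustrate the behaviour described above, here is a minimal Python sketch of the expansion step. It is not the library's code: the function name `expand_gray_samples` and its parameters are hypothetical. It assumes DeviceGray data with the default Decode array (0 is black, the maximum sample value is white), samples packed most significant bit first, and each row padded to a whole byte. The key point is that a 1-bit sample must be scaled to 0 or 255 rather than copied through as 0 or 1.

```python
def expand_gray_samples(packed: bytes, bits_per_component: int,
                        width: int, height: int) -> bytes:
    """Expand packed DeviceGray samples to one 8-bit value per pixel.

    Hypothetical helper for illustration only; assumes the default
    Decode array, so 0 maps to black and the maximum sample to white.
    """
    if bits_per_component not in (1, 2, 4, 8):
        raise ValueError("unsupported bits per component")

    max_value = (1 << bits_per_component) - 1
    # Each image row is padded to a whole number of bytes.
    row_stride = (width * bits_per_component + 7) // 8
    if len(packed) < row_stride * height:
        raise ValueError("not enough sample data for the given dimensions")

    out = bytearray(width * height)
    for y in range(height):
        row = packed[y * row_stride:(y + 1) * row_stride]
        for x in range(width):
            bit_offset = x * bits_per_component
            byte = row[bit_offset // 8]
            # Samples are stored most significant bit first.
            shift = 8 - bits_per_component - (bit_offset % 8)
            sample = (byte >> shift) & max_value
            # Scale to the full 0..255 range; without this step a
            # 1-bit image only contains 0 and 1 and looks almost black.
            out[y * width + x] = sample * 255 // max_value
    return bytes(out)


# Three 1-bit pixels (white, black, white) packed into one byte.
print(list(expand_gray_samples(bytes([0b10100000]), 1, 3, 1)))  # [255, 0, 255]
```

If the image dictionary carried a non-default Decode array, the mapping would need to be adjusted accordingly; that case is left out of this sketch.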
A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk
Updated Mar 30, 2022 - Python
DocNET is a fast PDF editing and reading library for modern .NET applications
Topics: pdf, csharp, jpeg, pdf-converter, netcore, netstandard, pdf-files, pdf-document, pdf-conversion, pdf-extractor, pdf-document-processor
Updated Apr 14, 2022 - C#
pdfCropMargins -- a program to crop the margins of PDF files
Updated Mar 9, 2021 - Python
CCKS 2019 Evaluation Task 5: information extraction from public company announcements, 3rd place
Updated Sep 15, 2019 - Python
Topics: nlp, information-extraction, ibm-research, table-extraction, scientific-papers, pdf-document-processor, ibm-research-ai
Updated May 20, 2021 - Java
Python library to manipulate PDF page labels
Updated Dec 22, 2021 - Python
PDFio is a simple C library for reading and writing PDF files.
Updated Mar 2, 2022 - C
Utility to convert PDF into JPG files
Updated Nov 10, 2021 - Java
PDFViewer is a GUI tool, written using python3 and tkinter, which lets you view PDF documents.
Topics: pdf, tkinter, pdf-viewer, pdf-files, pdf-document, tkinter-graphic-interface, tkinter-gui, pdf-document-processor, tkinter-python, tkinter-library
Updated Jul 4, 2021 - Python
.NET Standard P/Invoke bindings for PDFium.
Updated Apr 13, 2022 - C#
Topics: bash, backups, python3, zhihu, bilibili, scripts-collection, srt-subtitles, pdf-document-processor, backup-utils
Updated Mar 22, 2021 - Python
Prepare documents for distribution
Updated Jun 17, 2021 - Python
Android port of pdf2htmlEX - Convert PDF to HTML without losing text or format.
Updated Apr 20, 2022 - Java
How do we process data in different formats, such as docx and pdf, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.
Topics: nlp, data-science, text-mining, watson, natural-language, jupyter-notebook, artificial-intelligence, cloud-computing, recommender-system, self-learning, ibm-cloud, watson-nlu, watson-natural-language, unstructured-data, pdf-document-processor, watson-studio
Updated May 27, 2020 - Jupyter Notebook
A collection of PDF command line tools and wrappers for Linux
Updated Jan 26, 2022 - Shell
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
Topics: classifier, pdf, machine-learning, csharp, lightgbm, pdf-document, document-layout, layout-analysis, pdf-document-processor, document-layout-analysis, ml-net, pdfpig, publaynet
Updated Mar 16, 2020 - C#
Converts PDF to any format for easier analysis
Updated Aug 29, 2019 - Python
Persian and Arabic fonts for TCPDF (PHP)
Updated Apr 5, 2021
Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google
Updated Feb 17, 2019 - Python
Spire.PDF for Java is a PDF component that lets Java applications read, write, print, and convert PDF documents without using Adobe Acrobat.
Updated Mar 12, 2019
Code used in my Medium Story https://medium.com/@umerfarooq_26378/python-for-pdf-ef0fac2808b0
Updated Sep 24, 2019 - Jupyter Notebook
Python interface based on the Foxit Quick PDF Library
Updated Apr 4, 2020 - Python
Prepress preparation tool and PDF editor
Updated Apr 13, 2022 - C++
Family helper websites.
Topics: jquery, php, authentication, codeigniter, pdf-converter, dropzonejs, pdf-generation, pdf-document-processor
Updated Nov 28, 2017 - HTML
Full-featured wrapper for Leptonica 1.77.0
Topics: wrapper, library, cmake, computer-vision, csharp, dll, libraries, computer-graphics, image-processing, bytes, tesseract, clang, leptonica, image-manipulation, image-classification, image-recognition, pdf-files, image-segmentation, image-analysis, pdf-generation, marshaller, pix, cmake-gui, pdf-document-processor, uinteger
Updated Sep 12, 2019 - Visual Basic
Extract essential data (e.g. GPA, skills, education, age) from PDF-formatted resume files (under development)
Updated Jul 31, 2018 - Python
Is your feature request related to a problem? Please describe.
The problem is inefficiency when simply looking for a single operand and then stopping processing.
For example, if only looking for a single colored pixel in a page.
Describe the solution you'd like
It would make sense to be able to set a stop flag on the processor and return out of the handler, which would cause the proc