
Unstructured.IO: ETL for LLMs

Welcome to Unstructured.IO! We're on a mission to make all of your documents available to LLM applications, from PDFs and Word documents to emails and Markdown files. To get started, check out our open source offerings.

Tried the open source library and ready for more power? Check out our products page to learn more about our paid API and the Unstructured Platform, an ETL tool built around our core file transformation capabilities.

Learn more

Section          Description
Company Website  Unstructured.io product and company info
Documentation    Full Unstructured documentation

Popular repositories

  1. unstructured Public

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    HTML · 8.2k stars · 660 forks

  2. unstructured-api Public

    Python · 475 stars · 99 forks

  3. unstructured-inference Public

    Python · 143 stars · 45 forks

  4. pipeline-sec-filings Public archive

    Preprocessing pipeline notebooks and API supporting text extraction from SEC documents

    Jupyter Notebook · 136 stars · 28 forks

  5. unstructured-python-client Public

    A Python client for the Unstructured hosted API

    Python · 70 stars · 11 forks

  6. unstructured-js-client Public

    A TypeScript client for the Unstructured hosted API

    TypeScript · 32 stars · 8 forks

Repositories

Showing 10 of 32 repositories
  • docs Public

    Documentation for all Unstructured products and libraries

    MDX · 2 stars · 14 forks · 0 issues · 3 pull requests · Updated Aug 23, 2024
  • unstructured Public

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    HTML · 8,150 stars · Apache-2.0 · 660 forks · 204 issues (4 need help) · 41 pull requests · Updated Aug 23, 2024
  • unstructured-ingest Public
    HTML · 1 star · Apache-2.0 · 2 forks · 1 issue · 3 pull requests · Updated Aug 23, 2024
  • unstructured.PaddleOCR Public Forked from PaddlePaddle/PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle: a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices

    Python · 25 stars · Apache-2.0 · 7,757 forks · 0 issues · 0 pull requests · Updated Aug 23, 2024
  • unstructured-python-client Public

    A Python client for the Unstructured hosted API

    Python · 70 stars · MIT · 11 forks · 8 issues · 1 pull request · Updated Aug 23, 2024
  • danswer Public Forked from danswer-ai/danswer

    Gen-AI Chat for Teams: think ChatGPT with access to your team's unique knowledge.

    Python · 4 stars · 1,240 forks · 0 issues · 1 pull request · Updated Aug 23, 2024
  • unstructured-api Public
    Python · 475 stars · Apache-2.0 · 99 forks · 14 issues · 7 pull requests · Updated Aug 22, 2024
  • unstructured.pytesseract Public Forked from madmaze/pytesseract

    A Python wrapper for Google Tesseract

    Python · 3 stars · Apache-2.0 · 751 forks · 0 issues · 0 pull requests · Updated Aug 15, 2024
  • unstructured-js-client Public

    A TypeScript client for the Unstructured hosted API

    TypeScript · 32 stars · MIT · 8 forks · 4 issues · 0 pull requests · Updated Aug 14, 2024
  • base-images Public

    Stores Dockerfiles and Packer configs for base images to build upon

    Shell · 1 star · Apache-2.0 · 2 forks · 1 issue · 1 pull request · Updated Aug 13, 2024

People

This organization has no public members. You must be a member to see who’s a part of this organization.