Resources and repositories for document layout analysis with PdfPig.
Updated Oct 1, 2023 - C#
Extract tables from PDF files (port of tabula-java)
A C# library to extract tabular data from PDFs (a port of the Python library camelot, using PdfPig).
Cross-platform pdf reader application
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
Cross-platform library to render pdf documents as images with PdfPig using SkiaSharp
ChatGPT-like application using the RAG pattern that lets me ask questions about my own documents. I used Semantic Kernel to integrate an LLM (OpenAI) using C# to orchestrate AI plugins (Azure Cognitive Services). For the document embeddings I used Qdrant as the vector database and PdfPig to extract the content from the PDFs.
Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.Net. The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
🔬 Proof of Concept of extracting content from PDF files using multiple PDF libraries