Skip to content

Converts a pdf file into a text file while keeping the layout of the original pdf.

Notifications You must be signed in to change notification settings

madnight/pdf-layout-text-stripper

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PDFLayoutTextStripper as docker container command-line utility

license Code Climate Issue Count

Converts a PDF file into a text file while keeping the layout of the original PDF. Useful to extract the content from a table or a form in a PDF file. PDFLayoutTextStripper is a subclass of PDFTextStripper class (from the Apache PDFBox library).

  • Use cases
  • How to use

Use cases

Data extraction from a table in a PDF file example

Data extraction from a form in a PDF file example

How to use

# i do it myself
docker build -t pdf-layout-text-stripper .
docker run -v $(pwd):/app pdf-layout-text-stripper "sample.pdf"

# i'm lazy
docker run -v $(pwd):/app madnight/pdf-layout-text-stripper "sample.pdf"

About

Converts a pdf file into a text file while keeping the layout of the original pdf.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 98.9%
  • Dockerfile 1.1%