Reading a PDF
- Convert the streams in the PDF to human-readable text:
qpdf --qdf --object-streams=disable ./sample_pdf/IIT_JEE_\ MCQ_BINOMIAL_THEOREM.pdf uncompressed-qpdf.pdf
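To get a feel for what that command is undoing: most streams in a PDF are compressed with the FlateDecode filter, which is ordinary zlib/deflate. Below is a rough Python sketch (standard library only) that inflates such streams. The name `inflate_streams` and the regex scan are simplifications made up for illustration, not how qpdf works. It ignores /Length, object streams, predictors, other filters and encryption, so qpdf is still the tool to use on real files.

```python
import re
import zlib

# Grabs the raw bytes between the "stream" and "endstream" keywords.
# Simplification: a real parser would trust the /Length entry, not a regex.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the inflated body of every stream that looks like zlib data."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line left before "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not FlateDecode (or a filter chain); skipped in this sketch.
            continue
    return results


if __name__ == "__main__":
    # A tiny fake stream object, so the example runs without a PDF file.
    content = b"BT /F1 12 Tf (Hello) Tj ET"
    fake_pdf = (
        b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(content)
        + b"\nendstream\nendobj\n"
    )
    print(inflate_streams(fake_pdf))
```

The inflated bytes are the same kind of content-stream operators (BT, Tf, Tj, ET) that become readable in uncompressed-qpdf.pdf.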
- Get font info from a TTF file:
https://fontdrop.info/
Nice tool.
- Extract fonts from a PDF:
https://www.pdfconvertonline.com/extract-pdf-fonts-online.html
- The bible for PDF rendering and extraction (Adobe's PDF Reference):
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf
- One very useful post:
https://stackoverflow.com/a/29474423
- I don't think a simple PDF tool will do for my case. The PDF uses some obscure font, and the text only displays correctly when the streams are translated by drawing each glyph from the font's chart. But that gives the text as a path, recognisable only to the human eye and not to a computer.
https://github.com/jesparza/peepdf
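One thing worth checking in the uncompressed file (an assumption about where the problem lies, not something confirmed for this PDF) is whether the font object has a /ToUnicode CMap. When present, that table maps the character codes used in the content stream to Unicode, and a missing or wrong one is a common reason why extraction gives gibberish while the page still renders fine. Here is a minimal Python sketch of reading the `bfchar` entries of such a CMap. `parse_bfchar`, `decode_codes` and the sample CMap text are made up for illustration, and `bfrange` sections are not handled.

```python
import re

# One "<src> <dst>" pair inside a beginbfchar ... endbfchar section.
PAIR_RE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")
SECTION_RE = re.compile(r"beginbfchar(.*?)endbfchar", re.DOTALL)


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map character codes to Unicode strings using bfchar entries only."""
    mapping = {}
    for section in SECTION_RE.findall(cmap_text):
        for src, dst in PAIR_RE.findall(section):
            # Destination values are UTF-16BE code units written as hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes: list[int], mapping: dict[int, str]) -> str:
    """Translate character codes; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    # Hypothetical CMap fragment, not taken from the PDF in these notes.
    sample_cmap = """
    2 beginbfchar
    <01> <0048>
    <02> <0069>
    endbfchar
    """
    table = parse_bfchar(sample_cmap)
    print(decode_codes([0x01, 0x02, 0x03], table))
```

If the font has no such table at all, the codes have to be mapped by hand by looking at the glyphs, which is the situation described above.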
- Another option: look at this JS library implementation:
- It seems like an exhaustive implementation for creating PDFs (embedding fonts and various encodings for writing streams).
- Maybe I can reverse engineer it to work for my case :D
- Let's see.
https://github.com/Hopding/pdf-lib#fonts-and-unicode
- Here's an example:
https://github.com/Hopding/pdf-lib/issues/296
- Now this is what I am going to try, since I need to deliver the final output :D
https://pdfbox.apache.org/2.0/commandline.html
- I don't want to expose the code, so that nobody can edit it. So Java it is!
