The case:

  • The server doesn't allow exec/shell_exec (so pdftotext is ruled out)
  • The other libraries I tried don't accept the PDF, although pdftotext handles the same files fine (tested locally)

Here are some excerpts from the PDF source:


5 0 obj
>
stream
Gat$ugPXc?%"6H'p]ofd'_qs00UX27?3p0*8m>KOQL4]:u"*$$^'f*q*SGMee*e$5&=alj\@GV7YPq9pg!Lr0>Y2n'&lmd4Br?V9N
P:_",WI.kJ\#'cs>77M9eTkA;,t#f)aaGuNS-6=Wp*uBg,Ft9Tcj#aI]nD[C6&m@9m?m!p6=IBt=o_LGHh!q>f$C.jdOXbSP/796HV`_Y]Y
l)M(]FZ9Ld-J_mMRe2q(D>`V@G`NM]crn@_V?sGC@W9^bnrY$.mqeVN^YEcqK)blO~>
endstream
endobj

About the creator:

%PDF-1.4
1 0 obj
>
endobj

I would like to get some suggestions about how to convert this to plain text in PHP, without using the exec/shell_exec functions.

Thank you.

(Other solutions, such as http://webcheatsheet.com/php/reading_clean_text_from_pdf.php, didn't work: I couldn't even get them to turn this stream into something that looks like readable ASCII text.)

    Do you have curl installed in PHP and/or can you make external HTTP connections? If so, consider using file_get_contents() or SOAP (etc) to do the conversion via an external API. I don't personally know of one, but there is bound to be such a thing on the web. Commented Jun 14, 2012 at 20:24

1 Answer

You cannot simply parse this stream on its own: once it has been decoded, you still need a lot of other data from the file (such as the font encodings) to turn its contents into text. You really want to use a library for this.
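
To illustrate the first decoding step only, here is a minimal sketch in pure PHP (no exec/shell_exec). It rests on assumptions: the stream excerpt in the question ends with `~>`, which is the ASCII85 end-of-data marker, so the sketch assumes the filter chain is ASCII85Decode followed by FlateDecode. The stream dictionary is not shown in the question, so that chain is a guess. The function names `ascii85_decode`, `ascii85_group` and `decode_pdf_stream` are made up for this example, and the code assumes 64-bit PHP 7+ with the zlib extension.

```php
<?php
// Sketch only: undo an assumed /ASCII85Decode + /FlateDecode filter chain.

// Convert one group of five base-85 digits to bytes and keep the first $keep bytes.
function ascii85_group(array $digits, int $keep): string
{
    $value = 0;
    foreach ($digits as $d) {
        $value = $value * 85 + $d;
    }
    // pack('N') writes the low 32 bits as a big-endian integer.
    return substr(pack('N', $value), 0, $keep);
}

// Decode Adobe-style ASCII85 text (optional "<~" prefix, "~>" end marker).
function ascii85_decode(string $data): string
{
    $end = strpos($data, '~>');
    if ($end !== false) {
        $data = substr($data, 0, $end);      // ignore everything after the marker
    }
    if (strncmp($data, '<~', 2) === 0) {
        $data = substr($data, 2);
    }
    $data = preg_replace('/\s+/', '', $data); // whitespace is not significant

    $out = '';
    $group = [];
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $c = $data[$i];
        if ($c === 'z' && count($group) === 0) {
            $out .= "\0\0\0\0";               // 'z' is shorthand for four zero bytes
            continue;
        }
        $v = ord($c) - 33;
        if ($v < 0 || $v > 84) {
            throw new InvalidArgumentException("Invalid ASCII85 character: $c");
        }
        $group[] = $v;
        if (count($group) === 5) {
            $out .= ascii85_group($group, 4);
            $group = [];
        }
    }

    // A trailing group of n digits (2..4) encodes n - 1 bytes; pad with 'u' (84).
    $n = count($group);
    if ($n === 1) {
        throw new InvalidArgumentException('Truncated ASCII85 data');
    }
    if ($n > 1) {
        $out .= ascii85_group(array_pad($group, 5, 84), $n - 1);
    }
    return $out;
}

// Decode ASCII85, then try zlib inflation; fall back to the ASCII85 output
// if the data turns out not to be zlib-compressed.
function decode_pdf_stream(string $raw): string
{
    $binary = ascii85_decode($raw);
    $inflated = @gzuncompress($binary);
    return $inflated === false ? $binary : $inflated;
}
```

If the assumption holds, `decode_pdf_stream()` applied to the text between `stream` and `endstream` returns the page's content-stream operators, not plain text. The strings inside those operators still have to be mapped through the font encodings stored elsewhere in the file, which is the part that makes a library the practical choice.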
