"java.io.IOException: This may not be a PDF File"

Question

I want to read a PDF file from a URL and convert it into a thumbnail image. I am using the following code; I have not included the conversion part here. The problem is on the line "pdffile = new PDFFile(buf);", where I get the exception "java.io.IOException: This may not be a PDF File". But I can see the PDF in the browser. What am I doing wrong? Please help.

    byte[] byteArray = null;
    InputStream is = null;
    String streamTo = null;
    BufferedImage bmg = null;
    PDFFile pdffile;
    ByteBuffer buf;
    int pageNumber = 1;

    try {
       is = fetchImageFromServer(url); //Pdf Url path
       if (!pageNumber.isEmpty()) {
         streamTo = is.toString(); 
         byteArray = streamTo.getBytes();
         buf = ByteBuffer.wrap(byteArray);
         pdffile = new PDFFile(buf);
       }
    } catch (IOException e) {
    }

You should mention which library you're using to do this - PDFFile is not a standard Java class.
– Andrzej Doyle, Commented Jan 27, 2011 at 13:01
Try this article - How To Read A PDF File From A URL In Java. I wrote this for my company's Java PDF component suite but it does what you are trying to do.
– BZ1, Commented Jan 28, 2011 at 4:13
I am new to Stack Overflow and do not know much about it yet. If I have made any mistakes, or if you have any advice for me, kindly let me know.
– nr.iras.sk, Commented Jan 28, 2011 at 4:58
+1, there is a slight error in the code though: an extra while with an empty block: while ((baLength = is1.read(ba1)) != -1) { fos1.write(ba1, 0, baLength); } while (baLength != -1);
– Maurice Perry, Commented Jan 28, 2011 at 7:14
Thank you, Maurice. I am surprised that it compiled without a warning. I have fixed the error.
– BZ1, Commented Feb 1, 2011 at 5:54

Maurice Perry · Accepted Answer · 2011-01-27 13:01:09Z

3

You must read the content of the stream. toString will not do that.
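To illustrate the point, here is a minimal sketch using only the JDK. The class name StreamToBuffer and the helper readFully are made up for this example, and a ByteArrayInputStream with placeholder bytes stands in for the stream that fetchImageFromServer(url) returns in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class StreamToBuffer {

    // Hypothetical helper: drains the stream chunk by chunk into a byte array.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes; in the question these would come from the server.
        byte[] fake = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        InputStream is = new ByteArrayInputStream(fake);

        // toString() describes the stream object; it does not read the content.
        System.out.println(is.toString());

        // Reading the stream yields the actual bytes, ready to wrap in a ByteBuffer.
        ByteBuffer buf = ByteBuffer.wrap(readFully(is));
        System.out.println(buf.remaining() + " bytes read");
    }
}
```

The resulting buffer is what would be handed to new PDFFile(buf) in the question's code.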

3 Comments

nr.iras.sk Over a year ago

Thanks for the fast response. Please explain a little more.

Daniel Over a year ago

+1 @Sarika: Just read the Java Tutorial about streams... this is standard knowledge, which Maurice should not need to reproduce here.

nr.iras.sk Over a year ago

@Daniel: The -1 was not from me. I had already left the office yesterday.

vz0 · Accepted Answer · 2011-01-27 14:06:33Z

2

The is.toString() call does not read the bytes of the stream; it typically just returns a string describing the stream object. There is a utility function in Apache Commons IO that will help you, IOUtils.toByteArray(). Try this:

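    // Requires Apache Commons IO: import org.apache.commons.io.IOUtils;
    is = fetchImageFromServer(url); // PDF URL path
    if (!pageNumber.isEmpty()) {
        byteArray = IOUtils.toByteArray(is); // reads the whole stream into memory
        buf = ByteBuffer.wrap(byteArray);
        pdffile = new PDFFile(buf);
    }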

mark stephens · Accepted Answer · 2011-01-27 13:46:05Z

0

A PDF is a binary object. If you convert it to a String, the conversion can change byte values and break the file.
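As a rough illustration of how this can happen, the sketch below pushes a few bytes through a String and back. The class name BinaryRoundTrip and the helper roundTrip are invented for this example, and the sample bytes are only a stand-in for real PDF content. Whether bytes survive depends on the charset: UTF-8 replaces byte sequences it cannot decode, while ISO-8859-1 maps every byte value to exactly one character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {

    // Hypothetical helper: decodes bytes to a String and encodes them back.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // '%' followed by four bytes above 0x7F, as binary files often contain.
        byte[] binary = {0x25, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        // These bytes are not valid UTF-8, so decoding substitutes replacement
        // characters and the original values are lost.
        boolean utf8Intact = Arrays.equals(binary, roundTrip(binary, StandardCharsets.UTF_8));

        // ISO-8859-1 is a one-to-one mapping for all 256 byte values.
        boolean latin1Intact = Arrays.equals(binary, roundTrip(binary, StandardCharsets.ISO_8859_1));

        System.out.println("UTF-8 round trip intact: " + utf8Intact);
        System.out.println("ISO-8859-1 round trip intact: " + latin1Intact);
    }
}
```

Keeping the data as bytes from stream to ByteBuffer sidesteps the charset question entirely.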

4 Comments

mark stephens Over a year ago

The PDF file format is a blob. If you convert it to a String, it's like turning a PNG or other binary image into a String and back. The conversion process will alter some bytes, which will break the blob.

nr.iras.sk Over a year ago

@mark stephens: Thanks. The -1 was not from me. I had already left the office yesterday.

Daniel Over a year ago

In Java, a string can contain any character. If you convert something to a string and convert it back, everything is still fine! This is not C! Of course, if you convert bytes using UTF-8 in one direction and ISO-whatever to convert them back, you get errors.

mark stephens Over a year ago

As long as you know exactly what you are doing. It might stay the same, but it's inefficient and could corrupt the blob if you use certain Writer/Stream functions. Hence the suggestion to avoid it in general.
