1

I want to read a PDF file from a URL and convert it into a thumbnail image. I am using the following code; I have not included the conversion part here. The problem is on the line "pdffile = new PDFFile(buf);", where I get the exception "java.io.IOException: This may not be a PDF File". But I can open the PDF in the browser. What am I doing wrong? Please help me.

    byte[] byteArray = null;
    InputStream is = null;
    String streamTo = null;
    BufferedImage bmg = null;
    PDFFile pdffile;
    ByteBuffer buf;
    int pageNumber = 1;

    try {
       is = fetchImageFromServer(url); //Pdf Url path
       if (!pageNumber.isEmpty()) {
         streamTo = is.toString(); 
         byteArray = streamTo.getBytes();
         buf = ByteBuffer.wrap(byteArray);
         pdffile = new PDFFile(buf);
       }
    } catch (IOException e) {
    }
5
  • You should mention which library you're using to do this - PDFFile is not a standard Java class. Commented Jan 27, 2011 at 13:01
  • Try this article - How To Read A PDF File From A URL In Java. I wrote this for my company's Java PDF component suite but it does what you are trying to do. Commented Jan 28, 2011 at 4:13
  • I am new to Stack Overflow and do not know much yet. If I have made any mistakes, or if you have any advice for me, please let me know. Commented Jan 28, 2011 at 4:58
  • +1, there is a slight error in the code though, an extra while with an empty block: while ((baLength = is1.read(ba1)) != -1) { fos1.write(ba1, 0, baLength); } while (baLength != -1); Commented Jan 28, 2011 at 7:14
  • Thank you, Maurice. I am surprised that it compiled without a warning. I have fixed the error. Commented Feb 1, 2011 at 5:54

3 Answers

3

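You must read the content of the stream. Calling toString() on the stream will not do that: unless the stream class overrides it, toString() returns only a description of the object (class name and hash code), not the bytes of the PDF.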
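A minimal sketch of what reading the stream means, using only the JDK. The names `StreamBytes` and `readFully` are made up for illustration, and the in-memory stream in `main` merely stands in for whatever `fetchImageFromServer(url)` returns in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class StreamBytes {

    // Hypothetical helper: drains the stream and returns every byte it produced.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by fetchImageFromServer(url)
        byte[] fake = "%PDF-1.4 stand-in content".getBytes(StandardCharsets.US_ASCII);
        InputStream is = new ByteArrayInputStream(fake);

        // toString() describes the stream object; it does not touch the content
        System.out.println(is.toString());

        // Reading the stream yields the actual bytes, ready to wrap in a ByteBuffer
        byte[] byteArray = readFully(is);
        ByteBuffer buf = ByteBuffer.wrap(byteArray);
        System.out.println(buf.remaining() + " bytes read");
    }
}
```

The resulting `ByteBuffer` is what would be handed to `new PDFFile(buf)` in the question's code.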

3 Comments

Thanks for the fast response. Please explain a little more.
+1 @Sarika: Just read the Java Tutorial about streams... this is standard knowledge, which Maurice should not need to reproduce here.
@Daniel: The -1 was not from me. I had already left the office yesterday.
2

The is.toString() call does not read the bytes of the stream at all. There is a utility function in Apache Commons IO that will help you, IOUtils.toByteArray(). Try this:

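    // requires: import org.apache.commons.io.IOUtils;
    is = fetchImageFromServer(url); // PDF URL path
    if (!pageNumber.isEmpty()) {
        // Read the raw bytes from the stream instead of calling is.toString()
        byteArray = IOUtils.toByteArray(is);
        buf = ByteBuffer.wrap(byteArray);
        pdffile = new PDFFile(buf);
    }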

0

A PDF is a binary object. If you convert it to a string, the conversion will change byte values and break the file.
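A small sketch of that corruption, using only the JDK. The class name `RoundTrip`, its method names, and the sample bytes are made up for illustration. The point is that bytes which are not valid in the chosen charset are replaced during decoding, so encoding the string again does not restore them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTrip {

    // Decodes the bytes into a String as UTF-8, then encodes the String back.
    static byte[] viaUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Same round trip with ISO-8859-1, which maps each byte to exactly one char.
    static byte[] viaLatin1String(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "%PDF" followed by a few high bytes, as found in binary data
        byte[] raw = {0x25, 0x50, 0x44, 0x46,
                      (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println("UTF-8 round trip identical:      "
                + Arrays.equals(raw, viaUtf8String(raw)));
        System.out.println("ISO-8859-1 round trip identical: "
                + Arrays.equals(raw, viaLatin1String(raw)));
    }
}
```

The high bytes here are not a valid UTF-8 sequence, so the UTF-8 round trip changes them, while the single-byte ISO-8859-1 round trip happens to preserve them. Keeping the data as a byte array throughout avoids depending on the charset at all.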

4 Comments

The PDF file format is a blob. If you convert it to a String, it's like turning a PNG or other binary image into a String and back. The conversion process will alter some bytes, which will break the blob.
@mark stephens: Thanks. The -1 was not from me. I had already left the office yesterday.
In Java a string can contain any character. If you convert something to a string and convert it back, everything is still fine! This is not C! Of course, if you convert the bytes using UTF-8 in one direction and ISO-whatever to convert them back, you get errors.
So long as you know exactly what you are doing. It might keep the data the same, but it's inefficient and could corrupt the blob if you use certain Writer/Stream functions. Hence the suggestion to avoid it in general.
