6

I'm trying to take a PDDocument object and pass it to another module as an InputStream without saving the document to the file system.

I read about PDStream and thought I understood its purpose, so I tried something like this:

PDStream stream = new PDStream(document);

InputStream is = stream.createInputStream();

But when I try to load that input stream into a PDDocument, I get this error:

Exception in thread "main" java.io.IOException: Error: End-of-File, expected line
    at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1111)
    at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:1885)
    at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:1868)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1098)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:995)
    at app.DGDCreator.main(DGDCreator.java:35)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:143)

Later I discovered that the resulting file is 0 KB in size...

5
  • Why not use a java.io InputStream/OutputStream with save(OutputStream out) and load(InputStream in)? Commented Feb 16, 2017 at 19:00
  • Because I don't want to save the document. I want to pass it as a stream of data to another module Commented Feb 16, 2017 at 19:05
  • What do you want to do with the document in the other module? Why not just pass the document object? Commented Feb 16, 2017 at 19:13
  • Why not save to a ByteArrayOutputStream and then create a ByteArrayInputStream from there? Commented Feb 16, 2017 at 19:30
  • 1
    new PDStream(document) does not create a new stream containing the document but instead a new stream to use inside the document. If you really want to stream a PDF from one piece of code to the next without buffering it as a whole, consider using a PipedInputStream/PipedOutputStream construct. Commented Feb 16, 2017 at 21:18
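
The piped construct mentioned in the last comment could look roughly like the sketch below. It is not PDFBox API: `DocumentWriter`, `openPipe` and `drain` are hypothetical names, and the demo writes stand-in bytes where real code would presumably call `document.save(out)`. The writer runs on its own thread because a pipe has a small fixed buffer, so writing and reading it from a single thread can deadlock.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class PipedStreamSketch {

    /** Hypothetical callback: anything that can serialize itself to a stream. */
    interface DocumentWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Returns an InputStream that is fed by the writer on a background thread. */
    static InputStream openPipe(DocumentWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread producer = new Thread(() -> {
            // Closing the output end signals end-of-stream to the reader.
            try (OutputStream o = out) {
                writer.writeTo(o);
            } catch (IOException e) {
                // Assumption: real code would report this failure to the reader.
                e.printStackTrace();
            }
        });
        producer.setDaemon(true);
        producer.start();
        return in;
    }

    /** Reads the stream to its end; stands in for the consuming module. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        int n;
        while ((n = in.read(chunk)) != -1) {
            sink.write(chunk, 0, n);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload; with PDFBox the lambda would save the document instead.
        byte[] payload = "%PDF-1.4 stand-in bytes".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = openPipe(out -> out.write(payload))) {
            System.out.println(drain(in).length + " bytes received");
        }
    }
}
```

Compared with buffering in a byte array, this never holds the whole document in memory, at the cost of an extra thread and more awkward error handling.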

2 Answers

19

I'm posting this so that anyone else searching has a good answer. I ran into the same situation: I didn't want to save the file on any machine, only handle the stream itself. I found an answer here and repeat it below.

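// Serialize the document into an in-memory buffer instead of a file.
ByteArrayOutputStream out = new ByteArrayOutputStream();
pdDoc.save(out);
pdDoc.close();
// Expose the buffered bytes as an InputStream for the consuming module.
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());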
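
The save-then-wrap pattern above can be folded into a small helper. This is only a sketch under stated assumptions: `Saver` and `toInputStream` are hypothetical names, and the demo passes stand-in bytes so it runs without PDFBox on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class InMemoryStreamSketch {

    /** Hypothetical callback matching the shape of a save(OutputStream) method. */
    interface Saver {
        void save(OutputStream out) throws IOException;
    }

    /** Buffers everything the saver writes and exposes it as an InputStream. */
    static InputStream toInputStream(Saver saver) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        saver.save(buffer);
        // toByteArray() copies the buffer, so the data is briefly held twice.
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload; not a real PDF.
        byte[] standIn = "%PDF-1.4 stand-in".getBytes(StandardCharsets.US_ASCII);
        InputStream in = toInputStream(out -> out.write(standIn));
        System.out.println(in.available() + " bytes buffered");
    }
}
```

With PDFBox, the call would presumably be `toInputStream(pdDoc::save)` followed by `pdDoc.close()`. The whole document sits in memory, which is fine for typical PDFs but worth keeping in mind for very large ones.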

7 Comments

This essentially is the same as @AhmetRasitBekar's answer.
My implementation doesn't require a physical file location to save the file to in order to get it into an input stream. It does everything in memory.
Your 4 code lines are effectively a subset of his code. They more or less represent the core of his solution, too; the rest of his code may be there to illustrate its use. You might say you isolated the essential code, but that's about it.
Oh yes, that's right. I guess I missed that. Thank you for pointing that out.
@rhavelka Ok, Eric's answer apparently does have its merits, in particular it concentrates on the actual problem...
6

I don't understand why you want to do this, but the following code will do it:

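import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;

// The class name is arbitrary; the original answer showed only the main method.
public class PdfStreamExample {

    public static void main(String[] args) throws IOException {
        // Read an existing PDF just to have a document to work with.
        byte[] file = FileUtils.readFileToByteArray(new File(
                "C:\\temp\\a_file.pdf"));

        PDDocument document = null;
        InputStream is = null;
        ByteArrayOutputStream out = null;

        try {
            document = PDDocument.load(file);
            out = new ByteArrayOutputStream();

            // Save into memory, then wrap the bytes in an InputStream.
            document.save(out);
            byte[] data = out.toByteArray();
            is = new ByteArrayInputStream(data);

            // Write the stream back to disk only to verify the result.
            FileUtils.writeByteArrayToFile(new File(
                    "C:\\temp\\denemeTEST123.pdf"), IOUtils.toByteArray(is));
        } finally {
            IOUtils.closeQuietly(out);
            IOUtils.closeQuietly(is);
            IOUtils.closeQuietly(document);
        }
    }
}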

1 Comment

One use case is writing the document to an HTTP connection. Most server frameworks, such as Spring, expect an InputStream. However, PDFBox only writes the bytes to an OutputStream. So you have to write to the OutputStream and then read it back into an InputStream.
