Saving a byte array to PDF file with OfficeJs

Question

Using OfficeJs, I want to save a Word document as a PDF and post that file to an API.

Office.context.document.getFileAsync will let you get the entire document in a choice of 3 formats:

compressed: returns the entire document (.pptx or .docx) in Office Open XML (OOXML) format as a byte array
pdf: returns the entire document in PDF format as a byte array
text: returns only the text of the document as a string. (Word only)
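
For context, here is a minimal sketch of how the PDF bytes are typically collected on the client. getFileAsync hands back a File object whose content is read slice by slice with getSliceAsync; for the pdf and compressed types, each slice's data is an array of byte values. The helper name collectSlices and its callback shape are illustrative assumptions, not part of Office.js.

```javascript
// Hypothetical helper: gathers every slice of an Office.js File object into one
// plain array of byte values, then closes the file.
function collectSlices(file, onDone) {
  const bytes = [];
  let index = 0;

  function next() {
    if (index >= file.sliceCount) {
      file.closeAsync(); // release the file once all slices are read
      onDone(null, bytes);
      return;
    }
    file.getSliceAsync(index, (result) => {
      // "succeeded" is the string value of Office.AsyncResultStatus.Succeeded
      if (result.status !== "succeeded") {
        file.closeAsync();
        onDone(result.error, null);
        return;
      }
      for (const b of result.value.data) bytes.push(b);
      index += 1;
      next();
    });
  }

  next();
}

// Usage inside an add-in (skipped when the Office global is not available):
if (typeof Office !== "undefined") {
  Office.context.document.getFileAsync(Office.FileType.Pdf, (result) => {
    if (result.status === Office.AsyncResultStatus.Succeeded) {
      collectSlices(result.value, (err, bytes) => {
        if (!err) console.log("PDF size in bytes:", bytes.length);
      });
    }
  });
}
```

The result is an ordinary array of numbers, so how it is serialized for the POST decides what the server actually receives.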

I am posting the PDF byte array to a WebApi action that looks like this:

public async Task<IHttpActionResult> Upload([FromBody]byte[] bytes)
{
    File.WriteAllBytes(@"C:\temp\testpdf.pdf", bytes);
    return Ok();
}

On inspection the byte array is the same array created by the getFileAsync from Office Js.

The problem is the file written in File.WriteAllBytes is corrupt. If I open it with notepad, it is a string of the bytes - 37,80,68,70,45,49,46,53,13,10,37... and so on.
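
As a quick check, the leading values quoted above can be mapped back to characters. The sketch below is plain JavaScript (no Office.js involved) and suggests they spell a PDF header line, which would mean the file on disk holds the decimal text of the bytes rather than the bytes themselves.

```javascript
// First ten values seen in Notepad, as quoted in the question.
const leading = [37, 80, 68, 70, 45, 49, 46, 53, 13, 10];

// Interpreting each value as a character code gives the PDF header line.
const header = String.fromCharCode(...leading);
console.log(JSON.stringify(header)); // "%PDF-1.5\r\n"

// The same data written as comma-separated text is a different, longer payload.
const asText = leading.join(",");
console.log(asText);                        // 37,80,68,70,45,49,46,53,13,10
console.log(leading.length, asText.length); // 10 29
```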

Any idea why the method WriteAllBytes does not create a PDF file from the OfficeJS pdf byte stream?

UPDATE 25/5/16

As hawkeye @StefanHegny pointed out, the byte array appears to be ASCII characters. Converting each byte to a char and writing the result out like this creates a blank PDF. On inspection with Notepad, the contents do look like a PDF document, though quite different from what I get when saving the same .docx as a .pdf.

var content = "";
foreach (var b in model.Bytes)
{
    content += (char) b;
}

File.WriteAllText(@"C:\temp\testpdf.pdf", content);

Also note, this is extremely slow: about 5 minutes for a 500 KB PDF byte array on my dev machine.
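
One plausible explanation for the blank pages, offered as an assumption rather than a diagnosis: a PDF mixes readable structure with binary stream data, and pushing it through a text encoding changes every byte above 127. The sketch uses JavaScript's TextEncoder (always UTF-8) to illustrate the effect. File.WriteAllText also defaults to UTF-8, but the sample byte values here are made up for illustration.

```javascript
// Made-up sample: "%PDF" followed by two bytes above 127, as binary stream data might contain.
const original = [37, 80, 68, 70, 226, 227];

// Byte -> char -> UTF-8 text, mirroring the "(char) b" plus WriteAllText approach.
const asString = String.fromCharCode(...original);
const written = Array.from(new TextEncoder().encode(asString));

console.log(original.length, written.length); // 6 8
console.log(written.join(","));               // 37,80,68,70,195,162,195,163
```

The ASCII part survives, which would explain why the file still looks like a PDF in Notepad, while each high byte turns into two different bytes.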

37,80,68,70 looks like "%" (ASCII 37), "P", "D", "F", which is the PDF file magic number, so these may well be the bytes of a PDF file. Treated as a sequence of bytes with those values, it looks okay to me. But your question is why the bytes are written out as decimal values?
– Stefan Hegny, Commented May 24, 2016 at 14:11
Wow, well spotted @StefanHegny! Yes, why a sequence of decimals, and not the PDF gunk that you usually see when looking at a PDF with Notepad?
– marvc1, Commented May 24, 2016 at 15:39
Have you tried File.WriteAllText with Encoding.ASCII.GetString(model.Bytes)?
– Chrisi, Commented May 25, 2016 at 13:55
Not until you mentioned it, @Chrisi. Unfortunately it creates the same document as the code from UPDATE 25/5/16: a PDF document that has the same number of pages, but is only whitespace.
– marvc1, Commented May 25, 2016 at 14:04
Damn, PDFs should be ANSI-encoded text files. When you open them in Notepad, you can kind of see the basic structure. It should start with %PDF-1.4 and occasionally have something like 1 0 obj (or other numbers). Can you check your created PDF in Notepad?
– Chrisi, Commented May 25, 2016 at 14:14

Alexandru Popa · Accepted Answer · 2018-05-23 15:50:03Z

I had the same empty-PDF problem. It happened because I was converting the bytes to a string and writing that string to the file (an encoding problem). I solved it by sending the comma-separated byte values to the C# code instead of converting to a string, then parsing them back into bytes and using File.WriteAllBytes().

C# code:

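// Requires: using System.Linq; using System.Web;
// 'pdf' is the posted string of comma-separated byte values, e.g. "37,80,68,70,..."
string[] strings = HttpUtility.HtmlDecode(pdf).Split(',');
// Parse each decimal value back into a byte
byte[] bytes = strings.Select(s => byte.Parse(s)).ToArray();
// Write the raw bytes; no text encoding is involved
System.IO.File.WriteAllBytes("filename.pdf", bytes);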
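
On the client side, the approach above amounts to joining the byte values with commas before posting. Here is a minimal sketch in JavaScript, with toCsvBytes and fromCsvBytes as hypothetical helper names. fromCsvBytes only mirrors what the C# Split and byte.Parse lines do, so the round trip can be checked without a server.

```javascript
// Hypothetical helper: turn an array of byte values into "37,80,68,70,...".
function toCsvBytes(bytes) {
  return Array.from(bytes).join(",");
}

// Hypothetical mirror of the C# parsing step: split on commas, parse each value,
// and reject anything that is not a whole number from 0 to 255.
function fromCsvBytes(text) {
  return Uint8Array.from(text.split(","), (s) => {
    if (!/^\d{1,3}$/.test(s) || Number(s) > 255) {
      throw new RangeError("Not a byte value: " + s);
    }
    return Number(s);
  });
}

const payload = toCsvBytes([37, 80, 68, 70, 226, 227]);
console.log(payload);                                     // 37,80,68,70,226,227
console.log(Array.from(fromCsvBytes(payload)).join(",")); // 37,80,68,70,226,227
```

Because the values travel as decimal text and are only turned back into bytes on the server, values above 127 come through unchanged.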

edited May 23, 2018 at 15:50

answered May 23, 2018 at 14:34

