Using Office.js, I want to save a Word document as a PDF and post that file to an API.

Office.context.document.getFileAsync will let you get the entire document in a choice of 3 formats:

  • compressed: returns the entire document (.pptx or .docx) in Office Open XML (OOXML) format as a byte array
  • pdf: returns the entire document in PDF format as a byte array
  • text: returns only the text of the document as a string. (Word only)
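For reference, getFileAsync hands back a File object that is read slice by slice rather than as a single array. The following is a minimal sketch of collecting the slices, assuming the callback-style API (sliceCount, getSliceAsync, closeAsync); the helper name readAllSlices and the slice size are illustrative choices, not something from the question.

```javascript
// Sketch: collect every slice of an Office.js File object into one byte array.
// `file` is the object that getFileAsync passes back in result.value.
// readAllSlices is a hypothetical helper name.
function readAllSlices(file, onDone) {
  var bytes = [];

  function readSlice(index) {
    if (index >= file.sliceCount) {
      file.closeAsync(); // release the file once all slices are read
      onDone(null, bytes);
      return;
    }
    file.getSliceAsync(index, function (result) {
      // "succeeded" is the string value of Office.AsyncResultStatus.Succeeded.
      if (result.status !== "succeeded") {
        file.closeAsync();
        onDone(result.error || new Error("slice " + index + " failed"));
        return;
      }
      // For the pdf and compressed file types, slice.data is an array of byte values.
      var data = result.value.data;
      for (var i = 0; i < data.length; i++) {
        bytes.push(data[i]);
      }
      readSlice(index + 1);
    });
  }

  readSlice(0);
}

// Usage inside an add-in (Office is only defined when running in an Office host).
if (typeof Office !== "undefined") {
  Office.context.document.getFileAsync(
    Office.FileType.Pdf,
    { sliceSize: 65536 }, // assumed slice size, chosen for illustration
    function (result) {
      if (result.status === Office.AsyncResultStatus.Succeeded) {
        readAllSlices(result.value, function (err, pdfBytes) {
          if (!err) console.log("PDF size in bytes:", pdfBytes.length);
        });
      }
    }
  );
}
```

The result is a plain array of numbers between 0 and 255, so how it is serialized for the HTTP request matters.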

I am posting the PDF byte array to a WebApi action that looks like this:

public async Task<IHttpActionResult> Upload([FromBody]byte[] bytes)
{
    File.WriteAllBytes(@"C:\temp\testpdf.pdf", bytes);
    return Ok();
}
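One way the client could post to an action like this is sketched below. It assumes the action uses Web API's default JSON formatter (Json.NET), which represents a byte[] as a base64 string; the names bytesToBase64 and uploadPdf, the url parameter, and the use of fetch are illustrative assumptions and not taken from the question.

```javascript
// Sketch: encode a byte array as base64 so it can be sent as a JSON string.
// bytesToBase64 is a hypothetical helper name.
function bytesToBase64(bytes) {
  var binary = "";
  var chunk = 8192; // convert in chunks to keep fromCharCode argument lists small
  for (var i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode.apply(null, bytes.slice(i, i + chunk));
  }
  // btoa exists in browsers; Buffer is the Node.js fallback.
  return typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
}

// Hypothetical upload helper: the url is a placeholder supplied by the caller.
// fetch may not exist in older add-in hosts; XMLHttpRequest would work the same way.
function uploadPdf(pdfBytes, url) {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The body is a JSON string literal, for example "JVBERg==".
    body: JSON.stringify(bytesToBase64(pdfBytes))
  });
}
```

With this encoding the four bytes 37,80,68,70 travel as "JVBERg==" instead of as a list of decimal numbers.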

On inspection, the byte array received by the action is the same array produced by getFileAsync in Office.js.

The problem is that the file written by File.WriteAllBytes is corrupt. If I open it with Notepad, it contains the byte values as text: 37,80,68,70,45,49,46,53,13,10,37... and so on.

Any idea why WriteAllBytes does not create a PDF file from the Office.js PDF byte stream?

UPDATE 25/5/16

As hawkeye @StefanHegny pointed out, the byte array appears to contain ASCII characters. Converting each byte to a char and writing the result out as shown below creates a blank PDF. On inspection with Notepad, the contents do look like a PDF document, though quite different from what I get when saving the same .docx as a .pdf.

var content = "";
foreach (var b in model.Bytes)
{
    content += (char) b;
}

File.WriteAllText(@"C:\temp\testpdf.pdf", content);

Also note that this is extremely slow: about 5 minutes for a 500 KB PDF byte array on my dev machine.

  • 37,80,68,70 looks like "%" (=ASCII 37) "P" "D" "F", which is the PDF file magic number, so these may well be the bytes of a PDF file. To me it looks okay if treated as a sequence of bytes with those values. But your question is why the bytes are written out as decimal values? Commented May 24, 2016 at 14:11
  • Wow, well spotted @StefanHegny! Yes, why a sequence of decimals, and not the PDF gunk that you usually see when looking at a PDF with NotePad? Commented May 24, 2016 at 15:39
  • Have you tried using File.WriteAllText with Encoding.ASCII.GetString(model.Bytes)? Commented May 25, 2016 at 13:55
  • Not until you mentioned it @Chrisi. Unfortunately it creates the same document as the code from UPDATE 25/5/16 - A PDF document that has the same number of pages, but is only whitespace. Commented May 25, 2016 at 14:04
  • Damn, PDFs should be ANSI-encoded text files. When you open them in Notepad, you can kind of see the basic structure. It should start with %PDF-1.4 and occasionally have something like 1 0 obj (or other numbers). Can you check your created PDF in Notepad? Commented May 25, 2016 at 14:14
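The magic-number check discussed in the comments can be sketched in JavaScript as follows. The function name startsWithPdfHeader is hypothetical; the check only compares the first five bytes against "%PDF-" and says nothing about whether the rest of the file is valid.

```javascript
// Sketch: test whether a byte array begins with the PDF header "%PDF-".
// startsWithPdfHeader is a hypothetical helper name.
function startsWithPdfHeader(bytes) {
  var header = [37, 80, 68, 70, 45]; // character codes of "%PDF-"
  if (bytes.length < header.length) {
    return false;
  }
  for (var i = 0; i < header.length; i++) {
    if (bytes[i] !== header[i]) {
      return false;
    }
  }
  return true;
}

// Turn text into its character codes, to mimic a file that holds the decimal
// values as text ("37,80,68,...") instead of the raw bytes.
function textToCharCodes(text) {
  var codes = [];
  for (var i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

var rawBytes = [37, 80, 68, 70, 45, 49, 46, 53, 13, 10, 37];
var decimalText = textToCharCodes("37,80,68,70,45,49,46,53,13,10,37");

console.log(startsWithPdfHeader(rawBytes));    // raw bytes carry the header
console.log(startsWithPdfHeader(decimalText)); // the text form starts with "3", not "%"
```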

1 Answer

I had the same empty-PDF problem. It was caused by converting the bytes to a string and writing that string to the file, which is an encoding problem. I solved it by sending the comma-separated byte values to the C# code as they are, then parsing them into bytes and writing them with File.WriteAllBytes().

C# code:

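     // Requires: using System.Linq; using System.Web;
     // pdf holds the comma-separated byte values sent by the client, e.g. "37,80,68,70,..."
     string[] strings = HttpUtility.HtmlDecode(pdf).Split(',');

     // Parse each decimal value back into a single byte
     byte[] bytes = strings.Select(s => byte.Parse(s)).ToArray();

     System.IO.File.WriteAllBytes("filename.pdf", bytes);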
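
On the client side, the comma-separated string that this C# code expects could be produced as in the sketch below. The names bytesToCsv and csvToBytes are hypothetical; csvToBytes mirrors the Split and byte.Parse steps above so the round trip can be checked in JavaScript before posting.

```javascript
// Sketch: serialize a byte array as comma-separated decimal values.
// Works for plain arrays and typed arrays alike.
function bytesToCsv(bytes) {
  return Array.prototype.join.call(bytes, ",");
}

// Mirror of the server-side parsing: split on commas and validate each value.
function csvToBytes(text) {
  return text.split(",").map(function (s) {
    // Reject anything that is not a plain run of digits (including empty strings).
    if (!/^\d+$/.test(s)) {
      throw new RangeError("not a byte value: " + JSON.stringify(s));
    }
    var n = Number(s);
    if (n > 255) {
      throw new RangeError("byte value out of range: " + s);
    }
    return n;
  });
}

var pdfStart = [37, 80, 68, 70, 45, 49, 46, 53];
var csv = bytesToCsv(pdfStart);
console.log(csv);             // the text that would be posted
console.log(csvToBytes(csv)); // the bytes recovered from that text
```

This format is roughly three to four characters per byte, so it is larger on the wire than base64, but it avoids any character-encoding step.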