I'm using ASP.NET Core 7.0 and Open XML SDK 2.19.0.

I'm cloning an existing template Word document from disk to a new file and then inserting HTML, using an AltChunk, at a specific place marked by placeholder text. No matter how simple the content of the AltChunk is, Word always reports the document as corrupted when I try to open it.

string rootPath = _environment.WebRootPath;
string filePath = Path.Combine(rootPath, "files", "quotes", $"{DetailedQuote.Quote.QuoteID}.docx");
// Open the original template document and clone it to the new path as editable
DetailedTemplate.templateDocument = (WordprocessingDocument) _document.ReadWordDoc(quote.Template.TemplateID, "template").Clone(filePath, true);

// Insert content from service documents
var mainPart = DetailedTemplate.templateDocument.MainDocumentPart;
var paragraphs = mainPart.Document.Body.Descendants<Paragraph>();

foreach (var paragraph in paragraphs)
{
    if (paragraph.InnerText == "[##(SERVICE_DETAILS)##]")
    {
        string serviceDescriptionHTML = "Hello";
        var chunkID = 0;
        foreach (var service in DetailedQuote.Quote.QuoteServices)
        {
            string sChunkID = $"myhtmlID{chunkID++}";
            AlternativeFormatImportPart oChunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, sChunkID);
            using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(serviceDescriptionHTML)))
            {
                oChunk.FeedData(memoryStream);
            }
            AltChunk oAltChunk = new AltChunk();
            oAltChunk.Id = sChunkID;
            // Add the chunk to the paragraph
            paragraph.Parent.InsertAfter(oAltChunk, paragraph);
        }
    }
}
// Save changes to the main document
mainPart.Document.Save();
// Close the document so that we can read it from disk
DetailedTemplate.templateDocument.Close();
// Return the content of the main document as a FileResult
byte[] fileBytes = System.IO.File.ReadAllBytes(filePath);
return File(fileBytes, "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "MyDocument.docx");

The OpenXmlValidator I run after the document is created (not included in my example) doesn't report any errors and neither does the Open XML SDK 2.5 Productivity Tool.

If I instead simply update the text of the paragraph, then the document opens without errors in Word.

...
foreach (var service in DetailedQuote.Quote.QuoteServices)
{
    var text = paragraph.Descendants<Text>().FirstOrDefault();
    if (text != null)
    {
        text.Text = "This is text!";
    }                        
}
...

To me this can only mean that adding the AltChunk is what breaks the document, yet as far as I understand, an AltChunk is the correct way to add HTML to a Word document.

I've spent two days reading pretty much everything I can find on the topic, and I've asked every bot out there to help me find the issue. I've tried Open-Xml-PowerTools but couldn't find any good documentation, and I've tried HtmlToOpenXml but ran into versioning issues. I've also opened the .docx file to dig through it manually, but so far I have not been able to resolve this.

Any and all help is greatly appreciated!

[Edit]

If I let Word recover the generated document, the contents are present and look as expected. If I then save the "recovered" document as a new file, Word flags that file as corrupted too when I open it again.

  • While I haven't used Open XML for generating Word documents, I have used it extensively for Excel and I ran into this situation many times; a seemingly valid file (according to the Productivity Tool) was rejected by Excel. A technique I found useful was to manually perform the action on the template file and then observe what changed. You can use the Productivity Tool to do this using Compare File.... Commented Apr 20, 2023 at 17:21
  • @idz - When you say "manually perform the action on the template file" do you mean in Word? Commented Apr 21, 2023 at 10:49
  • Yes, that is what I meant. Do just that operation in the application (in your case, Word) and examine what it does. Apologies, in retrospect my comment could have been clearer, but it looks like you got a solution. Commented Apr 25, 2023 at 16:57

1 Answer

The problem occurs because Microsoft Word is unable to parse the bare string "Hello" as HTML. Yeah... I know...

Anyway, try using this:

string serviceDescriptionHTML = "<html>Hello</html>";

Or this:

string serviceDescriptionHTML = "<body>Hello</body>";

Or this:

string serviceDescriptionHTML = "<!DOCTYPE html>Hello";
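
If the real service descriptions are arbitrary HTML fragments, the same idea can be applied before the bytes are fed to the AltChunk. The following is only a minimal sketch: `HtmlChunkHelper` and `EnsureHtmlDocument` are hypothetical names, not part of the Open XML SDK, and the check for an existing wrapper is a simple prefix test that assumes the fragment is already valid HTML.

```csharp
using System;

public static class HtmlChunkHelper
{
    // Hypothetical helper (not an Open XML SDK API): returns the input
    // unchanged (apart from leading whitespace) if it already starts with
    // <!DOCTYPE, <html or <body; otherwise wraps it in <html><body>...</body></html>.
    public static string EnsureHtmlDocument(string fragment)
    {
        string trimmed = (fragment ?? string.Empty).TrimStart();

        bool alreadyWrapped =
            trimmed.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase) ||
            trimmed.StartsWith("<html", StringComparison.OrdinalIgnoreCase) ||
            trimmed.StartsWith("<body", StringComparison.OrdinalIgnoreCase);

        // The fragment is assumed to be HTML already, so it is not HTML-encoded here.
        return alreadyWrapped ? trimmed : $"<html><body>{trimmed}</body></html>";
    }
}
```

In the question's loop this would be used as `string serviceDescriptionHTML = HtmlChunkHelper.EnsureHtmlDocument("Hello");`, with the rest of the code left as it is.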

1 Comment

Oh my god! It was as simple as enclosing my HTML in <html>...</html>. A million thanks! If I could I would give you a hundred upvotes.
