1

I'm creating a C# application that has to create a Word document.

I'm using Microsoft.Office.Interop.Word to do this and I've successfully managed to output some Word documents, but creating the content through code is very time-consuming work.

I noticed that Word is able to open HTML pages and show them as normal content, so I created a simple test table in HTML and inserted it into the Word document. But when I output the document, the obvious happened: the tags were still there! Word did not interpret the tags as HTML. It just output exactly what I put in there.

How can I tell Word to render the text as HTML?

edit: (through the C# code, of course)

edit 2: Please note that I'm parsing through some data to make this, so I will end up with about 4 pages of the same table/HTML, and I need to be able to tell Word to start on the next page each time I've finished a loop. So an HTML-only method will probably not work.

3
  • Possible duplicate of How to convert HTML file to word? Read the answers there; they provide alternative approaches that also work from C#. Commented Apr 1, 2011 at 13:38
  • This is not a duplicate. My question is more detailed, I'm using C#, and I'm not asking for a library to do this. Commented Apr 1, 2011 at 13:42
  • Your edit shows (more than before) that you should use a library for this task instead of going the HTML route. There is a C# port of Apache POI available, which should solve your performance issues with Interop; look here: stackoverflow.com/questions/2680546/… Commented Apr 1, 2011 at 13:45

6 Answers 6

5

If you only want to output simple HTML content as a Word document, you could always cheat and write out the HTML content with a .doc extension.

Word will open that just fine.

If you need to add a page break, you can use a CSS page-break-before, like so:

<br style="page-break-before: always;"/>
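For the multi-page case in the question, this could look something like the sketch below. It assumes each loop iteration produces one HTML fragment (for example, one table). The names `HtmlDocWriter`, `BuildDocument`, and `Save` are made up for illustration; only `System.IO` and `System.Text` are used. Whether the break is visible may depend on the Word version and on the view the file opens in.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class HtmlDocWriter
{
    // Joins one HTML fragment per page, putting a CSS page break between fragments.
    public static string BuildDocument(IEnumerable<string> pageFragments)
    {
        var html = new StringBuilder();
        html.Append("<html><body>");
        bool first = true;
        foreach (string fragment in pageFragments)
        {
            if (!first)
            {
                // Same break element as in the answer text.
                html.Append("<br style=\"page-break-before: always;\"/>");
            }
            html.Append(fragment);
            first = false;
        }
        html.Append("</body></html>");
        return html.ToString();
    }

    // Writes the HTML to disk; pass a path ending in .doc so that Word opens it.
    public static void Save(string path, IEnumerable<string> pageFragments)
    {
        File.WriteAllText(path, BuildDocument(pageFragments), Encoding.UTF8);
    }
}
```

A call such as `HtmlDocWriter.Save("report.doc", fragments)` (with `fragments` being the per-loop HTML strings) would then produce a file that Word can open directly.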

If you're set on using Interop, having read up a little bit, this post states that you need a converter to insert HTML, and the converters are only accessible when:

  • you paste HTML from the Clipboard
  • open/insert HTML from a file

So, this answer looks like it provides a clipboard-based solution: Adding html text to Word using Interop
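The file-based route from the list above could be sketched roughly as follows. This is an untested outline, assuming a project reference to Microsoft.Office.Interop.Word, C# 4 or later (so the optional `ref` arguments of the COM methods can be omitted), and a local Word installation. The class and method names are hypothetical; `Range.InsertFile`, `Range.InsertBreak`, and `Document.SaveAs` are the Interop members being relied on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

public static class InteropHtmlInserter
{
    // Inserts each HTML fragment from a temporary .html file so that Word runs
    // its HTML converter, and puts a page break between fragments.
    public static void CreateDocument(IList<string> htmlFragments, string outputPath)
    {
        var app = new Word.Application();
        try
        {
            Word.Document doc = app.Documents.Add();
            for (int i = 0; i < htmlFragments.Count; i++)
            {
                string tempFile = Path.Combine(
                    Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
                File.WriteAllText(tempFile,
                    "<html><body>" + htmlFragments[i] + "</body></html>");

                // Position just before the document's final paragraph mark.
                int end = doc.Content.End - 1;
                doc.Range(end, end).InsertFile(tempFile);
                File.Delete(tempFile);

                if (i < htmlFragments.Count - 1)
                {
                    end = doc.Content.End - 1;
                    doc.Range(end, end).InsertBreak(Word.WdBreakType.wdPageBreak);
                }
            }
            doc.SaveAs(outputPath);
            doc.Close();
        }
        finally
        {
            // Quit without prompting to save, even if something failed above.
            app.Quit(false);
        }
    }
}
```

Going through a temporary file keeps the clipboard untouched, at the cost of one small file write per fragment.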

However, if there's any money to spend on the project, I can heartily recommend Aspose.Words which will do all of this for you.


12 Comments

Haha, nice, I did not know that. Sweet cheat! But that doesn't solve my problem, because I need to output multiple pages and I can't tell Word through HTML to create a new page.
I don't know about Word documents, but I've run into terrible trouble writing out HTML content and giving the file a .xls or .xlsx extension - Office 2007 gives a nice "The file you are trying to open .xlsx is in a different format than specified by the file extension" error, which often doesn't receive focus.
Oh this answer worked just fine, but it's not exactly what I was looking for, because I need to be able to tell when to resume on a new page.
@Pieter888: if dealing with pages is your only issue, take a look at w3.org/TR/CSS21/page.html. More specifically, page-break-before:always is already used by Word when you insert a page-break on a document and save it as HTML, so it should be able to understand it when opening a document ;)
@herenvardo: It appears from this question that it doesn't work, which is a shame!
1

As requested by the OP, and to make it easier for others to find this solution, here is the answer I posted as a comment (plus extra results from testing):

When opening an HTML file, MS Word honors the CSS properties page-break-before and page-break-after. There is a caveat, however:

In "Web design" view, page breaks are never shown (this doesn't mean that they aren't there), just like browsers don't "show" them. And Word opens HTML files in Web design view by default (which makes sense). You need to print the document or switch to some other view (typically "Print design") to see your breaks in all their glory.

So, saving an HTML file with a .doc extension is a viable solution (also tested: Word opens it properly despite the extension).

Note: all the testing was done on MS Word 2003 using this snippet: <html>asdf<br style="page-break-before: always;">new page!</html>

Comments

1

Don't build the document in code. Create it in Word as a template or mail merge template, and then use code to merge or replace the field data.

See this answer: MS Word Office Automation - Filling Text Form Fields And Check Box Form Fields And Mail Merge

And see this from the mothership:

http://msdn.microsoft.com/en-us/library/ff433638.aspx

2 Comments

Does this work when there is a part in the document where you have to iterate through some data?
If this runs in a long-running process such as ASP.NET or a Windows service, you will run into problems with Office automation.
1

If you don't want to use an external library, Interop is too slow for you, and neither pure HTML nor a mail merge template is flexible enough, you could write your content as text or HTML into one or more files (using C#). Then create a VBA macro in a Word document that creates a second Word document, reads the content files, and applies any formatting you want afterwards.

You can run this macro programmatically by starting Word using the command line switch /m.
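From C#, launching Word with that switch might look like the sketch below. The class name, the document path, and the macro name are placeholders. It also assumes that `winword.exe` can be resolved by the shell, which is typical on machines with Office installed but not guaranteed.

```csharp
using System.Diagnostics;

public static class WordMacroLauncher
{
    // Builds the start info for "winword.exe <document> /m<MacroName>".
    public static ProcessStartInfo BuildStartInfo(string documentPath, string macroName)
    {
        return new ProcessStartInfo
        {
            FileName = "winword.exe",
            // Quote the path so that spaces in it do not split the argument.
            Arguments = "\"" + documentPath + "\" /m" + macroName,
            UseShellExecute = true
        };
    }

    // Starts Word and blocks until that Word process exits.
    public static void Run(string documentPath, string macroName)
    {
        using (Process word = Process.Start(BuildStartInfo(documentPath, macroName)))
        {
            // Process.Start can return null when no new process was created.
            if (word != null)
            {
                word.WaitForExit();
            }
        }
    }
}
```

The macro itself would need to save its output and close Word when done; otherwise `Run` keeps waiting until the user closes the window.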

Comments

1

Another possible approach: if your HTML is XHTML (i.e. XML-compliant), you could use XSLT to convert it to a Word XML format. But this would take a LOOOOOOOOOOONG time to code.

If you don't have to use HTML as the starting point you could simply build the Word XML document yourself rather than using XSLT, which would be easier. Time consuming but possible - it's something I do quite a lot in my work.
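As a rough idea of what building the XML directly involves, here is a minimal sketch that targets the Word 2003 XML format (WordprocessingML) with `System.Xml.Linq`. The class and method names are invented for the example, and it only emits plain paragraphs with a page break between them; tables and formatting need considerably more markup.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class WordMlBuilder
{
    // Namespace of the Word 2003 XML document format.
    private static readonly XNamespace W =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Builds a document with one paragraph per entry and a page break between entries.
    public static XDocument Build(IEnumerable<string> pageTexts)
    {
        var body = new XElement(W + "body");
        bool first = true;
        foreach (string text in pageTexts)
        {
            if (!first)
            {
                // <w:p><w:r><w:br w:type="page"/></w:r></w:p> is a manual page break.
                body.Add(new XElement(W + "p",
                    new XElement(W + "r",
                        new XElement(W + "br", new XAttribute(W + "type", "page")))));
            }
            body.Add(new XElement(W + "p",
                new XElement(W + "r",
                    new XElement(W + "t", text))));
            first = false;
        }

        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            // Tells Windows to open the .xml file with Word.
            new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
            new XElement(W + "wordDocument",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                body));
    }
}
```

Saving is then a matter of `WordMlBuilder.Build(texts).Save("report.xml")`, where `texts` holds one string per page.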

Comments

0

If a third party component is an option I would recommend the stuff from Aspose.
I have been pretty happy with their tools so far. The API is a little messy but everything works as one would expect.

Comments
