
I have had this issue in multiple applications now, and I am wondering if anyone has come up with a more efficient solution than mine. Essentially, my goal is to convert the contents of a cell to an HTML string that includes all of its formatting. My workaround up to this point has been to loop through each character in the string to determine the font size, weight, and style; however, this can be extremely slow when converting a lot of data at once.

  • You haven't provided any specific examples of the data you're working with, but Excel has the ability to save as HTML. If time really is a bottleneck, it could well be worthwhile to save as HTML, then analyse the resulting file to extract the relevant information. I'd recommend you first save your spreadsheet as HTML and look at the output source yourself, to see if it might help. Commented Oct 16, 2012 at 23:42
  • I think if you want all your style info to be inline and you need precise control over what gets output, then what you're already doing is going to give you the best result (though I say that without having seen any of your code...) Commented Oct 17, 2012 at 3:55

1 Answer


Going through each character in turn will be very slow, but should only be necessary in extreme cases. I've tackled this same problem quite successfully using the following method.

For each relevant property (bold, italic, etc.) I build up an array that stores the position of each change in the value of that property. Then, when generating the HTML, I can output all the text up to the next change (in any property) in one go. Where changes are infrequent, this is clearly faster.
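
A minimal Python sketch of the run-splitting step, assuming each property's changes are already stored as a sorted list of (position, value) pairs starting at position 0. That representation, the property names, and the function names segments and to_html are all hypothetical. The sketch merges the change points of every property and emits each uniform run once, as a span with inline styles:

```python
import html
from bisect import bisect_right


def segments(text, changes):
    """Split text into runs over which no tracked property changes.

    `changes` maps a property name to a sorted list of (position, value)
    pairs; each pair means the property has `value` from `position` onward.
    Every list is assumed (hypothetically) to start at position 0.
    """
    # Union of the change points of every property, plus the end of the text.
    cuts = sorted({pos for runs in changes.values() for pos, _ in runs} | {len(text)})
    result = []
    for start, end in zip(cuts, cuts[1:]):
        state = {}
        for name, runs in changes.items():
            positions = [pos for pos, _ in runs]
            # The value in effect at `start` comes from the last change at or before it.
            state[name] = runs[bisect_right(positions, start) - 1][1]
        result.append((text[start:end], state))
    return result


def to_html(text, changes):
    """Emit each uniform run once, wrapped in a span with inline styles."""
    parts = []
    for chunk, state in segments(text, changes):
        styles = []
        if state.get("bold"):
            styles.append("font-weight:bold")
        if state.get("italic"):
            styles.append("font-style:italic")
        if state.get("size") is not None:
            styles.append(f"font-size:{state['size']}pt")
        escaped = html.escape(chunk)
        # Plain text needs no wrapper; styled text gets one span per run.
        parts.append(f'<span style="{";".join(styles)}">{escaped}</span>' if styles else escaped)
    return "".join(parts)


changes = {"bold": [(0, False), (6, True)], "italic": [(0, False)]}
print(to_html("Hello World", changes))  # Hello <span style="font-weight:bold">World</span>
```

The number of HTML fragments produced depends on how many change points there are, not on the length of the text.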

Now, to find the positions of the changes in each property, I first test whether there are in fact any changes, and this is easy. For example, Font.Bold will return true if all the text is bold, false if none of it is, and null (or some other value - I can't remember) if there are both bold and non-bold parts.

So, if there's no change in the value at all, we're done already. If there is a change in the value, then I do a binary sub-division of the text into two halves and start again. Again, I might find that one half is all the same, and the other half contains a change, so I do another sub-division of the second half as before, and so on.
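
The binary subdivision can be sketched as a recursive function. Here query(start, end) is a hypothetical stand-in for asking Excel about a whole character range at once (for instance, something built on Characters(Start, Length).Font.Bold). It is assumed to return the property's value when the range is uniform and None when it is mixed. The simulated query below scans a Python list only so that the sketch runs on its own:

```python
def find_changes(query, start, end):
    """Return sorted (position, value) change points of one property in [start, end).

    `query(start, end)` is a hypothetical oracle: it returns the property's value
    if it is uniform over that range, or None if the range is mixed.
    """
    if start >= end:
        return []
    value = query(start, end)
    if value is not None:
        return [(start, value)]  # uniform range: a single run, no need to look inside
    # Mixed range (so at least two characters): split it in half and recurse.
    mid = (start + end) // 2
    left = find_changes(query, start, mid)
    right = find_changes(query, mid, end)
    # If both halves agree at the split point, `mid` is not a real change.
    if left[-1][1] == right[0][1]:
        right = right[1:]
    return left + right


# Simulated oracle for "Hello World" with only "World" bold. In real use this
# would ask Excel about the whole range instead of scanning characters.
flags = [False] * 6 + [True] * 5


def query(start, end):
    values = set(flags[start:end])
    return values.pop() if len(values) == 1 else None


print(find_changes(query, 0, len(flags)))  # [(0, False), (6, True)]
```

A cell with no changes costs a single query, and each real change adds only a handful of queries along one branch of the subdivision.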

Since very few cells tend to have lots of changes, and many have none at all, this ends up being quite efficient, or at least much more efficient than the character-by-character method.


3 Comments

@Gary_McGill - great theory on how it should work, but how do you ensure the HTML doesn't have mismatched tags? E.g. you wouldn't want <font name="Arial">Hello <b>World</font></b>
@Rob: if you have a list of all the points of change, and you keep track of which tags are active as you output the HTML, then it's not at all hard to deal with that. If it's HTML 4 you're outputting, I wouldn't bother, though; browsers are very forgiving of that sort of thing. HTML5, maybe less so.
We have a strange use case where the in-house proprietary browser only accepts strict HTML. Thanks for the perspective; I'd love it if you had an example :)
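
For the strict-HTML case, one way to keep tags properly nested (a sketch, not the answerer's actual code) is to hold the open tags on a stack. When a tag has to close but is not the innermost one, the tags above it are closed first and reopened if they are still active. Runs are assumed to be (text, set_of_tag_names) pairs; nested_html is a hypothetical name, and tags with attributes (such as a font face) are left out for brevity:

```python
import html


def nested_html(runs):
    """Emit well-formed HTML from (text, active_tags) runs.

    Open tags live on a stack. Before each run, every tag above the deepest
    still-active prefix of the stack is closed, innermost first; the active
    tags that are missing are then (re)opened, so elements never overlap.
    """
    out = []
    stack = []
    for text, active in runs:
        # Length of the stack prefix whose tags are all still active.
        keep = 0
        while keep < len(stack) and stack[keep] in active:
            keep += 1
        # Close everything above that prefix, innermost first.
        for tag in reversed(stack[keep:]):
            out.append(f"</{tag}>")
        del stack[keep:]
        # Open active tags that are not currently open (sorted for stable output).
        for tag in sorted(set(active) - set(stack)):
            out.append(f"<{tag}>")
            stack.append(tag)
        out.append(html.escape(text))
    # Close whatever is still open at the end.
    for tag in reversed(stack):
        out.append(f"</{tag}>")
    return "".join(out)


runs = [("Hello ", {"i"}), ("big ", {"i", "b"}), ("World", {"b"})]
print(nested_html(runs))  # <i>Hello <b>big </b></i><b>World</b>
```

This can emit a few more tags than strictly necessary (here <b> is closed and immediately reopened), but the output never contains overlapping elements.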
