I'm pretty sure it uses a COM object to render this. The XSLT calls the ddwrt:Limit function, whose code looks like this:
    public string Limit(string inputText, int maxLength, string additionalText)
    {
        if (inputText.Length > maxLength)
        {
            return (inputText.Substring(0, maxLength) + additionalText);
        }
        return inputText;
    }
I'm sure you could write your own XSLT function that preserves the HTML and then run your own XSLT transformation of the data. I've written my own web part that does its own XSLT transformations of the list data instead of using the DataViewWebPart. I used Reflector to copy the DdwRuntime class into my own class, where I can add new functions to use in XSLT. Here's a code snippet:
    XsltArgumentList xslArgs = new XsltArgumentList();

    // My copy of DdwRuntime, populated with the current SharePoint context.
    DdwRuntime runtime = new DdwRuntime();
    runtime.View = view;
    runtime.List = list;
    runtime.Web = web;
    runtime.ListItem = listItem;

    // Register the object under the namespace URI that the ddwrt prefix maps to in the stylesheet.
    xslArgs.AddExtensionObject("http://schemas.microsoft.com/WebParts/v2/DataView/runtime", runtime);

    XslCompiledTransform transform = GetTransform();
    using (StringWriter writer = new StringWriter())
    {
        transform.Transform(content, xslArgs, writer);
        return writer.ToString();
    }
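If you want to try the extension-object mechanism without any SharePoint classes, here is a minimal, self-contained sketch. The TextFunctions class, the txt prefix, and the urn:example:text-functions namespace are all made up for this example; only XslCompiledTransform, XsltArgumentList and AddExtensionObject are real .NET APIs. Note that XPath numbers are passed as double, so the sketch declares maxLength as double rather than int.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension class; not part of SharePoint or DdwRuntime.
public class TextFunctions
{
    // XPath numbers arrive as double, so maxLength is declared as double here.
    public string Limit(string inputText, double maxLength, string additionalText)
    {
        int max = (int)maxLength;
        if (inputText.Length > max)
        {
            return inputText.Substring(0, max) + additionalText;
        }
        return inputText;
    }
}

public static class ExtensionDemo
{
    // Hypothetical namespace URI, chosen only for this example.
    public const string Namespace = "urn:example:text-functions";

    public static string Run(string xml)
    {
        // The stylesheet binds the txt prefix to the same URI used in AddExtensionObject.
        string xslt =
            "<xsl:stylesheet version='1.0' " +
            "xmlns:xsl='http://www.w3.org/1999/XSL/Transform' " +
            "xmlns:txt='" + Namespace + "' exclude-result-prefixes='txt'>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/item'>" +
            "<xsl:value-of select=\"txt:Limit(string(body), 5, '...')\"/>" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(styleReader);
        }

        // Make the public methods of TextFunctions callable from XSLT.
        XsltArgumentList args = new XsltArgumentList();
        args.AddExtensionObject(Namespace, new TextFunctions());

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (StringWriter writer = new StringWriter())
        {
            transform.Transform(input, args, writer);
            return writer.ToString();
        }
    }
}
```

Calling ExtensionDemo.Run("<item><body>Hello world</body></item>") should give back "Hello...". An HTML-preserving version of Limit could be added to the same class and called from the stylesheet in the same way.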
But to be honest, this might be overkill for what you want.
I think you could just use regular expressions to find all the HTML elements, count only the text that is actually displayed, and cut off only displayed text.
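A rough sketch of that regular-expression idea follows. The HtmlTruncator class and its helper are names I made up for illustration, and the tag pattern is deliberately simple: it assumes reasonably well-formed markup, counts an entity such as &amp;amp; as several characters, and does not handle comments or script blocks. Tags are copied through without counting towards the limit, and any tags still open at the cut point are closed afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; a sketch, not a full HTML parser.
public static class HtmlTruncator
{
    // Matches one tag; group 1 is "/" for a closing tag, group 2 is the tag name.
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>", RegexOptions.Compiled);

    public static string Limit(string html, int maxLength, string additionalText)
    {
        StringBuilder output = new StringBuilder();
        Stack<string> openTags = new Stack<string>();
        int visible = 0;   // number of displayed characters emitted so far
        int position = 0;  // index in html just past the last tag processed

        foreach (Match tag in TagPattern.Matches(html))
        {
            // Displayed text between the previous tag and this one.
            string text = html.Substring(position, tag.Index - position);
            if (visible + text.Length > maxLength)
            {
                output.Append(text.Substring(0, maxLength - visible));
                return Close(output, openTags, additionalText);
            }
            output.Append(text);
            visible += text.Length;
            position = tag.Index + tag.Length;

            // Copy the tag itself; it does not count towards the limit.
            output.Append(tag.Value);
            string name = tag.Groups[2].Value.ToLowerInvariant();
            bool isVoid = tag.Value.EndsWith("/>", StringComparison.Ordinal)
                || name == "br" || name == "img" || name == "hr";
            if (tag.Groups[1].Value == "/")
            {
                if (openTags.Count > 0 && openTags.Peek() == name)
                {
                    openTags.Pop();
                }
            }
            else if (!isVoid)
            {
                openTags.Push(name);
            }
        }

        // Displayed text after the last tag.
        string rest = html.Substring(position);
        if (visible + rest.Length > maxLength)
        {
            output.Append(rest.Substring(0, maxLength - visible));
            return Close(output, openTags, additionalText);
        }
        output.Append(rest);
        return output.ToString();
    }

    // Appends the suffix, then closes any tags that are still open, innermost first.
    private static string Close(StringBuilder output, Stack<string> openTags, string additionalText)
    {
        output.Append(additionalText);
        while (openTags.Count > 0)
        {
            output.Append("</").Append(openTags.Pop()).Append(">");
        }
        return output.ToString();
    }
}
```

For example, HtmlTruncator.Limit("<b>Hello</b> world", 3, "...") should return "<b>Hel...</b>", whereas the plain Substring version would cut in the middle of the markup.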