5

I have a string containing HTML images, for example:

string str = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

I would like to get the length of the string without the images, and the number of images. So, the result should be:

int strLength = 111;
int imagesCount = 3;

Can you show me the most effective way, please?

Thanks

3
  • You can do this with a regular expression. Please let me know if you need a solution based on it. Commented Apr 29, 2016 at 11:11
  • Take a look at this answer to remove HTML tags: stackoverflow.com/a/18154046/5119765 Then you'll be able to get the string length. Commented Apr 29, 2016 at 11:14
  • Your best option would be to use an HTML parser like Html Agility Pack so you can properly count the character length of the content and the number of image tags. Commented Apr 29, 2016 at 11:18

5 Answers

4

I'd suggest using a real HTML parser, for example HtmlAgilityPack. Then it's simple:

string html = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

// Requires the HtmlAgilityPack package; Count() needs `using System.Linq;`.
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
int length = doc.DocumentNode.InnerText.Length;               // 114
int imageCount = doc.DocumentNode.Descendants("img").Count(); // 3

This is what DocumentNode.InnerText returns for your sample. Your expected length of 111 skips the extra space left behind where each of the three images was removed:

There is some nice  images in this  string. I would like to ask you  how Can I can I get the Lenght of the string?
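If you really want the 111 from the question, one option is to collapse each run of whitespace into a single space before measuring. This is only a sketch, assuming that is the intended way of counting; the `TextLength` class and `CollapseWhitespace` method are names I made up for illustration, and the method works on any plain string, so it does not depend on HtmlAgilityPack:

```csharp
using System.Text.RegularExpressions;

public static class TextLength
{
    // Assumption: every run of whitespace should count as one character.
    // Replaces each run of one or more whitespace characters with a single space.
    public static string CollapseWhitespace(string text)
    {
        return Regex.Replace(text, @"\s+", " ");
    }
}
```

Applied to the InnerText value shown above, `TextLength.CollapseWhitespace(...).Length` gives 111 instead of 114, because each of the three double spaces becomes a single space.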

2

I had a similar problem and created this method. You can use it to strip the HTML tags and then count the characters in your string:

public static string StripHtmlTags(string source)
{
  if (string.IsNullOrEmpty(source))
  {
    return string.Empty;
  }

  var array = new char[source.Length];
  int arrayIndex = 0;
  bool inside = false;
  for (int i = 0; i < source.Length; i++)
  {
    char let = source[i];
    if (let == '<')
    {
      inside = true;
      continue;
    }

    if (let == '>')
    {
      inside = false;
      continue;
    }

    if (!inside)
    {
      array[arrayIndex] = let;
      arrayIndex++;
    }
  }

  return new string(array, 0, arrayIndex);
}

Your count would then be:

int strLength = StripHtmlTags(str).Length;

1 Comment

You know you could just do foreach(char let in source) instead since string implements IEnumerable<char>.
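A sketch of what that suggestion could look like: the same inside/outside-tag state machine, written with foreach. The StringBuilder (instead of the preallocated char array) and the class name `HtmlText` are my own choices here, not part of the answer above:

```csharp
using System.Text;

public static class HtmlText
{
    // Same logic as the StripHtmlTags method above:
    // drop everything between '<' and '>', keep the rest.
    public static string StripHtmlTags(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return string.Empty;
        }

        var result = new StringBuilder(source.Length);
        bool inside = false;
        foreach (char c in source)
        {
            if (c == '<')
            {
                inside = true;   // entering a tag
            }
            else if (c == '>')
            {
                inside = false;  // leaving a tag
            }
            else if (!inside)
            {
                result.Append(c);
            }
        }

        return result.ToString();
    }
}
```

Like the original, this treats every '<' as the start of a tag, so a literal '<' in the text (or a '>' inside an attribute value) will confuse it.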
2

Add a (COM) reference to MSHTML (Microsoft HTML Object Library) and you can do this:

var doc = (IHTMLDocument2)new HTMLDocument();
doc.write(str);

Console.WriteLine("Length: {0}", doc.body.innerText.Length);
Console.WriteLine("Images: {0}", doc.images.length);


1

If you would like to do it with a regular expression, as I mentioned in my comment above, use the following code:

// Note: this pattern only matches self-closing tags written as <img ... />.
var regex = new System.Text.RegularExpressions.Regex("<img[^>]*/>");
var plainString = regex.Replace(str, "");

// plainString.Length is the string length without images
var cnt = regex.Matches(str).Count; // cnt is the number of images
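The pattern above is case-sensitive and expects the closing `/>`. A slightly more tolerant sketch, assuming you also want to catch `<IMG ...>` or tags without the trailing slash; the `ImageStripper` class and its method names are only illustrative:

```csharp
using System.Text.RegularExpressions;

public static class ImageStripper
{
    // \b stops the pattern from matching other tags that merely start with "img";
    // RegexOptions.IgnoreCase also accepts <IMG ...>.
    private static readonly Regex ImgTag =
        new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Number of <img> tags found in the input.
    public static int CountImages(string html)
    {
        return ImgTag.Matches(html).Count;
    }

    // The input with every <img> tag removed; other text is left untouched.
    public static string RemoveImages(string html)
    {
        return ImgTag.Replace(html, string.Empty);
    }
}
```

For the string in the question, `CountImages` returns 3 and `RemoveImages(...).Length` is 114 (the double spaces left by the removed tags are still counted). It will still break on a `>` inside an attribute value, which is where an HTML parser is the safer choice.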


0

I liked John Smith's solution; however, I had to add Trim() at the end to match the MS Word result.

Use this:

return new string(array, 0, arrayIndex).Trim();
