I'm looking for the fastest way to strip HTML tags from content in Google Apps Script.

For now I'm using these functions to parse the HTML:

function getTextFromHtml(body) {
  return getTextFromNode(Xml.parse(body, true).getElement());
}

function getTextFromNode(x) {
  switch (x.toString()) {
    case 'XmlText': return x.toXmlString();
    case 'XmlElement': return x.getNodes().map(getTextFromNode).join('');
    default: return '';
  }
}

But for long HTML documents this approach is very inefficient.

Sample HTML content: http://pastebin.com/FmB4hvN2

Any ideas?

2 Answers

This would remove all tags from the input.

 var text = html.replace(/<[^>]+>/g, "");
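
The one-liner above leaves the contents of script and style blocks in the output and does not decode entities such as &amp;amp;. A minimal sketch of a slightly fuller regex-based version follows. The names stripHtml and ENTITIES are hypothetical, the entity table is deliberately tiny, and this is not a real HTML parser: a literal > inside an attribute value, for example, will still confuse it.

```javascript
// Hypothetical entity table: only a handful of common entities (assumption).
var ENTITIES = { nbsp: ' ', lt: '<', gt: '>', quot: '"', '#39': "'", amp: '&' };

// Hypothetical helper: regex-based tag stripping, not a full HTML parser.
function stripHtml(html) {
  return html
    // Remove <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // Remove HTML comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Remove all remaining tags (same pattern as the one-liner above).
    .replace(/<[^>]+>/g, '')
    // Decode the listed entities in a single pass to avoid double-decoding.
    .replace(/&(nbsp|lt|gt|quot|#39|amp);/g, function (match, name) {
      return ENTITIES[name];
    })
    // Collapse runs of whitespace left behind by removed markup.
    .replace(/\s+/g, ' ')
    .trim();
}
```

Each step is a single pass over the string, so the cost should grow roughly linearly with the input size, but it is worth timing it against the sample HTML before relying on it.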

If the content you want to remove is always wrapped in < and >, you can use a regular-expression replace (shown here in C#):

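using System.Text.RegularExpressions;
// The pattern goes into the Regex; the input string goes into Replace.
Regex rgx = new Regex("<[^>]*>");
string result = rgx.Replace(someString, "");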
