Say I have text like this:

This should also be extracted, <strong>text</strong>

I need only the text from the entire string. I have tried this:

r = r.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1");

but it failed (strong is still there). Is there a proper way to do this?

Expected Result

This should also be extracted, text

Solution:

To target a specific tag, I used this:

r = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/i, "**$1**")
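
Note that the line above has no g flag, so it replaces only the first match, and it wraps the captured text in ** rather than leaving it bare as in the Expected Result. A minimal sketch of a variant that yields the plain text for every occurrence, assuming that is what is wanted (unwrapStrong is a hypothetical name, not from the original post):

```javascript
// Hypothetical helper, not part of the original post.
function unwrapStrong(html) {
  // \b[^>]* tolerates attributes on the opening tag,
  // g replaces every occurrence, i also matches <STRONG>.
  return html.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/gi, "$1");
}

var r = unwrapStrong("This should also be extracted, <strong>text</strong>");
console.log(r); // This should also be extracted, text
```

Because the capture group is [^<>]*, a strong element that contains another tag is left untouched by this pattern.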

2 Answers

To parse HTML, you need an HTML parser. See this answer for why.

If you just want to remove <strong> and </strong> from the text, you don't need parsing, but of course simplistic solutions tend to fail, which is why you need an HTML parser to parse HTML. Here's a simplistic solution that removes <strong> and </strong>:

str = str.replace(/<\/?strong>/g, "")

var yourString = "This should also be extracted, <strong>text</strong>";
yourString = yourString.replace(/<\/?strong>/g, "")
display(yourString);

function display(msg) {
  // Show a message, making sure any HTML tags show
  // as text
  var p = document.createElement('p');
  p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  document.body.appendChild(p);
}
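
To make "simplistic solutions tend to fail" concrete, here is a small sketch of inputs the pattern above does not handle. The sample strings are made up for illustration:

```javascript
var simple = /<\/?strong>/g;

// Works: bare lowercase tags are removed.
var plain = "a <strong>b</strong>".replace(simple, "");

// Fails: the opening tag has an attribute, so only </strong> is removed.
var withAttr = 'a <strong class="x">b</strong>'.replace(simple, "");

// Fails: without the i flag, uppercase tags are not matched at all.
var upper = "a <STRONG>b</STRONG>".replace(simple, "");

console.log(plain);    // a b
console.log(withAttr); // a <strong class="x">b
console.log(upper);    // a <STRONG>b</STRONG>
```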

Back to parsing: In your case, you can easily do it with the browser's parser, if you're on a browser:

var yourString = "This should also be extracted, <strong>text</strong>";
var div = document.createElement('div');
div.innerHTML = yourString;
display(div.innerText || div.textContent);

function display(msg) {
  // Show a message, making sure any HTML tags show
  // as text
  var p = document.createElement('p');
  p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  document.body.appendChild(p);
}

Most browsers provide innerText; older Firefox versions provide only textContent, which is why the || is there.
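
The same idea can be wrapped in a reusable function. This is only a sketch: textFromHtml and its optional doc parameter are hypothetical names, and it checks textContent first (the standardized property) instead of innerText, which is a deliberate swap from the snippet above.

```javascript
// Hypothetical helper: extracts text using the browser's own HTML parser.
function textFromHtml(html, doc) {
  // Use the document passed in, or the global one when running in a browser.
  var d = doc || (typeof document !== "undefined" ? document : null);
  if (!d) {
    throw new Error("textFromHtml needs a DOM document");
  }
  var div = d.createElement("div");
  div.innerHTML = html;
  // textContent is standardized; innerText is kept as a fallback.
  return div.textContent || div.innerText || "";
}
```

In a browser, textFromHtml("This should also be extracted, <strong>text</strong>") should return the sentence without the tags. Outside a browser it throws unless a document object from a DOM library is passed as the second argument.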

In a non-browser environment, you'll want some kind of DOM library (there are lots of them).

5 Comments

I am not parsing the HTML; I just need the text without the <strong> tag, to be shown in a text document later
@T.J.Crowder - Right :-) I didn't see that!
@user2002495: To reliably get the content within HTML tags, you have to parse HTML. It's as simple as that. Attempts to use simplistic rules will fail.
Thanks for all the answers. In the end I solved it on my own, but all your answers were enlightening
@user2002495: FWIW, for just strong and without attributes (based on your comments elsewhere), I did add a regex example.

You can do this:

var r = "This should also be extracted, <strong>text</strong>";
r = r.replace(/<(.+?)>([^<]+)<\/\1>/,"$2");
console.log(r);

The regex above is a strict one. If you want a relaxed version, you can do:

r = r.replace(/<.+?>/g,"");
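
As a rough illustration of what the relaxed pattern does and where it breaks: it removes everything from a < up to the next >, so a > inside an attribute value, or a bare < and > in ordinary text, gives wrong results. The second and third inputs below are made-up test strings:

```javascript
var relaxed = /<.+?>/g;

// Works for the string in the question.
var ok = "This should also be extracted, <strong>text</strong>".replace(relaxed, "");

// A ">" inside an attribute value ends the match too early.
var tricky = '<strong data-attr="hey look: > ">text</strong>'.replace(relaxed, "");

// Plain text containing "<" and ">" is eaten as if it were a tag.
var math = "1 < 2 and 3 > 2".replace(relaxed, "");

console.log(JSON.stringify(ok));     // "This should also be extracted, text"
console.log(JSON.stringify(tricky)); // " \">text"
console.log(JSON.stringify(math));   // "1  2"
```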

6 Comments

Thanks. Is it possible to apply the regex only to the strong tag with your code?
Thanks, see the solution; I was able to target a specific tag only
@user2002495 Don't forget to accept whichever answer helped you the most
This does, of course, fail with <strong data-attr="hey look: > ">text</strong>. Hence needing to parse.
I don't need to check for attributes, since I control all the DOM elements myself in an uneditable iframe; I just need to be able to do what I want