2

I'm trying to extract the text between paragraph tags using a RegExp in JavaScript, but it doesn't work.

My pattern:

<p>(.*?)</p>

Subject:

<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>

Result :

My content

What I want:

My content. Second sentence.
Comments
  • Don't parse HTML with RegEx Commented Feb 19, 2013 at 23:51
  • You can get the body of <p> tags just fine with regex (despite the warnings against parsing generally with it), but if you're using JavaScript there's no need to since you have document.getElementsByTagName("p"). Commented Feb 19, 2013 at 23:58
  • @iamnotmaynard - document.getElementsByTagName() is a DOM method. It is only available to JavaScript because the browser provides it. With node.js, there is no browser, and node.js does not natively parse HTML into a DOM. You can't assume that, just because you are using the JavaScript language, a browser DOM is available. A DOM can be made available to node.js if such a package is installed, such as jsdom. Commented Feb 20, 2013 at 0:06
  • @gilly3 Ah, I see. Was not aware of that. Commented Feb 20, 2013 at 0:07
  • @gilly3, hoh no... Not that easy generic answer again -_-. Using regex for what he wants is perfectly fine. Commented Feb 20, 2013 at 0:45

2 Answers

5

Older JavaScript (before ES2020 added String.prototype.matchAll) has no "capture all group matches" function analogous to PHP's preg_match_all, but you can cheat by using .replace:

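var matches = [];
// html is the input string; replace() is called only for its callback, and its return value is discarded
html.replace(/<p>(.*?)<\/p>/g, function () {
    // arguments[0] is the entire match, arguments[1] is the first capture group
    matches.push(arguments[1]);
});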
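
In engines that support ES2020, String.prototype.matchAll may be a more direct alternative to the .replace trick. The following is a minimal sketch, not part of the original answer; the html value is an assumed sample based on the question, with the image URL replaced by a placeholder.

```javascript
// Assumed sample input based on the question (image URL replaced by a placeholder).
const html = '<p> My content. </p> <img src="image.png"> <p> Second sentence. </p>';

// matchAll requires the g flag and yields one match array per occurrence,
// each with its capture groups intact. Array.from maps every match to group 1.
const matches = Array.from(html.matchAll(/<p>(.*?)<\/p>/g), (m) => m[1]);

console.log(matches);
```

Each entry in matches keeps the surrounding whitespace from the source, so call .trim() on the entries if that is unwanted.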

6 Comments

OK, so how can I use Jade and Node.js to extract the text between <p> and </p>?
@tonymx227 I don't really know what you mean .. that code is just raw JavaScript, so you should be able to use it with any JS interpreter
Yes, I know. But my controller sends all the posts to my Jade view (for example), and in the view I try to get the content of a post without the tags: ${posts.content.match('/<p>(.*?)<\/p>/g')} but it doesn't work...
I don't know how to use Jade views, so I wouldn't really be able to help you there. I said to use .replace, not match, though
I asked a new question because it's not the same subject. But thank you anyway.
1

To get more than one match of a pattern the global flag g is added.
The match method ignores capture groups () when matching globally, but the exec method does not. See MDN exec.

var m,
    rex = /<p>(.*?)<\/p>/g,
    str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

while ( ( m = rex.exec( str ) ) != null ) {
    console.log( m[1] );
}

//  My content. 
//  Second sentence. 
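
To arrive at the single string the question asks for, the captured groups can be trimmed and joined. This is a sketch under the same assumptions as the loop above; the parts and text names are illustrative, and the image URL is replaced by a placeholder.

```javascript
// Assumed sample input based on the question (image URL replaced by a placeholder).
const str = '<p> My content. </p> <img src="image.png"> <p> Second sentence. </p>';
const rex = /<p>(.*?)<\/p>/g;

const parts = [];
let m;
// exec() advances rex.lastIndex on every call because of the g flag,
// so the loop visits each paragraph once and stops when exec() returns null.
while ((m = rex.exec(str)) !== null) {
    parts.push(m[1].trim()); // drop the spaces just inside the tags
}

const text = parts.join(' ');
console.log(text); // My content. Second sentence.
```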

If the paragraph content may contain newlines, use [\s\S] (match any whitespace or non-whitespace character, that is, any character at all) instead of the dot, because . does not match line terminators.
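
A small sketch of the difference, using a made-up multi-line input (the multiline string and the variable names are assumptions for illustration only):

```javascript
// Hypothetical input in which each paragraph spans two lines.
const multiline = '<p>First line\nsecond line</p>\n<p>Another\nparagraph</p>';

const dotPattern = /<p>(.*?)<\/p>/g;          // . stops at line terminators
const anyCharPattern = /<p>([\s\S]*?)<\/p>/g; // [\s\S] matches newlines too

// With the dot, no paragraph can be matched across the line break.
const withDot = multiline.match(dotPattern);

const withAnyChar = [];
let hit;
while ((hit = anyCharPattern.exec(multiline)) !== null) {
    withAnyChar.push(hit[1]);
}

console.log(withDot);     // null
console.log(withAnyChar); // two entries, each containing a newline
```

In engines that support the ES2018 s (dotAll) flag, /<p>(.*?)<\/p>/gs should behave the same way as the [\s\S] version.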

Note that this kind of regex will fail on nested paragraphs as it will match up to the first closing tag.

2 Comments

There's no such thing as "nested paragraphs". A <p> does not require a closing tag. A block element that occurs after an open <p> tag implies a closing </p> tag. Your regexp will treat multiple paragraphs without closing tags as one single paragraph.
@gilly3. XHTML requires the closing tag and I think the OP makes it quite clear in his question he is looking for the content between opening and closing p tags. It is pretty obvious my answer assumes the closing tags and if there isn't any the OP's regex (not mine) won't match anyway. Nevertheless, I think your observation is worthwhile, so thank you.
