Extract text between paragraph tags using RegEx

Question

I'm trying to extract the text between paragraph tags using a RegExp in JavaScript, but it doesn't work...

My pattern:

<p>(.*?)</p>

Subject:

<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>

Result:

My content

What I want:

My content. Second sentence.
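For reference, here is a minimal sketch of the behaviour described above. It assumes the pattern was applied with String.prototype.match and without the g flag (the question does not show the calling code). Without g, match() only ever returns the first match:

```javascript
// The subject string from the question.
const subject =
  '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

// No g flag: match() stops at the first match and returns an array of
// [full match, capture group 1] (plus index and input properties).
const first = subject.match(/<p>(.*?)<\/p>/);

console.log(first[0]); // <p> My content. </p>
console.log(JSON.stringify(first[1])); // " My content. "
```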

You can get the body of <p> tags just fine with regex (despite the warnings against parsing HTML with it in general), but if you're using JavaScript there's no need to, since you have document.getElementsByTagName("p"). — Reinstate Monica -- notmaynard, Feb 19, 2013 at 23:58
@iamnotmaynard - document.getElementsByTagName() is a DOM method. It is only available to JavaScript because the browser provides it. With node.js there is no browser, and node.js does not natively parse HTML into a DOM. You can't assume that a browser DOM is available just because you are using the JavaScript language. A DOM can be made available to node.js by installing a package such as jsdom. — gilly3, Feb 20, 2013 at 0:06
@gilly3, oh no... Not that easy generic answer again -_-. Using regex for what he wants is perfectly fine. — Jean-Philippe Leclerc, Feb 20, 2013 at 0:45

Explosion Pills · Accepted Answer · 2013-02-19 23:52:32Z

5

There is no "capture all group matches" (analogous to PHP's preg_match_all) in JavaScript, but you can cheat by using .replace:

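var html = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';
var matches = [];
// The callback runs once per match; the string replace() returns is discarded
html.replace(/<p>(.*?)<\/p>/g, function () {
    // arguments[0] is the entire match, arguments[1] is capture group 1
    matches.push(arguments[1]);
});
// matches is now [" My content. ", " Second sentence. "]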
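On engines that support ES2020 (current browsers and recent versions of Node.js), String.prototype.matchAll is a direct counterpart to PHP's preg_match_all, so the .replace workaround is no longer necessary there. A minimal sketch using the subject string from the question:

```javascript
const html =
  '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

// matchAll() requires the g flag (it throws a TypeError otherwise) and yields
// one full match array per match, capture groups included.
const paragraphs = Array.from(
  html.matchAll(/<p>(.*?)<\/p>/g),
  (m) => m[1].trim() // group 1, minus the spaces just inside the tags
);

console.log(paragraphs.join(" ")); // My content. Second sentence.
```

Array.from's second argument maps each match array to its first capture group; trim() is there only to reproduce the exact output the question asks for.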



6 Comments

tonymx227 Over a year ago

OK, so how can I extract the text between <p> and </p> using Jade and Node.js?

Explosion Pills Over a year ago

@tonymx227 I don't really know what you mean... that code is just raw JavaScript, so you should be able to use it with any JS interpreter.

tonymx227 Over a year ago

Yes, I know. But from my controller I send all the posts to my Jade view (for example), and in the view I try to get the content of a post without the <p> tags... ${posts.content.match('/<p>(.*?)<\/p>/g')} but it doesn't work...

Explosion Pills Over a year ago

I don't know how to use Jade views, so I wouldn't really be able to help you there. I said to use .replace, not match, though

tonymx227 Over a year ago

I asked a new question because it's not the same subject. But thank you anyway.
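The match call in the comment above passes the pattern as a quoted string. String.prototype.match converts a non-RegExp argument with new RegExp(value), so the slashes and the trailing g become literal characters in the pattern and no flag is set. A small sketch of the difference, using a made-up content string:

```javascript
const content = "<p>Hello</p>";

// Quoted: the pattern now requires a literal "/" before <p> and "/g" after
// </p>, which the content does not contain, so there is no match.
const wrong = content.match("/<p>(.*?)<\\/p>/g");
console.log(wrong); // null

// Unquoted regex literal (no g): capture group 1 is at index 1.
const right = content.match(/<p>(.*?)<\/p>/);
console.log(right[1]); // Hello
```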


MikeM · Accepted Answer · 2013-02-20 09:57:28Z

1

To get more than one match of a pattern, add the global flag g.
The match method ignores capture groups () when matching globally, but the exec method does not (see the MDN documentation for RegExp.prototype.exec).

var m,
    rex = /<p>(.*?)<\/p>/g,
    str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

while ( ( m = rex.exec( str ) ) != null ) {
    console.log( m[1] );
}

//  My content. 
//  Second sentence.
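To make the contrast concrete: with g, match() returns only the full matches, while exec() in a loop keeps the capture groups. The loop is wrapped here in a hypothetical helper, extractParagraphs, which creates its regex inside the function so that the lastIndex state a global regex carries between exec() calls cannot leak from one call to the next:

```javascript
// Hypothetical helper: returns capture group 1 of every <p>...</p> match.
function extractParagraphs(html) {
  // A regex literal creates a new RegExp each time it is evaluated,
  // so every call starts with lastIndex = 0.
  const rex = /<p>(.*?)<\/p>/g;
  const out = [];
  let m;
  while ((m = rex.exec(html)) !== null) {
    out.push(m[1]);
  }
  return out;
}

const str = "<p>one</p> <p>two</p>";

// With g, match() returns only the full matches; the groups are dropped.
console.log(str.match(/<p>(.*?)<\/p>/g)); // [ '<p>one</p>', '<p>two</p>' ]

// exec() in a loop keeps the groups.
console.log(extractParagraphs(str)); // [ 'one', 'two' ]
```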

If the content of a paragraph may contain newlines, use [\s\S] (any whitespace or non-whitespace character) instead of ., because . does not match line terminators.
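A short sketch of the newline case, using a made-up one-paragraph string. It also shows the s (dotAll) flag, an ES2018 addition that postdates this answer and lets . match newlines without the [\s\S] trick:

```javascript
const html = "<p>First line\nsecond line</p>";

// "." does not match line terminators such as \n, so there is no match.
const withDot = html.match(/<p>(.*?)<\/p>/);

// [\s\S] means "any whitespace or non-whitespace character", i.e. anything.
const withClass = html.match(/<p>([\s\S]*?)<\/p>/);

// ES2018 alternative: the s (dotAll) flag makes "." match newlines as well.
const withDotAll = html.match(/<p>(.*?)<\/p>/s);

console.log(withDot); // null
console.log(JSON.stringify(withClass[1])); // "First line\nsecond line"
console.log(withDotAll[1] === withClass[1]); // true
```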

Note that this kind of regex will fail on nested paragraphs as it will match up to the first closing tag.
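Applying the same lazy pattern to nested or unclosed markup (made-up strings) shows where it breaks: .*? stops at the first </p> it reaches, so the captured text can swallow another opening tag:

```javascript
const rex = /<p>(.*?)<\/p>/g;

// Nested tags: the match ends at the inner </p>; " tail</p>" is left over.
const nested = "<p>outer <p>inner</p> tail</p>";
const nestedGroups = Array.from(nested.matchAll(rex), (m) => m[1]);
console.log(nestedGroups); // [ 'outer <p>inner' ]

// Unclosed first paragraph: two paragraphs come back as a single capture.
const unclosed = "<p>first<p>second</p>";
const unclosedGroups = Array.from(unclosed.matchAll(rex), (m) => m[1]);
console.log(unclosedGroups); // [ 'first<p>second' ]
```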


2 Comments

gilly3 Over a year ago

There's no such thing as "nested paragraphs". A <p> does not require a closing tag. A block element that occurs after an open <p> tag implies a closing </p> tag. Your regexp will treat multiple paragraphs without closing tags as one single paragraph.

MikeM Over a year ago

@gilly3, XHTML requires the closing tag, and I think the OP makes it quite clear in his question that he is looking for the content between opening and closing p tags. It is pretty obvious my answer assumes the closing tags, and if there aren't any, the OP's regex (not mine) won't match anyway. Nevertheless, I think your observation is worthwhile, so thank you.
