I'd like a regex that matches paragraphs. For example:

The red brown fox, did something. [newline] I don't remember this text.

[newline]

[newline] So, instead I'll say blah blah. [newline] Blah.

should return an array like this:

["The red brown...", "So, instead I'll say..."]

I already have this regex (that I stole, shh): /(?:[^\r\n]|\r(?!\n))+/gm

However, this pattern splits at both single line breaks (one newline) and paragraph breaks (two newlines). How can I match the body of text between paragraph breaks without splitting the matches at single line breaks?
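To make the problem concrete, here is a minimal sketch of what the pattern above does. The sample string and its LF-only line endings are assumptions made for illustration; the regex is the one quoted in the question.

```javascript
// Sample text (assumed): two paragraphs, each containing one single line break.
const text =
  "The red brown fox, did something.\nI don't remember this text.\n\n" +
  "So, instead I'll say blah blah.\nBlah.";

// The pattern from the question: a run of characters that are not line breaks
// (a lone \r that is not followed by \n is also allowed inside a run).
const lineMatches = text.match(/(?:[^\r\n]|\r(?!\n))+/gm);

// Every line becomes its own match, so the two paragraphs come back as four pieces.
console.log(lineMatches.length); // 4
console.log(lineMatches);
```

The pattern never crosses a line break of any kind, which is why it cannot tell a single line break from a paragraph break.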

  • I need to match the body of text, not the actual newlines, I'm sorry. Let me edit that. Commented Oct 21, 2016 at 17:46
  • Ok, what is the linebreak style you are interested in? Commented Oct 21, 2016 at 17:50
  • "I already have this regex (that I stole, shh)" bad Sebastian :D Commented Oct 21, 2016 at 17:53
  • I have come up with .match(/(?:.|(?:\r?\n|\r)(?!\r?\n|\r))+/g) but I do not like it. This is good: s.match(/.+(?:(?:\r?\n|\r)(?!\r?\n|\r).*)*/g). Splitting is better: var s = "The red brown fox, did something.\r\nI don't remember this text.\r\n\r\nSo, instead I'll say blah blah.\r\nBlah."; console.log(s.split(/(?:\r\n){2,}/g)); Commented Oct 21, 2016 at 17:56
  • @WiktorStribiżew Thanks, splitting did the trick! Commented Oct 21, 2016 at 18:45

2 Answers

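You can use the regex /(.+)((\r?\n.+)*)/gm to match each paragraph. Per your description, a paragraph is text that may contain single line breaks: (.+) matches the first non-empty line, and ((\r?\n.+)*) appends every following non-empty line that is separated from the previous one by a single line break. The following example implements this solution.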

// Original input
var input = `

    The red brown fox, did something.
I don't remember this text

So, instead I'll say blah blah. 
Blah.

another paragraph
`;

document.write('<code>ORIGINAL</code><pre>' + input + '</pre><hr>');

var 
  regex = /(.+)((\r?\n.+)*)/gm,
  matches, output = []; // output is used to store all paragraphs

while (matches = regex.exec(input)) {
  output.push(matches[0]);
  document.write('<code>PARAGRAPH ' + output.length + '</code><pre>' + matches[0] + '</pre><hr>');
}

/* CSS used by the snippet */
pre {
  background-color: lightGray;
  margin: 2px 0;
}
hr {
  border: none;
  margin:0;
  padding:0;
}
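
For use outside the browser, the same idea can be sketched without document.write. This is only an illustration: the helper name matchParagraphs is hypothetical, the capture groups are replaced by a non-capturing group because only the whole match is used, and the m flag is dropped because the pattern has no ^ or $ anchors for it to affect.

```javascript
// Hypothetical helper: returns every run of consecutive non-empty lines.
function matchParagraphs(text) {
  // .+ matches one non-empty line; (?:\r?\n.+)* appends each following
  // non-empty line that comes after a single LF or CRLF line break.
  const matches = text.match(/.+(?:\r?\n.+)*/g);
  return matches === null ? [] : matches;
}

const input =
  "\n\n    The red brown fox, did something.\nI don't remember this text\n\n" +
  "So, instead I'll say blah blah. \nBlah.\n\nanother paragraph\n";

// trim() removes the indentation at the start of the first paragraph.
const paragraphs = matchParagraphs(input).map((p) => p.trim());
console.log(paragraphs);
```

One caveat of this pattern: a line that contains only spaces or tabs still counts as non-empty, so it joins two paragraphs instead of separating them.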

4 Comments

This still matches single linebreaks, not just paragraphs.
@SebastianOlsen you expect line breaks to be inside the paragraph?
Yes, I want linebreaks inside paragraphs.
@SebastianOlsen late response but I've updated my answer.
1

You can split the string on the line break sequence (which depends on the line break style), using a {2,} quantifier so that only two or more consecutive line breaks act as a separator:

var s = "The red brown fox, did something.\r\nI don't remember this text.\r\n\r\nSo, instead I'll say blah blah.\r\nBlah.";    
console.log(s.split(/(?:\r\n){2,}/));

Here, /(?:\r\n){2,}/ matches two or more consecutive CR+LF sequences. If the line break style is LF only, use the simpler /\n{2,}/ pattern.
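
If the line break style is not known in advance, one possible approach is to normalize the line endings first and then split. This is a sketch under that assumption, not part of the answer as given; the helper name splitParagraphs is hypothetical.

```javascript
// Hypothetical helper: splits text into paragraphs regardless of line break style.
function splitParagraphs(text) {
  return text
    .replace(/\r\n?/g, "\n") // normalize CRLF and lone CR to LF
    .split(/\n{2,}/) // two or more consecutive line breaks separate paragraphs
    .filter((p) => p !== ""); // drop empty strings left by leading or trailing breaks
}

const s =
  "The red brown fox, did something.\r\nI don't remember this text.\r\n\r\n" +
  "So, instead I'll say blah blah.\r\nBlah.";

console.log(splitParagraphs(s));
```

Note that after normalization the single line breaks inside each paragraph are LF, even if the input used CR+LF.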
