96

How do you split a long piece of text into separate lines? Why does this return line1 twice?

/^(.*?)$/mg.exec('line1\r\nline2\r\n');

["line1", "line1"]

I turned on the multi-line modifier to make ^ and $ match beginning and end of lines. I also turned on the global modifier to capture all lines.

I wish to use a regex split and not String.split because I'll be dealing with both Linux \n and Windows \r\n line endings.

7 Answers

163
arrayOfLines = lineString.match(/[^\r\n]+/g);

As Tim said, line1 is both the entire match and the capture. It appears regex.exec(string) returns on finding the first match regardless of the global modifier, whereas string.match(regex) honours the global flag.
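To make the difference concrete, here is a minimal sketch comparing the two calls on the string from the question. The variable names (text, viaMatch, viaExec) are only illustrative.

```javascript
const text = 'line1\r\nline2\r\n';

// String.prototype.match with the g flag returns every full match at once.
const viaMatch = text.match(/[^\r\n]+/g);
console.log(viaMatch); // ["line1", "line2"]

// RegExp.prototype.exec returns one match per call and advances re.lastIndex,
// so with a global regex it has to be called in a loop until it returns null.
const re = /[^\r\n]+/g;
const viaExec = [];
let m;
while ((m = re.exec(text)) !== null) {
  viaExec.push(m[0]); // m[0] is the full match
}
console.log(viaExec); // ["line1", "line2"]
```

The loop is safe here because the pattern always consumes at least one character; a pattern that can match the empty string would need lastIndex advanced by hand.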


4 Comments

As a note, Tim's will match empty lines, whereas mine will not. Either may or may not be desirable.
Old answer, but I'd like to say that the reason exec returns on the first match is that it's intended to be called multiple times for global regexes, until it returns null. The regex stores state such as lastIndex, i.e. the index at which to start the next match.
Try "123\n\n1234".match(/[^\r\n]+/g); expected Array [ "123", "", "1234" ], but got Array [ "123", "1234" ]
@sea-kg use Tim's answer below to get empty lines
124

Use

result = subject.split(/\r?\n/);

Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
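A small sketch of both points, using the string from the question (the variable names are illustrative):

```javascript
const subject = 'line1\r\nline2\r\n';

// exec returns [fullMatch, group1, ...]. Here the group spans the whole match,
// so the same text shows up twice.
const withGroup = Array.from(/^(.*?)$/mg.exec(subject));
console.log(withGroup); // ["line1", "line1"]

// Without the capturing group there is only the full match.
const withoutGroup = Array.from(/^.*?$/mg.exec(subject));
console.log(withoutGroup); // ["line1"]

// split handles both \n and \r\n. Note the trailing empty string, which
// appears because the input ends with a line terminator.
const result = subject.split(/\r?\n/);
console.log(result); // ["line1", "line2", ""]
```

If the trailing empty entry is unwanted, it can be dropped afterwards, for example by checking whether the last element is an empty string.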

7 Comments

You need to use the g flag, and \r is a valid newline on some old apple machines. Also, unicode defines \u2028, \u2029, and the old IBM newline \u0085 as newlines. So /[\n\u0085\u2028\u2029]|\r\n?/g handles all the edge cases.
@Mike: Are you sure about the /g flag? Doesn't make sense to have a split function that only splits once unless explicitly told otherwise. And Jojo said that he's dealing just with Linux and Windows. What next, EBCDIC?
@Mike: No, the /g flag is not required. You can add it, but JavaScript just ignores it. As Tim said, the default behavior is to split as many times as possible, but you can use the second argument to impose a maximum.
As for what constitutes a newline, it's even worse than that. According to the Unicode Consortium we should always use (\r\n|[\n\v\f\r\x85\u2028\u2029]), no matter what platform the software runs on, or where the data comes from.
@Alan, quite right. The g flag controls whether capturing groups are included in the output.
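For anyone following the g-flag discussion, a quick sketch of how split and match behave. The sample strings are made up for illustration.

```javascript
const subject = 'a\r\nb\nc';

// split ignores the g flag: both forms split at every line break.
const plain = subject.split(/\r?\n/);
const withG = subject.split(/\r?\n/g);
console.log(plain); // ["a", "b", "c"]
console.log(withG); // ["a", "b", "c"]

// The optional second argument caps the number of returned pieces.
const limited = subject.split(/\r?\n/, 2);
console.log(limited); // ["a", "b"]

// With split, a capturing group in the separator puts the separators
// into the output, with or without g.
const withSeparators = subject.split(/(\r?\n)/);
console.log(withSeparators); // ["a", "\r\n", "b", "\n", "c"]

// With match, the g flag does change the result shape: without g you get
// the first match plus its groups, with g you get all full matches.
const firstOnly = Array.from('a\nb'.match(/(\w)/));
const allMatches = 'a\nb'.match(/(\w)/g);
console.log(firstOnly);  // ["a", "a"]
console.log(allMatches); // ["a", "b"]
```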
28

I am assuming the following constitute newlines:

  1. \r followed by \n
  2. \n followed by \r
  3. \n present alone
  4. \r present alone

Please Use

var re=/\r\n|\n\r|\n|\r/g;

arrayofLines=lineString.replace(re,"\n").split("\n");

for an array of all Lines including the empty ones.

OR

Please Use

arrayOfLines = lineString.match(/[^\r\n]+/g); 

for an array of non-empty lines.
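As a rough illustration of the two variants on input with mixed line endings (the sample string is invented for this sketch):

```javascript
const lineString = 'one\r\ntwo\n\nthree\rfour';

// Variant 1: normalise every line ending to \n, then split.
// Empty lines are kept.
const re = /\r\n|\n\r|\n|\r/g;
const allLines = lineString.replace(re, '\n').split('\n');
console.log(allLines); // ["one", "two", "", "three", "four"]

// Variant 2: collect runs of non-newline characters.
// Empty lines are dropped.
const nonEmptyLines = lineString.match(/[^\r\n]+/g);
console.log(nonEmptyLines); // ["one", "two", "three", "four"]

// match returns null when nothing matches, so a fallback is useful
// if the input may be empty.
const safe = ''.match(/[^\r\n]+/g) || [];
console.log(safe); // []
```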

4 Comments

\n followed by \r is not a single newline
It is on some platforms. If you check Environment.NewLine in C# you will see \n\r
@GravityAPI bad idea if you're dealing with files that came from other platforms, e.g. a Linux server processing a file created on Windows.
Can you please provide an example? I am using this Regex on multiple platforms including Linux etc. for years with no issues. I will be happy to fix based on your example. I have an entire system based on these lines split.
24

Even simpler regex that handles all line ending combinations, even mixed in the same file, and removes empty lines as well:

var lines = text.split(/[\r\n]+/g);

With whitespace trimming:

var lines = text.trim().split(/\s*[\r\n]+\s*/g);

1 Comment

The first one removes empty lines in the middle of the text but not at the beginning or the end. That's fine for my purposes, I'm just pointing it out for anybody who needs the removal to be consistent.
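A short sketch of the behaviour described in this comment, with one possible way to make the removal consistent. The sample text is made up.

```javascript
const text = '\r\nfirst\r\n\r\nsecond\n';

// Empty lines in the middle vanish, but a leading or trailing line break
// still produces an empty string at that end of the array.
const lines = text.split(/[\r\n]+/);
console.log(lines); // ["", "first", "second", ""]

// Filtering out empty strings removes them everywhere.
const filtered = lines.filter(line => line !== '');
console.log(filtered); // ["first", "second"]

// The trimming variant avoids the problem by trimming before the split.
const trimmed = text.trim().split(/\s*[\r\n]+\s*/);
console.log(trimmed); // ["first", "second"]
```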
11

Unicode Compliant Line Splitting

Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):

const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)

I don't understand why the negative look-ahead part ((?!\r\n)) is necessary, but that is what is suggested in the Unicode document 🤷‍♂️.

The above document recommends defining a regular expression meta-character for matching all line ending characters and sequences. Perl has \R for that. Unfortunately, JavaScript does not include such a meta-character. Alas, I could not even find a TC39 proposal for that.
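A usage sketch for splitLines, with invented sample strings. My reading of the look-ahead (an assumption, not something the Unicode document spells out) is that it only matters when the pattern is embedded in a larger regex that can backtrack; used on its own in split, the \r\n alternative is always tried first, so dropping the look-ahead gives the same result on these samples.

```javascript
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/);

// CRLF, LINE SEPARATOR, PARAGRAPH SEPARATOR and VERTICAL TAB all count as
// line boundaries; empty lines and surrounding spaces are preserved.
const sample = 'a\r\nb\u2028\u2029 c \x0Bd';
const parts = splitLines(sample);
console.log(parts); // ["a", "b", "", " c ", "d"]

// Hypothetical variant without the look-ahead, for comparison only.
const splitLinesNoLookahead = s => s.split(/\r\n|[\n-\r\x85\u2028\u2029]/);
const other = 'x\r\ny\r\rz';
console.log(splitLines(other));            // ["x", "y", "", "z"]
console.log(splitLinesNoLookahead(other)); // ["x", "y", "", "z"]
```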

1 Comment

good and working answer
5

First replace all \r\n with \n, then String.split.

6 Comments

This takes two commands. Can it be done with regex in one command?
@JoJo: myString.replace(/\r\n/, "\n").split("\n") (unless you are asking because of academic interest :))
'line1\r\nline2\r\n'.replace(/\r\n/, '\n').split('\n').without(''); produces a wrong second cell: ["line1", "line2\r"]
@JoJo: Sorry, I forgot the /g flag for global! It should be: myString.replace(/\r\n/g, "\n").split("\n")
@Jojo: This is succinctly in one line :) Regexes aren't the tool for every job. They can be very powerful, but should not be used everywhere. Note that the replace call here still uses a regex.
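To summarise the exchange above, a minimal sketch of the version without the g flag next to the corrected one, using the string from the question:

```javascript
const myString = 'line1\r\nline2\r\n';

// Without g, only the first \r\n is replaced, so a stray \r survives.
const buggy = myString.replace(/\r\n/, '\n').split('\n');
console.log(buggy); // ["line1", "line2\r", ""]

// With g, every \r\n is replaced before splitting.
const fixed = myString.replace(/\r\n/g, '\n').split('\n');
console.log(fixed); // ["line1", "line2", ""]
```

The trailing empty string comes from the final line break; the .without('') call in the comment above removes it.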
1

http://jsfiddle.net/uq55en5o/

var lines = text.match(/^.*((\r\n|\n|\r)|$)/gm);

I have done something like this. Above link is my fiddle.

1 Comment

This leaves the line separator at the end.
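A sketch of what that looks like, plus one possible way to strip the separators afterwards. The sample text and the stripped variable are illustrative only.

```javascript
const text = 'line1\r\nline2\nline3';

// Each match includes its own line terminator, if it has one.
const lines = text.match(/^.*((\r\n|\n|\r)|$)/gm);
console.log(lines); // ["line1\r\n", "line2\n", "line3"]

// Removing a single trailing terminator from each entry.
const stripped = lines.map(line => line.replace(/(\r\n|\n|\r)$/, ''));
console.log(stripped); // ["line1", "line2", "line3"]
```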
