regex to select entire line

Question

I want to capture all the lines from a string of text using regex. How do I do that? None of the attempts below work. The first one almost works, but it doesn't catch \r\n.

import re

given_text = '1stline\n2ndline\r3rdline\r\n4thline'
list_of_lines = re.findall('(?m)^.*$', given_text)
print(list_of_lines)

list_of_lines = re.findall('(?m)^.*(\r\n|\r|\n|$)', given_text)
print(list_of_lines)

list_of_lines = re.findall(r'(?m)^.*?(\r\n|\r|\n|$)', given_text)
print(list_of_lines)

To match all non-empty lines, you can use re.findall('[^\r\n]+', given_text). Or, you may use re.split(r'\r\n?|\n', given_text) if you need to get empty lines, too.
– Wiktor Stribiżew, Commented Apr 22, 2021 at 19:01

Wiktor Stribiżew · Accepted Answer · 2021-04-22 22:41:38Z

Certainly splitlines() is the right tool for the job.

If you do need a regex, the following solutions may help when the only line breaks you have to handle are CR (\r, carriage return) and LF (\n, line feed):

re.findall('[^\r\n]+', given_text) # Returns all non-empty lines split with one or more CR/LF chars
re.split(r'\r\n?|\n', given_text)  # Splits with the most common CRLF, CR or LF line endings

Note the re.split solution will return empty lines, too.
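One subtlety that may be worth checking before choosing between these: when the text ends with a line break, re.split also returns a trailing empty string, which str.splitlines() does not. A minimal sketch, using a made-up sample string (text_with_trailing_break is not from the question):

```python
import re

# Hypothetical sample: one blank line in the middle and a line break at the end.
text_with_trailing_break = 'first\r\n\r\nsecond\n'

non_empty = re.findall(r'[^\r\n]+', text_with_trailing_break)
with_empty = re.split(r'\r\n?|\n', text_with_trailing_break)
builtin = text_with_trailing_break.splitlines()

print(non_empty)   # ['first', 'second']
print(with_empty)  # ['first', '', 'second', '']
print(builtin)     # ['first', '', 'second']
```

If the trailing empty string is unwanted, dropping the last element when it is empty is one simple way to handle it.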

Details

[^\r\n]+ - one or more chars other than CR and LF chars
\r\n?|\n - a CR and an optional LF char (\r\n?) or (|) a newline, LF, only (\n)
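For comparison, here is a short sketch of why the patterns from the question misbehave. In Python's re module, the dot matches every character except \n (so it happily consumes \r), the multiline $ only recognizes a position before \n or at the end of the string, and re.findall returns the capture group instead of the whole match when the pattern contains exactly one group. The variable names are only for illustration.

```python
import re

given_text = '1stline\n2ndline\r3rdline\r\n4thline'

# '.' matches '\r' and '$' only stops before '\n', so CR-terminated lines merge.
first_attempt = re.findall(r'(?m)^.*$', given_text)
print(first_attempt)
# ['1stline', '2ndline\r3rdline\r', '4thline']

# With a single capture group, findall returns the group, not the full match.
second_attempt = re.findall(r'(?m)^.*(\r\n|\r|\n|$)', given_text)
print(second_attempt)
# ['\n', '\n', '']

# Matching "anything except CR and LF" avoids both problems.
fixed = re.findall(r'[^\r\n]+', given_text)
print(fixed)
# ['1stline', '2ndline', '3rdline', '4thline']
```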

If you need to support all possible Unicode line breaks, you can use

re.findall(r'[^\r\n\x0B\x0C\x85\u2028\u2029]+', given_text)
re.split(r'\r\n?|[\n\x0B\x0C\x85\u2028\u2029]', given_text)

NOTES:

Char	Description
`\r` (`\x0D`)	CARRIAGE RETURN, CR
`\n` (`\x0A`)	LINE FEED, LF
`\x0B`	LINE TABULATION, VT
`\x0C`	FORM FEED, FF
`\x85`	NEXT LINE, NEL
`\u2028`	LINE SEPARATOR, LS
`\u2029`	PARAGRAPH SEPARATOR, PS
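As a rough check of the characters in the table, the sketch below compares the Unicode-aware split pattern with str.splitlines() on a made-up sample. The names UNICODE_BREAKS, sample and fs_sample are illustrative only. Note that splitlines() also treats \x1c, \x1d and \x1e as line boundaries, which the pattern above does not include, so the two are not strictly interchangeable.

```python
import re

# Same alternation as the re.split pattern above, stored for reuse.
UNICODE_BREAKS = r'\r\n?|[\n\x0B\x0C\x85\u2028\u2029]'

# Hypothetical sample with one character from each of the less common breaks.
sample = 'a\x0bb\x0cc\x85d\u2028e\u2029f'
regex_lines = re.split(UNICODE_BREAKS, sample)
builtin_lines = sample.splitlines()
print(regex_lines == builtin_lines == list('abcdef'))  # True

# \x1c (FILE SEPARATOR) is a boundary for splitlines() but not for the pattern.
fs_sample = 'x\x1cy'
regex_fs = re.split(UNICODE_BREAKS, fs_sample)
builtin_fs = fs_sample.splitlines()
print(regex_fs)    # ['x\x1cy']
print(builtin_fs)  # ['x', 'y']
```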

See a Python demo:

import re
given_text = '1stline\n2ndline\r3rdline\r\n4thline\r\n\r\nLast Line after an empty line'
print( re.findall('[^\r\n]+', given_text) )
# => ['1stline', '2ndline', '3rdline', '4thline', 'Last Line after an empty line']
print( re.split(r'\r\n?|\n', given_text) )
# => ['1stline', '2ndline', '3rdline', '4thline', '', 'Last Line after an empty line']
print( re.findall(r'[^\r\n\x0B\x0C\x85\u2028\u2029]+', given_text) )
# => ['1stline', '2ndline', '3rdline', '4thline', 'Last Line after an empty line']
print( re.split(r'\r\n?|[\n\x0B\x0C\x85\u2028\u2029]', given_text) )
# => ['1stline', '2ndline', '3rdline', '4thline', '', 'Last Line after an empty line']

I appreciate the thoroughness. Regex seems like it should be simple, but there are so many weird subtleties that trip me up.

Franco Morero · Answer · 2021-04-22 20:17:57Z (last edited by Wiktor Stribiżew)

2

This code gives you the list of lines with regex:

import re
given_text = '1stline\n2ndline\r3rdline\r\n4thline'
list_of_lines = re.split(r'\r\n|\r|\n', given_text) 
print(list_of_lines)

result:

['1stline', '2ndline', '3rdline', '4thline']
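The order of the alternatives matters here: \r\n has to be tried before the single \r and \n, otherwise a CRLF pair is split twice and yields a spurious empty string. A small sketch of that effect (the variable names are illustrative):

```python
import re

given_text = '1stline\n2ndline\r3rdline\r\n4thline'

# \r\n is listed first, so a CRLF pair counts as one separator.
crlf_first = re.split(r'\r\n|\r|\n', given_text)

# Equivalent, more compact form: CR with an optional LF, or a lone LF.
compact = re.split(r'\r\n?|\n', given_text)

# \r is listed first, so CRLF is consumed as two separate separators.
cr_first = re.split(r'\r|\n|\r\n', given_text)

print(crlf_first == compact)  # True
print(cr_first)  # ['1stline', '2ndline', '3rdline', '', '4thline']
```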

edited Apr 22, 2021 at 20:17

Wiktor Stribiżew

answered Apr 22, 2021 at 19:08

Franco Morero

2 Comments

Ryan B. Jawad Over a year ago

Thanks, Franco. This seems to work well. I think Wiktor's works too and is a little bit more concise.

Wiktor Stribiżew Over a year ago

@RyanB.Jawad I posted the full answer. I have been tricked with Unicode line break chars so much in the past that I decided to include them into the solution.

FrontRanger · Accepted Answer · 2021-04-22 19:06:52Z

1

While it doesn't use regex,

given_text.splitlines()

will produce

['1stline', '2ndline', '3rdline', '4thline']

Edit: Per your commented request, if you have to use regex,

re.split(r"\n\r+|\r\n+|\n+|\r+", given_text)

will also produce

['1stline', '2ndline', '3rdline', '4thline']
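One caveat with this pattern, shown as a sketch on made-up inputs (lf_blank and crlf_blank are not from the question): because of the + quantifiers, a blank line is dropped when the text uses LF endings but kept when it uses CRLF endings, whereas splitlines() keeps it in both cases.

```python
import re

pattern = r"\n\r+|\r\n+|\n+|\r+"

# Hypothetical inputs: the same two lines separated by one blank line.
lf_blank = 'a\n\nb'
crlf_blank = 'a\r\n\r\nb'

lf_result = re.split(pattern, lf_blank)
crlf_result = re.split(pattern, crlf_blank)

print(lf_result)                # ['a', 'b']
print(crlf_result)              # ['a', '', 'b']
print(lf_blank.splitlines())    # ['a', '', 'b']
print(crlf_blank.splitlines())  # ['a', '', 'b']
```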

edited Apr 22, 2021 at 19:06

answered Apr 22, 2021 at 18:56

FrontRanger

2 Comments

Ryan B. Jawad Over a year ago

That's helpful. Thanks. I still would like to know how to do it with regex.

FrontRanger Over a year ago

Updated with one method using regex.
