Regex capture data between String and \n character in Python

Question

I am learning Python. I want to capture the data between 'NUMBER:' and \n in this string:

NUMBER: 3741733552\n556644

The number after the newline character varies, so I cannot rely on it for the capture.

    re.search(r'NUMBER:(.*?)[\n]', string_data).group(1)

I tried the code above (which is wrong) without success. Please help me capture that number. Thank you.

Edit:

I have a string "NAME: KHAN NASEEM\n\n22972 LAHSER RD\n\n..." on which I used this code:

    name = re.search(r'NAME:\s*(.+)', string_data)

but the output I got is "KHAN NASEEM\n\n22972 LAHSER RD\n\n...". I want only KHAN NASEEM.

Note: the \n here is a literal backslash followed by the letter n, not an actual newline.
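To see why the pattern in the edit runs past the name, it helps to compare a literal backslash-n with a real newline. This is a minimal sketch; the variable names escaped and real are illustrative only.

```python
import re

escaped = r'NAME: KHAN NASEEM\n\n22972 LAHSER RD'  # each \n is two chars: '\' and 'n'
real = 'NAME: KHAN NASEEM\n\n22972 LAHSER RD'      # each \n is one newline char

print(len(r'\n'), len('\n'))  # 2 1

# '.' matches the backslash and the 'n', so .+ runs to the end of the string
print(re.search(r'NAME:\s*(.+)', escaped).group(1))
# '.' stops at a real newline, so .+ captures only the name
print(re.search(r'NAME:\s*(.+)', real).group(1))
```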

Thanks @WiktorStribiżew, the above solution worked. So \d is for digits and .+ is for any characters if I encounter a similar problem, right?
– Srini Jagadeesh, Commented Oct 12, 2017 at 15:49
. matches any char except a line break char. \d matches digits, but note that in Python 3 it matches any Unicode digit. If you only need to match ASCII digits, use the re.A flag or just use [0-9].
– Wiktor Stribiżew, Commented Oct 12, 2017 at 15:54
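A quick check of the \d point above. The sample string is hypothetical; it uses Arabic-Indic digits (U+0661 to U+0663) purely to illustrate non-ASCII digits.

```python
import re

text = 'NUMBER: \u0661\u0662\u0663'  # hypothetical input: Arabic-Indic digits one, two, three

print(re.findall(r'\d+', text))            # Unicode-aware by default: finds the three digits
print(re.findall(r'\d+', text, re.ASCII))  # re.A / re.ASCII restricts \d to 0-9: []
print(re.findall(r'[0-9]+', text))         # explicit ASCII range: []
```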
Yes, I understand, but a regex like r'\bNAME:\s*(.+?)(?:\\n|$)' is not a good solution because your string is "escaped". Your main problem is the escaped string.
– Wiktor Stribiżew, Commented Oct 12, 2017 at 20:26

Wiktor Stribiżew · Accepted Answer · 2017-10-12 20:35:35Z

If you are trying to get all chars from NAME: up to a backslash followed by the letter n, use

\bNAME:\s*(.+?)(?:\\n|$)

See the regex demo.

Details

\b - a word boundary
NAME: - a NAME: substring
\s* - zero or more whitespace characters
(.+?) - Group 1: one or more chars other than line break chars, as few as possible
(?:\\n|$) - either a backslash followed by n, or the end of the string

Below is the Python demo:

    import re
    s = r'NAME: KHAN NASEEM\n\n22972 LAHSER RD\n\n...' # Note r'' prefix: all \ are literal backslashes here!
    m = re.search(r'\bNAME:\s*(.+?)(?:\\n|$)', s)
    if m:
        print(m.group(1)) # => KHAN NASEEM
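The same pattern carries over to the original NUMBER: question, assuming that string is escaped in the same way (a literal backslash-n rather than a newline). This is a sketch; extract_number is a hypothetical helper name.

```python
import re

def extract_number(s):
    """Hypothetical helper: text after NUMBER: up to a literal backslash-n or the end of string."""
    m = re.search(r'\bNUMBER:\s*(.+?)(?:\\n|$)', s)
    return m.group(1) if m else None

print(extract_number(r'NUMBER: 3741733552\n556644'))  # 3741733552
print(extract_number('NUMBER: 3741733552'))           # 3741733552 (no backslash-n, so $ ends it)
```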

NOTE: You should check how the text is fetched from the DB into Python. The \n sequences should be actual newlines. Once that is fixed, you will only need

    r'\bNAME:\s*(.+)'

This matches NAME: as a whole word followed by zero or more whitespace characters, and Group 1 captures one or more chars other than line break chars, as many as possible (i.e. the rest of the line).
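If the data source cannot be fixed, one workaround (an assumption, not part of the answer above) is to turn the literal \n sequences into real newlines in Python before matching, after which the simpler pattern is enough. str.replace only touches the backslash-n pairs and leaves all other text as-is.

```python
import re

raw = r'NAME: KHAN NASEEM\n\n22972 LAHSER RD\n\n...'  # escaped text as received

# Replace each two-character sequence (backslash + n) with a real newline
fixed = raw.replace('\\n', '\n')

m = re.search(r'\bNAME:\s*(.+)', fixed)
if m:
    print(m.group(1))  # KHAN NASEEM
```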

Ajax1234 · Accepted Answer · 2017-10-12 15:39:16Z


You can try this:

    import re
    s = "NUMBER: 3741733552\n556644"
    # raw string so that \s reaches the regex engine unchanged
    final_data = re.findall(r'NUMBER:\s*(.*?)\n', s)
    print(final_data)

Output:

    ['3741733552']
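One caveat with the pattern above: it needs a newline after the number, so if NUMBER: is on the last line, findall returns an empty list. Since . does not match a newline, a greedy (.*) already stops at the end of the line. A sketch comparing the two, where last_line is a made-up input:

```python
import re

with_newline = "NUMBER: 3741733552\n556644"
last_line = "NUMBER: 3741733552"  # hypothetical input with no trailing newline

for s in (with_newline, last_line):
    print(re.findall(r'NUMBER:\s*(.*?)\n', s),  # requires a newline after the number
          re.findall(r'NUMBER:\s*(.*)', s))     # '.' stops at a newline or the end of string
```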


theonlydante · Accepted Answer · 2017-10-12 15:55:35Z


Below is my solution to your question. It is short, simple, and easy to read. You could make it more complex, but I like to keep things easy :-). I hope this helps you!

    >>> import re
    >>> num = 'NUMBER: 3741733552\n556644'
    >>> search = re.search(r'([0-9].*)', num).group(0)
    >>> print(search)
    3741733552
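Note that ([0-9].*) starts at the first digit anywhere in the string, not specifically after NUMBER:. If another digit could appear earlier (the ID 7 line below is a made-up example), anchoring the pattern to the label is safer:

```python
import re

num = 'ID 7\nNUMBER: 3741733552\n556644'  # hypothetical input with an earlier digit

print(re.search(r'([0-9].*)', num).group(0))           # 7 (first digit in the string)
print(re.search(r'NUMBER:\s*([0-9]+)', num).group(1))  # 3741733552
```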
