
I am learning Python and want to capture the data between 'NUMBER:' and \n in a string like this:

NUMBER: 3741733552\n556644

The number after the newline character is variable, so I cannot rely on it for the capture.

    re.search(r'NUMBER:(.*?)[\n]', string_data).group(1)

I tried the code above (which is wrong) without success. Please help me capture that number. Thank you.

Edit:

I have a string "NAME: KHAN NASEEM\n\n22972 LAHSER RD\n\n..." on which I used the following code:

    name = re.search(r'NAME:\s*(.+)', string_data) 

but the output I got is "KHAN NASEEM\n\n22972 LAHSER RD\n\n...", whereas I want only KHAN NASEEM.

Note: the \n here is the literal two-character sequence (a backslash followed by n), not an actual newline.

  • Use r'NUMBER:\s*(\d+)' or r'NUMBER:\s*(.+)' Commented Oct 12, 2017 at 15:40
  • Thanks @WiktorStribiżew, the above solution worked. So \d is for digits and .+ is for any chars, if I encounter a similar problem, right? Commented Oct 12, 2017 at 15:49
  • . matches any char but a line break char. \d matches digits, but mind that in Python 3, it will match any Unicode digits. If you only need to match ASCII digits, you will have to use the re.A flag or just use [0-9]. Commented Oct 12, 2017 at 15:54
  • Use NAME:\s*(.+) Commented Oct 12, 2017 at 17:14
  • Yes, I understand, but a regex like r'\bNAME:\s*(.+?)(?:\\n|$)' is not a good solution because your string is "escaped". Your main problem is the escaped string. Commented Oct 12, 2017 at 20:26
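
The remark about \d in the comments above can be checked directly. The following is a minimal sketch, assuming Python 3; the sample string with Arabic-Indic digits is made up for illustration and does not come from the question:

```python
import re

# Hypothetical sample: ASCII digits followed by three Arabic-Indic digits.
sample = "NUMBER: 123\u0664\u0665\u0666"

# In Python 3, \d in a str pattern matches any Unicode decimal digit.
unicode_match = re.search(r'NUMBER:\s*(\d+)', sample).group(1)

# re.A (re.ASCII) restricts \d to [0-9]; the explicit class [0-9] needs no flag.
ascii_match = re.search(r'NUMBER:\s*(\d+)', sample, re.A).group(1)
class_match = re.search(r'NUMBER:\s*([0-9]+)', sample).group(1)

print(len(unicode_match))  # 6
print(ascii_match)         # 123
print(class_match)         # 123
```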

3 Answers

1

If you are trying to get all chars from NAME: up to a backslash followed by the letter n, use

\bNAME:\s*(.+?)(?:\\n|$)

See the regex demo.

Details

  • \b - a word boundary
  • NAME: - a NAME: substring
  • \s* - 0+ whitespaces
  • (.+?) - Group 1: one or more chars other than line break chars, as few as possible
  • (?:\\n|$) - either a backslash followed by n, or the end of the string

Below is the Python demo:

import re
s = r'NAME: KHAN NASEEM\n\n22972 LAHSER RD\n\n...' # Note r'' prefix: all \ are literal backslashes here!
m = re.search(r'\bNAME:\s*(.+?)(?:\\n|$)', s)
if m:
    print(m.group(1)) # => KHAN NASEEM

NOTE: You should check how text is fetched from the DB to Python. The \n should actually be newlines. Once fixed, you will just have to use

r'\bNAME:\s*(.+)'

This matches NAME: as a whole word, then 0+ whitespaces, and Group 1 captures one or more chars other than line break chars, as many as possible (i.e. the rest of the line).
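
If the text cannot be fixed at the source right away, one possible workaround is to convert the two-character \n sequences into real newlines before matching. This is only a sketch: it assumes a plain str.replace is acceptable for your data (it would also rewrite any backslash-n that is meant literally), and the variable names raw and fixed are illustrative.

```python
import re

# Raw string: every \n below is a literal backslash followed by n.
raw = r'NAME: KHAN NASEEM\n\n22972 LAHSER RD\n\n...'

# Turn the escaped sequences into actual newline characters.
fixed = raw.replace('\\n', '\n')

# With real newlines, .+ stops at the end of the first line.
m = re.search(r'\bNAME:\s*(.+)', fixed)
name = m.group(1) if m else None
print(name)  # KHAN NASEEM
```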


1

You can try this:

import re
s = "NUMBER: 3741733552\n556644"
final_data = re.findall(r'NUMBER:\s*(.*?)\n', s)  # raw string, so \s is passed to the regex engine unchanged

Output:

['3741733552']
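
Note that this pattern requires a newline after the number, so it finds nothing when the number is the last thing in the string. As a sketch of a variant that does not depend on the trailing newline, the \d+ pattern suggested in the comments can be used instead; the second sample string here is made up for illustration:

```python
import re

with_newline = "NUMBER: 3741733552\n556644"
without_newline = "NUMBER: 3741733552"  # hypothetical input with no trailing newline

# The newline-terminated pattern finds nothing when no newline follows.
lazy_result = re.findall(r'NUMBER:\s*(.*?)\n', without_newline)

# \d+ stops at the first non-digit, so both inputs work.
digits_a = re.findall(r'NUMBER:\s*(\d+)', with_newline)
digits_b = re.findall(r'NUMBER:\s*(\d+)', without_newline)

print(lazy_result)  # []
print(digits_a)     # ['3741733552']
print(digits_b)     # ['3741733552']
```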


1

Below is my solution to your question. It is short and simple, also easy to read. You could get more complex with it, but I like to keep things easy :-). I hope this helps you!

>>> import re
>>> num = 'NUMBER: 3741733552\n556644'
>>> search = re.search(r'([0-9].*)', num).group(0)
>>> print(search)
3741733552
