Extracting Specific Regex result from string

Question

I am attempting to extract a part number from a string. I am going to iterate over items, and need to extract the item if it is over 4 characters long, and contains AT LEAST 1 number. It does not have to include letters, but can.

For instance:

Line1: 'There is some random information here'
Line2: 'This includes item p23344dd5 as well as other info'
Line3: 'K3455 $100.00'
Line4: 'Last part number here 5551234'

What I need is to extract the 3 item numbers, p23344dd5, K3455, and 5551234.

I am using this code, but it only tells me whether the line matches, which is not what I need. I need the matched text itself.

import re

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

for item in items:
    x = re.search(r'^(?=.*\d).{5,}$', item)
    print(x)

$100.00 is over 4 characters long and contains at least one number. What constitutes a word boundary?
– ggorlen, Commented Jan 16, 2021 at 20:49
That is correct, and it looks like it is also being returned, so I need to edit my regex to exclude that as well.
– Lzypenguin, Commented Jan 16, 2021 at 20:51
I think that's a good question by @ggorlen. What defines a part number? Are there any other specifications or patterns to pick up here? Or does a part number only allow digits and alphabetic characters?
– JvdV, Commented Jan 16, 2021 at 21:01

The fourth bird · Accepted Answer · 2021-01-16 20:58:28Z

2

To match the values in the question, you can assert at least 5 word characters from a whitespace boundary, and then match at least a single digit.

(?<!\S)(?=\w{5})[^\W\d]*\d\w*(?!\S)

Explanation

(?<!\S) Whitespace boundary at the left
(?=\w{5}) Assert 5 word chars
[^\W\d]* Match optional word chars without a digit
\d Match 1 digit
\w* Match optional word chars
(?!\S) Assert a whitespace boundary at the right
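
As a rough sketch of the breakdown above, the same pattern can be written with re.VERBOSE so that each part carries its own comment. The name PART_NUMBER and the last three sample tokens are made up for illustration and are not from the question.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
PART_NUMBER = re.compile(r"""
    (?<!\S)      # left side: start of string or preceded by whitespace
    (?=\w{5})    # at least 5 word characters follow
    [^\W\d]*     # optional leading word characters that are not digits
    \d           # the required digit
    \w*          # any remaining word characters
    (?!\S)       # right side: end of string or followed by whitespace
""", re.VERBOSE)

# The first three tokens come from the question; the rest are hypothetical.
samples = ["p23344dd5", "K3455", "5551234", "abcde", "a1b2", "$100.00"]
for token in samples:
    print(token, bool(PART_NUMBER.fullmatch(token)))
```

Tokens without a digit, tokens shorter than 5 characters, and tokens containing non-word characters such as $ or . should be rejected.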


import re

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

for item in items:
    # Same pattern as explained above; re.search returns the first match per line.
    x = re.search(r'(?<!\S)(?=\w{5})[^\W\d]*\d\w*(?!\S)', item)
    if x:
        print(x.group())

p23344dd5
K3455
5551234
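
If a line could contain more than one part number, re.search would only return the first. A possible variation, assuming the same pattern, is to use findall and collect everything into one list. The last entry in items is a hypothetical line added for illustration, and the names PATTERN and part_numbers are my own.

```python
import re

PATTERN = re.compile(r'(?<!\S)(?=\w{5})[^\W\d]*\d\w*(?!\S)')

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234',
         'Two parts on one line: AB123 and 98765x']  # hypothetical extra line

# The pattern has no capturing groups, so findall returns the full matches.
part_numbers = [match for item in items for match in PATTERN.findall(item)]
print(part_numbers)
# ['p23344dd5', 'K3455', '5551234', 'AB123', '98765x']
```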

edited Jan 16, 2021 at 20:58

answered Jan 16, 2021 at 20:50


user984003 · Answer · 2021-01-16 21:06:59Z

1

Here's how to extract the matching text. This doesn't fix the issue with the regular expression mentioned in the comments, but it does extract the matched value as you asked. The problem is that, as written, your regex matches the whole line.

import re

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

for item in items:
    m = re.search(r'^(?=.*\d).{5,}$', item)
    if m is not None:
        print(m.group(0))
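
To illustrate the point about the whole line matching, here is a small check of what the question's original pattern returns for a line without digits and a line with them. The names original, lines, and results are just for this sketch.

```python
import re

# The pattern from the question: anchored to the whole line with ^ and $.
original = re.compile(r'^(?=.*\d).{5,}$')

lines = ['There is some random information here',
         'K3455 $100.00']

results = []
for line in lines:
    m = original.search(line)
    # Store the matched text, or None when the line has no match.
    results.append(m.group(0) if m else None)

print(results)
# [None, 'K3455 $100.00']
```

The second result is the entire line rather than just K3455, because ^ and $ force the match to span from the start of the line to the end.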

edited Jan 16, 2021 at 21:06

answered Jan 16, 2021 at 21:00
