I am trying to extract part numbers from strings. I will iterate over a list of items and need to extract any item that is more than 4 characters long and contains at least one digit. It may include letters, but does not have to.

For instance:

Line1: 'There is some random information here'
Line2: 'This includes item p23344dd5 as well as other info'
Line3: 'K3455 $100.00'
Line4: 'Last part number here 5551234'

What I need is to extract the 3 item numbers, p23344dd5, K3455, and 5551234.

I am using this code, but it only reports whether the line matches, which is not what I need. I need to return the matched text.

import re

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

for item in items:
    x = re.search(r'^(?=.*\d).{5,}$', item)
    print(x)
  • $100.00 is over 4 characters long and contains at least one number. What constitutes a word boundary? Commented Jan 16, 2021 at 20:49
  • That is correct, and it looks like it is also being returned, so I need to edit my regex to exclude that as well. Commented Jan 16, 2021 at 20:51
  • If it should be excluded, why? It fits your specification. Commented Jan 16, 2021 at 20:52
  • I think that's a good question by @ggorlen. What defines a part number? Are there any other specifications or patterns to pick up here? Or does a part number only allow digits and alpha chars? Commented Jan 16, 2021 at 21:01
  • @JvdV The part number can only contain numbers and letters. Commented Jan 16, 2021 at 21:33

2 Answers

2

To match the values in the question, you can assert at least 5 word characters starting from a whitespace boundary, and then match a run of word characters that contains at least a single digit.

(?<!\S)(?=\w{5})[^\W\d]*\d\w*(?!\S)

Explanation

  • (?<!\S) Assert a whitespace boundary at the left
  • (?=\w{5}) Assert at least 5 word chars to the right
  • [^\W\d]* Match optional word chars that are not digits
  • \d Match a single digit
  • \w* Match optional word chars
  • (?!\S) Assert a whitespace boundary at the right
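
If it helps to keep the explanation next to the pattern, the same regex can be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch: the names PART_NUMBER and find_part_number are made up for illustration and are not part of the question.

```python
import re

# Same pattern as above, split up with re.VERBOSE (whitespace and # comments are ignored).
# PART_NUMBER and find_part_number are hypothetical names chosen for this sketch.
PART_NUMBER = re.compile(r"""
    (?<!\S)      # left boundary: start of string or preceded by whitespace
    (?=\w{5})    # lookahead: at least 5 word characters follow
    [^\W\d]*     # zero or more word characters that are not digits
    \d           # one digit, so the match contains at least one number
    \w*          # any remaining word characters
    (?!\S)       # right boundary: end of string or followed by whitespace
""", re.VERBOSE)

def find_part_number(line):
    """Return the first matching token in line, or None if there is none."""
    m = PART_NUMBER.search(line)
    return m.group() if m else None

print(find_part_number('This includes item p23344dd5 as well as other info'))
# p23344dd5
```

Note that \w also accepts the underscore, so a token such as ab_12345 would match this pattern too.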

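import re

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

# Whitespace-bounded run of 5+ word chars containing at least one digit
pattern = re.compile(r'(?<!\S)(?=\w{5})[^\W\d]*\d\w*(?!\S)')

for item in items:
    x = pattern.search(item)
    if x:
        print(x.group())  # print the matched text, not the match object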

p23344dd5
K3455
5551234
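
Two possible refinements, given the comment that a part number can only contain numbers and letters: \w also matches the underscore, and re.search returns only the first match on each line. A minimal sketch that tightens the character class and collects every match with re.findall could look like this. It assumes part numbers use ASCII letters and digits only, and the names PART_NUMBERS and extract_part_numbers are hypothetical.

```python
import re

# Assumption: part numbers consist of ASCII letters and digits only (no underscore).
# The pattern has no capturing groups, so findall returns the full matched text.
PART_NUMBERS = re.compile(
    r'(?<!\S)(?=[A-Za-z0-9]{5})[A-Za-z]*[0-9][A-Za-z0-9]*(?!\S)'
)

def extract_part_numbers(lines):
    """Collect every whitespace-bounded part number found in the given lines."""
    found = []
    for line in lines:
        found.extend(PART_NUMBERS.findall(line))
    return found

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

print(extract_part_numbers(items))
# ['p23344dd5', 'K3455', '5551234']
```

With this version a line such as 'A1234 B5678x' yields both tokens, while ab_12345 is skipped because of the underscore.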

1

Here's how to extract the matching text. This doesn't fix the issue with the regular expression mentioned in the comments, but it does extract the matched value as you asked. The problem is that, the way the regex is written, the whole line matches: the pattern is anchored with ^ and $, and .{5,} consumes every character in between.

import re

items = ['There is some random information here',
         'This includes item p23344dd5 as well as other info',
         'K3455 $100.00',
         'Last part number here 5551234']

for item in items:
    m = re.search(r'^(?=.*\d).{5,}$', item)
    if m is not None:
        print(m.group(0))
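
Since the pattern above matches whole lines, one way to get at the individual part numbers without changing the regex is to drop it and test each whitespace-separated token instead. This is a rough alternative sketch, not what the question's code does. The function name extract_tokens and the min_len parameter are made up here, and str.isalnum / str.isdigit also accept non-ASCII letters and digits, which may or may not be what you want.

```python
def extract_tokens(line, min_len=5):
    """Return tokens of at least min_len characters that are purely
    alphanumeric and contain at least one digit.

    Hypothetical helper for illustration; tokens are split on whitespace only,
    so a token with trailing punctuation (e.g. 'K3455,') is rejected.
    """
    return [
        tok for tok in line.split()
        if len(tok) >= min_len
        and tok.isalnum()                      # letters and digits only
        and any(ch.isdigit() for ch in tok)    # at least one digit
    ]

print(extract_tokens('K3455 $100.00'))
# ['K3455']
```

Here $100.00 is dropped because it contains characters other than letters and digits, which lines up with the clarification in the comments.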
