Getting wrong data with regex

Question

I'm facing an issue here. Python version 3.7.

As you can see on the regex site, my regex works fine there. However, when I try to read the strings with Python, I only get the first part, with nothing after the comma.

Here's my code:

part_number = str(row)
partn = re.search(r"([a-zA-Z0-9 ,-]+)", part_number)
print(partn.group(0))

This is what partn.group(0) is printing:

FMC2H-OHC-100018-00

I need to get the whole string as the regex site shows it, including the comma and the value:

FMC2H-OHC-100018-00, 2

Is my regex wrong? What is happening with the comma and the value?

Row values: here are the row values converted to strings. The data retrieved from my database also includes parentheses and quotes:

('FMC2H-OHC-100018-00', 2)
('FMC2H-OHC-100027-00', 0)

I have read that group(0) returns the complete match, so what am I doing wrong?
– Javier Ramirez, Commented Nov 21, 2018 at 23:45
Please copy enough of your input into the question to locally reproduce your results.
– Jongware, Commented Nov 21, 2018 at 23:46
Your character class isn't matching the single quotes, by the way.
– user6516765, Commented Nov 21, 2018 at 23:58

martineau · Accepted Answer · 2018-11-22 03:34:40Z

I don't think you need to convert the row values to strings and then parse the result with a regex. The clue was in your update, where you said "Here are the row values converted to string", which implies they start out in some other format. The result looks like they're actually tuples of two values, a string and an integer.

If that's correct, you can skip both the string conversion and the regex, because Python's built-in string formatting can produce the string you want directly from each tuple.

Here's what I mean:

# Raw row data retrieved from database.
rows = [('FMC2H-OHC-100018-00', 2),
        ('FMC2H-OHC-100027-00', 0),
        ('FMC2H-OHC-100033-00', 0),
        ('FMC2H-OHC-100032-00', 20),
        ('FMC2H-OHC-100017-00', 16)]

for row in rows:
    result = '{}, {}'.format(*row)  # Convert data in row to a formatted string.
    print(result)

Output:

FMC2H-OHC-100018-00, 2
FMC2H-OHC-100027-00, 0
FMC2H-OHC-100033-00, 0
FMC2H-OHC-100032-00, 20
FMC2H-OHC-100017-00, 16
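
To see where the parentheses and quotes in the question come from, here is a minimal sketch using one sample row from the question. It assumes the database driver returns each row as a tuple; the names as_text, part_number, value and formatted are only illustrative.

```python
row = ('FMC2H-OHC-100018-00', 2)  # sample row from the question (assumed to be a tuple)

# str() of a tuple is its repr, so the parentheses and quotes come along.
as_text = str(row)
print(as_text)

# Unpacking the tuple gives the two values directly, with no parsing needed.
part_number, value = row
formatted = f'{part_number}, {value}'
print(formatted)
```

The f-string here is equivalent to the '{}, {}'.format(*row) call above and is available in Python 3.6 and later.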

user8408080 (edited by Wiktor Stribiżew) · Accepted Answer · 2018-11-22 09:08:03Z

Your problem is that you didn't include the ' in your character class. As a result, the regex matches, for example, FMC2H-OHC-100018-00 and , 2 separately, but not both together. Also, re.search stops searching after it finds the first match. So if you only want the first match, go with:

re.search(r"([\w ',-]+)", part_number)

Here I changed a-zA-Z0-9 to \w because it's shorter and more readable; note that \w also matches the underscore and, in Python 3, non-ASCII letters and digits. If you want a list of all matches, go with:

re.findall(r"([\w ',-]+)", part_number)
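
A small sketch of the difference, assuming part_number holds the str(row) text shown in the question. It also shows that the amended pattern keeps the single quotes in the match, so they still have to be removed to get the exact text the question asks for. The names original, original_runs, amended and cleaned are only illustrative.

```python
import re

part_number = "('FMC2H-OHC-100018-00', 2)"  # str(row) for the first sample row

# Original class: no single quote, so the first match ends at the closing quote.
original = re.search(r"([a-zA-Z0-9 ,-]+)", part_number)
print(original.group(0))

# All runs the original class can match; the quote splits the text in two.
original_runs = re.findall(r"([a-zA-Z0-9 ,-]+)", part_number)
print(original_runs)

# Amended class: the quote is allowed, so a single run covers both values.
amended = re.search(r"([\w ',-]+)", part_number)
print(amended.group(0))

# The quotes are part of the match; dropping them gives the requested text.
cleaned = amended.group(0).replace("'", "")
print(cleaned)
```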

answered Nov 22, 2018 at 0:17 by user8408080; edited Nov 22, 2018 at 9:08 by Wiktor Stribiżew

4 Comments

user6516765 Over a year ago

Personally, I'd use \(\w{5}-\w{3}-\d{6}-\d{2}, \d+\), which is more specific.

user8408080 Over a year ago

This is also what I would do, but I wanted to change OP's regex as little as possible, because he still might have a reason for this. Good addition, though! Also you still missed the ' ;)

user6516765 Over a year ago

Oh, sorry, I just copied that from my comment, which for some reason didn't update in this tab when I corrected it. The regex I have open in regex101 right now is '(\w{5}-\w{3}-\d{6}-\d{2})', (\d+). SO's live update functionality just sucks. Guess I didn't look it over for the 32nd time after copying it over...
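
For illustration, here is a rough sketch of how the more specific pattern from this comment could be used, assuming the input is the str(row) text from the question. The names rows_as_text, pattern and results are hypothetical; the two capture groups pick out the part number and the integer, so the quotes never end up in the result.

```python
import re

# Two sample rows from the question, as str(row) would render them.
rows_as_text = ["('FMC2H-OHC-100018-00', 2)", "('FMC2H-OHC-100027-00', 0)"]

# Group 1 is the part number, group 2 is the integer after the comma.
pattern = re.compile(r"'(\w{5}-\w{3}-\d{6}-\d{2})', (\d+)")

results = []
for text in rows_as_text:
    match = pattern.search(text)
    if match is None:  # skip text that does not have the expected shape
        continue
    results.append('{}, {}'.format(match.group(1), match.group(2)))

print(results)
```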

Wiktor Stribiżew Over a year ago

[\w\d] = \w, but \w is not equal to [A-Za-z0-9]. In Python 3, \w matches any Unicode letter, digit, _ and some diacritics.
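
A quick way to check this is the sketch below, which tests a few hand-picked sample characters (my own choice, not from the thread) against \w, against [A-Za-z0-9], and against \w with the re.ASCII flag. The results dictionary is only illustrative.

```python
import re

# Sample characters: ASCII letter, digit, underscore, and two non-ASCII letters.
samples = ["A", "7", "_", "\u00e9", "\u00df"]

results = {}
for ch in samples:
    unicode_word = re.fullmatch(r"\w", ch) is not None           # default str pattern
    ascii_class = re.fullmatch(r"[A-Za-z0-9]", ch) is not None   # explicit ASCII class
    ascii_word = re.fullmatch(r"\w", ch, flags=re.ASCII) is not None
    results[ch] = (unicode_word, ascii_class, ascii_word)
    print(repr(ch), unicode_word, ascii_class, ascii_word)
```

With re.ASCII, \w is limited to [a-zA-Z0-9_], which is still one character wider than the class in the question because of the underscore.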
