
I need to parse a string and extract its substrings into a variable. The strings do not all have the same format. Here are examples with the expected results:

string 1 (no "NAME" pattern): "RB2 F 27/0/31 0/32, R8 28/31/120 2/0/2"
result 1 (save "as is"): "RB2 F 27/0/31 0/32, R8 28/31/120 2/0/2"

string 2 (comma separated "<key>:<value>" pairs): "TYPE :ABC,NAME: AB.DE,DESC:10/10/5:2,COMMENT: , ID:123"
result 2 (extracted comma separated "<key>=<value>" substrings): "TYPE=ABC","NAME=AB.DE","DESC=10/10/5:2","COMMENT=","ID=123"

string 3: "ID:123, NAME:CDE,DESC:10-10/5:2"
result 3: "ID=123","NAME:CDE","DESC=10-10/5:2"

I've been experimenting with "re.compile" and "split", but I can't find a regex that handles all of these examples.

  • Why isn't "NAME:CDE" converted to "NAME=CDE"? Commented Sep 1, 2021 at 14:40

2 Answers


You can use the regex below with re.findall() to find the key-value pairs, then join each pair with "=":

([a-zA-Z ]+):(.*?)(?:,|$)
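For reference, here is the same pattern written with re.VERBOSE so each part can be commented. It is meant to behave identically to the one-line version; the name PAIR is only an illustrative choice.

```python
import re

PAIR = re.compile(
    r"""
    ([a-zA-Z ]+)   # key: letters and spaces (a space inside [...] stays literal in VERBOSE mode)
    :              # the colon right after the key separates key from value
    (.*?)          # value: as few characters as possible...
    (?:,|$)        # ...up to the next comma or the end of the string
    """,
    re.VERBOSE,
)

print(PAIR.findall("ID:123, NAME:CDE,DESC:10-10/5:2"))
# [('ID', '123'), (' NAME', 'CDE'), ('DESC', '10-10/5:2')]
```

Because the value group is lazy and only stops at a comma or the end, a colon inside a value (as in "10-10/5:2") is kept as part of the value.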


Sample program

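import re

inputs = ["RB2 F 27/0/31 0/32, R8 28/31/120 2/0/2",
          "TYPE :ABC,NAME: AB.DE,DESC:10/10/5:2,COMMENT: , ID:123",
          "ID:123, NAME:CDE,DESC:10-10/5:2"]

output = []
for line in inputs:
    # Join each (key, value) match with '='; when there are no matches,
    # the empty list is falsy, so `or line` keeps the original string as is
    output.append([a + '=' + b for a, b in re.findall(r"([a-zA-Z ]+):(.*?)(?:,|$)", line)] or line)

print(output)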

Output

[
 'RB2 F 27/0/31 0/32, R8 28/31/120 2/0/2',
 ['TYPE =ABC', 'NAME= AB.DE', 'DESC=10/10/5:2', 'COMMENT= ', ' ID=123'],
 ['ID=123', ' NAME=CDE', 'DESC=10-10/5:2']
]
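The output above keeps the spaces around keys and values (for example ' ID=123' and 'NAME= AB.DE'), whereas the question expects "ID=123" and "NAME=AB.DE". A minimal adjustment, assuming whitespace around keys and values is never significant, is to strip each part before joining. parse_pairs is a hypothetical helper name.

```python
import re

PAIR = re.compile(r"([a-zA-Z ]+):(.*?)(?:,|$)")

def parse_pairs(line):
    """Return a list of "key=value" strings, or the line unchanged if no pairs are found."""
    # Strip whitespace around each key and value before joining with '='
    pairs = [f"{key.strip()}={value.strip()}" for key, value in PAIR.findall(line)]
    return pairs or line

inputs = [
    "RB2 F 27/0/31 0/32, R8 28/31/120 2/0/2",
    "TYPE :ABC,NAME: AB.DE,DESC:10/10/5:2,COMMENT: , ID:123",
    "ID:123, NAME:CDE,DESC:10-10/5:2",
]

for line in inputs:
    print(parse_pairs(line))
# RB2 F 27/0/31 0/32, R8 28/31/120 2/0/2
# ['TYPE=ABC', 'NAME=AB.DE', 'DESC=10/10/5:2', 'COMMENT=', 'ID=123']
# ['ID=123', 'NAME=CDE', 'DESC=10-10/5:2']
```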

You don't need regex for this if it's okay to assume that there won't be a ':' in the key.

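string = "TYPE :ABC,NAME: AB.DE,DESC:10/10/5:2,COMMENT: , ID:123"

if ':' in string:
    # Remove all spaces (note: this also removes spaces inside values)
    string = string.replace(' ', '')
    string = string.split(',')
    for i, word in enumerate(string):
        # Replace only the first ':' so "DESC:10/10/5:2" keeps its second colon
        string[i] = word.replace(':', '=', 1)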
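The code above removes every space, including any inside a value. If values may contain internal spaces, here is a sketch of the same split-based idea that only strips whitespace around each key and value. It keeps the same assumption (no ':' in keys) and also assumes values never contain ','. to_pairs is a hypothetical name.

```python
def to_pairs(text):
    """Split "key:value" items on commas and rewrite each as "key=value".

    Strings without ':' are returned unchanged.
    """
    if ':' not in text:
        return text
    result = []
    for item in text.split(','):
        # partition splits on the first ':' only, so "DESC:10/10/5:2" keeps its second colon
        key, _, value = item.partition(':')
        result.append(f"{key.strip()}={value.strip()}")
    return result

print(to_pairs("NAME: AB DE, ID:1"))
# ['NAME=AB DE', 'ID=1']
```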
