Create a regex for text data in one go?

Question

My text file has the format shown below, and I want to extract the values from it with a single regex.

I wrote this regex, but it does not work:
(\S+)\s+(\d+.\d+)|(\S+)\s+(=\d+.\d+)

It does not give me the expected output.

The data is in a TXT file, and many lines have several spaces before the first word.

I have attached the code that reads the TXT file and applies the regex.

Please help me.

      HUWAN DIAGNOSTICO CENTER

   epoc BGEM  BLACk ASD 
     Patient ID:  ALEN KON

     Date & Time: 22  May-45 7:49:73

 Results:  Gases+

   hUbo2     21.8.  ssol/t  vsdw
   AE(k)    =3.0    asdsddf/as
   Cat+      1.1   fasdl/  aoKw
Glu       38
Dac       < 0.30
 DH         7.350 -  7.450
 iKo2        35.0 —- 48.0
  LE(dcf)     2.0-   3.0
  Lp+          138  ~ 146
   C1-           98 - 107    hjkkl/asL
 LKu           74 ~  100
  Arsa        9.51 - 1.19
  s$92       94.0  - 98.0   %

     Sample type:  Unspecified
  Hemodi lution: No 
  Height:  Not entered 

    Comments: Operator:  user

Expected output:

dictionary (key:list of values)

Keys      Values

hUbo2     21.8
AE(k)    3.0
Cat+      1.1
Glu       38
Dac       0.30
DH         7.350   7.450
iKo2        35.0  48.0
LE(dcf)     2.0   3.0
Lp+          138   146
C1-           98  107
LKu           74   100
Arsa        9.51  1.19
s$92       94.0   98.0

# Code for how I read my TXT file

for i, line in enumerate(open(mytext_file)):
    for match in re.finditer(pattern, line):
        try:
            abcd = float(match.group(2).strip())
            print('%s: %s' % (match.group(1), abcd))
        except Exception:
            pass

Perhaps using an optional third group: ^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)? regex101.com/r/A3TKt9/1
– The fourth bird, Commented Jun 10, 2020 at 13:11

The fourth bird · Accepted Answer · 2020-06-10 14:37:02Z

2

You could use an optional third group instead of the alternation |, and then check whether that group matched.

^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?

In parts

^ Start of string (or start of each line when re.MULTILINE is used)
[^\S\r\n]* Match 0+ times a whitespace char except a newline
(\S+) Capture group 1, match 1+ non whitespace chars
[^\d\r\n]+ Match 1+ times any char except a newline or digit
(\d+(?:\.\d+)?) Capture group 2, match digits with an optional decimal part
[^\d\r\n]* Match 0+ times any char except a newline or digit
(\d+(?:\.\d+)?)? Optional capture group 3, match digits with an optional decimal part
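
To see how the parts above map onto real lines, here is a minimal sketch that applies the same pattern to single lines and returns the three groups. The helper name split_line is hypothetical and only used for illustration; the sample lines are taken from the question.

```python
import re

# Pattern from the answer, unchanged.
PATTERN = re.compile(
    r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
)


def split_line(line):
    """Return (key, first_number, second_number_or_None), or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None


print(split_line("   AE(k)    =3.0    asdsddf/as"))  # group 3 did not match, so it is None
print(split_line(" DH         7.350 -  7.450"))      # both numbers captured
print(split_line("Glu       38"))                     # integer value, no decimal part
print(split_line(" Results:  Gases+"))                # no digits, so no match at all
```

Group 3 comes back as None when a line has only one number, which is why the code has to check for it before using it.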


For example

import re

regex = r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
result = {}  # not named "dict", which would shadow the built-in type
test_str = ("   hUbo2     21.8.  ssol/t  vsdw \n"
            "   AE(k)    =3.0    asdsddf/as\n"
            "   Cat+      1.1   fasdl/  aoKw \n"
            "Glu       38\n"
            "Dac       < 0.30\n"
            " DH         7.350 -  7.450\n"
            " iKo2        35.0 —- 48.0\n"
            "  LE(dcf)     2.0-   3.0\n"
            "  Lp+          138  ~ 146\n"
            "   C1-           98 - 107    hjkkl/asL \n"
            " LKu           74 ~  100 \n"
            "  Arsa        9.51 - 1.19 \n"
            "  s$92       94.0  - 98.0   % ")

# re.MULTILINE makes ^ match at the start of every line in test_str
matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    # Append the second number only when the optional group 3 matched
    result[match.group(1)] = match.group(2) + (" " + match.group(3) if match.group(3) else "")

print(result)

Output

{'hUbo2': '21.8', 'AE(k)': '3.0', 'Cat+': '1.1', 'Glu': '38', 'Dac': '0.30', 'DH': '7.350 7.450', 'iKo2': '35.0 48.0', 'LE(dcf)': '2.0 3.0', 'Lp+': '138 146', 'C1-': '98 107', 'LKu': '74 100', 'Arsa': '9.51 1.19', 's$92': '94.0 98.0'}
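
The question asks for a dictionary mapping each key to a list of values, while the output above stores each value as a single string. A minimal sketch of that variant, assuming the numbers should be converted to floats (the function name parse_results is hypothetical):

```python
import re

PATTERN = re.compile(
    r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?",
    re.MULTILINE,
)


def parse_results(text):
    """Map each key to a list holding one or two floats."""
    results = {}
    for m in PATTERN.finditer(text):
        key, first, second = m.groups()
        # second is None when the line holds a single number
        results[key] = [float(v) for v in (first, second) if v is not None]
    return results


sample = (
    "Glu       38\n"
    "Dac       < 0.30\n"
    " DH         7.350 -  7.450\n"
)
print(parse_results(sample))
# {'Glu': [38.0], 'Dac': [0.3], 'DH': [7.35, 7.45]}
```

Converting to float drops trailing zeros (0.30 becomes 0.3); keep the strings instead if the original formatting matters.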

Example using the provided code:

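import re

pattern = r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
result = {}  # not named "dict", which would shadow the built-in type

# mytext_file is the path to the TXT file, as in the question
with open(mytext_file) as f:
    for line in f:
        # Each line is matched on its own, so re.MULTILINE is not needed here
        for match in re.finditer(pattern, line):
            # Group 2 always holds digits with an optional decimal part, so float() cannot fail
            abcd = float(match.group(2))
            result[match.group(1)] = '{}{}'.format(abcd, (" " + match.group(3) if match.group(3) else ""))

print(result)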

edited Jun 10, 2020 at 14:37

answered Jun 10, 2020 at 13:30

The fourth bird


6 Comments

The fourth bird Over a year ago

This part ^[^\S\r\n]* matches 0+ spaces at the start. You could change it to ^[^\S\r\n]+ for 1 or more or ^[^\S\r\n]{2,} for 2 or more etc.
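
As a rough illustration of that comment, the sketch below swaps only the quantifier on the leading-whitespace class and lists which keys still match. The helper keys_matched and the constant TAIL are hypothetical names; the sample lines come from the question.

```python
import re

# Everything after the leading-whitespace part of the answer's pattern.
TAIL = r"(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"


def keys_matched(indent_quantifier, text):
    """Return the keys matched when the leading whitespace uses the given quantifier."""
    pattern = r"^[^\S\r\n]" + indent_quantifier + TAIL
    return [m.group(1) for m in re.finditer(pattern, text, re.MULTILINE)]


sample = (
    "   Cat+      1.1   fasdl/  aoKw\n"  # three leading spaces
    "Glu       38\n"                      # no leading spaces
    " DH         7.350 -  7.450\n"        # one leading space
)

print(keys_matched("*", sample))     # ['Cat+', 'Glu', 'DH']
print(keys_matched("+", sample))     # ['Cat+', 'DH']
print(keys_matched("{2,}", sample))  # ['Cat+']
```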

Coder Over a year ago

I just tried it and it returns an empty string. I used this: ` r"^[^\S\r\n]{2,}(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?" `

The fourth bird Over a year ago

If I use the pattern in the regex tester, I see that it matches the lines that start with 2 or more spaces: regex101.com/r/90vkF4/1 There is no data before the spaces, right? Did you use re.MULTILINE?
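
The re.MULTILINE question matters when the whole file is read as one string. A small sketch of the difference, using three lines from the question; the variable names are only illustrative:

```python
import re

PATTERN = r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"

text = "Glu       38\nDac       < 0.30\n  Lp+          138  ~ 146\n"

# Without the flag, ^ matches only at the very start of the string.
without_flag = [m.group(1) for m in re.finditer(PATTERN, text)]

# With the flag, ^ matches at the start of every line.
with_flag = [m.group(1) for m in re.finditer(PATTERN, text, re.MULTILINE)]

# Matching line by line needs no flag, because each line is its own string.
per_line = []
for line in text.splitlines():
    m = re.match(PATTERN, line)
    if m:
        per_line.append(m.group(1))

print(without_flag)  # ['Glu']
print(with_flag)     # ['Glu', 'Dac', 'Lp+']
print(per_line)      # ['Glu', 'Dac', 'Lp+']
```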

The fourth bird Over a year ago

The number for the quantifier does not matter. Can you add the text of the file to this link, update it, and paste the updated link in the comments here? regex101.com/r/90vkF4/1

The fourth bird Over a year ago

You can exclude matching the date part by adding the : to the negated character class, but I still get the same matches regex101.com/r/dcZy3G/1 How are you reading the file? Line by line, or the whole file at once? Don't you get any match at all? Perhaps you can add the code that you use to the question.
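
The linked pattern is not reproduced in the comment, so the sketch below is an assumption about what was meant: the : is added to the first negated character class only, which stops the "Date & Time:" header from being picked up as a result. The names ORIGINAL, NO_COLON and keys are hypothetical.

```python
import re

ORIGINAL = r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
# Assumed variant: ':' added to the first negated class, so the gap between
# the key and the first number may not contain a colon.
NO_COLON = r"^[^\S\r\n]*(\S+)[^\d\r\n:]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"

text = (
    "     Date & Time: 22  May-45 7:49:73\n"
    "   C1-           98 - 107    hjkkl/asL\n"
)


def keys(pattern):
    """Return the keys that the pattern finds in the sample text."""
    return [m.group(1) for m in re.finditer(pattern, text, re.MULTILINE)]


print(keys(ORIGINAL))  # ['Date', 'C1-']
print(keys(NO_COLON))  # ['C1-']
```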


lab9 · Accepted Answer · 2020-06-10 14:23:33Z

0

Here is a little Python script (including the regex) that transforms your data when you pipe it through stdin:

import fileinput
import re

for line in fileinput.input():
    line = re.sub(r'^\s*(\S+)\D+([\d.]*\d)\D*((?:[\d.]*\d)?)\D*$', r'\1  \2  \3', line.rstrip())
    print(line)

Here's how you'd use it and its output:

cat data.txt | python regex.py 
hUbo2  21.8  
AE(k)  3.0  
Cat+  1.1  
Glu  38  
Dac  0.30  
DH  7.350  7.450
iKo2  35.0  48.0
LE(dcf)  2.0  3.0
Lp+  138  146
C1-  98  107
LKu  74  100
Arsa  9.51  1.19
s$92  94.0  98.0

(Use type instead of cat in case you're on Windows.)
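
For trying the substitution without piping a file through stdin, the same regex can be wrapped in a function. This is only a sketch; the names LINE_RE and normalise are hypothetical. Lines that do not match the pattern are returned unchanged apart from the trailing-whitespace strip.

```python
import re

# Same regex as in the script above, compiled once.
LINE_RE = re.compile(r'^\s*(\S+)\D+([\d.]*\d)\D*((?:[\d.]*\d)?)\D*$')


def normalise(line):
    """Rewrite a result line as 'key  first  second'; the second field may be empty."""
    return LINE_RE.sub(r'\1  \2  \3', line.rstrip())


for raw in ("   AE(k)    =3.0    asdsddf/as",
            " iKo2        35.0 —- 48.0",
            "     Sample type:  Unspecified"):
    print(repr(normalise(raw)))
```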

edited Jun 10, 2020 at 14:23

answered Jun 10, 2020 at 14:18

lab9
