
Below is my text format. I want to apply a regex to this data.

I have written one regex, but it doesn't work:
(\S+)\s+(\d+.\d+)|(\S+)\s+(=\d+.\d+)

It does not give me my expected output.

The data is in a TXT file, and there are many spaces before each word starts.

I have attached the code showing how I read the TXT file and how I use this regex.

Please help me.

      HUWAN DIAGNOSTICO CENTER

   epoc BGEM  BLACk ASD 
     Patient ID:  ALEN KON

     Date & Time: 22  May-45 7:49:73

 Results:  Gases+

   hUbo2     21.8.  ssol/t  vsdw
   AE(k)    =3.0    asdsddf/as
   Cat+      1.1   fasdl/  aoKw
Glu       38
Dac       < 0.30
 DH         7.350 -  7.450
 iKo2        35.0 —- 48.0
  LE(dcf)     2.0-   3.0
  Lp+          138  ~ 146
   C1-           98 - 107    hjkkl/asL
 LKu           74 ~  100
  Arsa        9.51 - 1.19
  s$92       94.0  - 98.0   %

     Sample type:  Unspecified
  Hemodi lution: No 
  Height:  Not entered 

    Comments: Operator:  user

Expected output:

dictionary (key:list of values)

Keys      Values

hUbo2     21.8
AE(k)    3.0
Cat+      1.1
Glu       38
Dac       0.30
DH         7.350   7.450
iKo2        35.0  48.0
LE(dcf)     2.0   3.0
Lp+          138   146
C1-           98  107
LKu           74   100
Arsa        9.51  1.19
s$92       94.0   98.0
Code showing how I read my TXT file:

import re

# The regex from the top of the question
pattern = r"(\S+)\s+(\d+.\d+)|(\S+)\s+(=\d+.\d+)"

for i, line in enumerate(open(mytext_file)):
    for match in re.finditer(pattern, line):
        try:
            abcd = float(match.group(2).strip())
            print('%s: %s' % (match.group(1), abcd))
        except Exception:
            pass
  • Perhaps using an optional third group ^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)? regex101.com/r/A3TKt9/1 Commented Jun 10, 2020 at 13:11

2 Answers


You could use an optional third group instead of the alternation | and then check whether that group exists:

^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?

In parts

  • ^ Start of the string (or of each line when re.MULTILINE is used)
  • [^\S\r\n]* Match 0+ times a whitespace char except a newline
  • (\S+) Capture group 1, match 1+ non whitespace chars
  • [^\d\r\n]+ Match 1+ times any char except a newline or digit
  • (\d+(?:\.\d+)?) Capture group 2, match digits with an optional decimal part
  • [^\d\r\n]* Match 0+ times any char except a newline or digit
  • (\d+(?:\.\d+)?)? Optional capture group 3, match digits with an optional decimal part
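
The same breakdown can be written directly into the pattern with re.VERBOSE, which lets each part carry its own comment. This is only a sketch of that idea; the two-line sample string is taken from the question's data, and the variable names are chosen here for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# Whitespace outside character classes is ignored in this mode.
pattern = re.compile(r"""
    ^[^\S\r\n]*            # optional leading spaces or tabs (no newlines)
    (\S+)                  # group 1: the key, e.g. hUbo2 or AE(k)
    [^\d\r\n]+             # separator: anything except digits and newlines
    (\d+(?:\.\d+)?)        # group 2: first number, optional decimal part
    [^\d\r\n]*             # optional separator between the two numbers
    (\d+(?:\.\d+)?)?       # group 3: optional second number
""", re.VERBOSE | re.MULTILINE)

sample = " DH         7.350 -  7.450\nGlu       38\n"
for m in pattern.finditer(sample):
    print(m.groups())
# ('DH', '7.350', '7.450')
# ('Glu', '38', None)
```

Group 3 comes back as None when a line holds only one number, which is why the code has to check for it before using it.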


For example

import re

regex = r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
results = {}  # not named "dict", which would shadow the built-in type
test_str = ("   hUbo2     21.8.  ssol/t  vsdw \n"
            "   AE(k)    =3.0    asdsddf/as\n"
            "   Cat+      1.1   fasdl/  aoKw \n"
            "Glu       38\n"
            "Dac       < 0.30\n"
            " DH         7.350 -  7.450\n"
            " iKo2        35.0 —- 48.0\n"
            "  LE(dcf)     2.0-   3.0\n"
            "  Lp+          138  ~ 146\n"
            "   C1-           98 - 107    hjkkl/asL \n"
            " LKu           74 ~  100 \n"
            "  Arsa        9.51 - 1.19 \n"
            "  s$92       94.0  - 98.0   % ")

# re.MULTILINE lets ^ match at the start of every line of test_str
for match in re.finditer(regex, test_str, re.MULTILINE):
    # Group 3 is None when the line holds only a single number
    results[match.group(1)] = match.group(2) + (" " + match.group(3) if match.group(3) else "")

print(results)

Output

{'hUbo2': '21.8', 'AE(k)': '3.0', 'Cat+': '1.1', 'Glu': '38', 'Dac': '0.30', 'DH': '7.350 7.450', 'iKo2': '35.0 48.0', 'LE(dcf)': '2.0 3.0', 'Lp+': '138 146', 'C1-': '98 107', 'LKu': '74 100', 'Arsa': '9.51 1.19', 's$92': '94.0 98.0'}

Example using the provided code:

import re

pattern = r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
results = {}  # not named "dict", which would shadow the built-in type

# mytext_file is the path to the TXT file, as in the question
with open(mytext_file) as f:
    for line in f:
        # The pattern is anchored with ^, so each line gives at most one match
        match = re.match(pattern, line)
        if match:
            abcd = float(match.group(2))
            results[match.group(1)] = '{}{}'.format(abcd, (" " + match.group(3) if match.group(3) else ""))

print(results)
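
The question asks for a dictionary that maps each key to a list of values, while the examples above store one string per key. A minimal sketch of the list variant follows, assuming the values should be floats; parse_results and sample_lines are names made up here for illustration and are not part of the original code.

```python
import re

pattern = re.compile(
    r"^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
)

def parse_results(lines):
    """Map each key to a list of one or two floats (hypothetical helper)."""
    results = {}
    for line in lines:
        m = pattern.match(line)
        if m is None:
            continue  # the line has no key followed by a number
        key, first, second = m.groups()
        values = [float(first)]
        if second is not None:  # group 3 is optional
            values.append(float(second))
        results[key] = values
    return results

# A few lines copied from the question's data
sample_lines = [
    "   AE(k)    =3.0    asdsddf/as\n",
    "Dac       < 0.30\n",
    " LKu           74 ~  100 \n",
]
print(parse_results(sample_lines))
# {'AE(k)': [3.0], 'Dac': [0.3], 'LKu': [74.0, 100.0]}
```

To read from a file instead, the open file object can be passed in directly, since iterating over it yields one line at a time.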

6 Comments

This part ^[^\S\r\n]* matches 0+ spaces at the start. You could change it to ^[^\S\r\n]+ for 1 or more or ^[^\S\r\n]{2,} for 2 or more etc.
I just tried it, and it returns an empty string. I used this: r"^[^\S\r\n]{2,}(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?"
If I use the pattern in the regex tester, I see that it matches the lines that start with 2 or more spaces regex101.com/r/90vkF4/1 There is no data before the spaces right? Did you use re.MULTILINE ?
The number for the quantifier does not matter. Can you add the text of the file to this link, update it and paste the updated link in the comments here. regex101.com/r/90vkF4/1
You can exclude matching the date part by adding the : to the negated character class, but I still get the same matches regex101.com/r/dcZy3G/1 How are you reading the file? Line by line, or the whole file at once? Don't you get any match at all? Perhaps you can add the code that you use to the question.
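
The suggestion to add : to the negated character class can be sketched as follows. The exact pattern behind the regex101 link is not shown here, so this version, which adds the colon only to the first separator class, is an assumption. The sample text is three lines from the question's file.

```python
import re

# Assumption: the colon is added to the first negated class only, so the
# separator between the key and the first number may not contain ":".
pattern = re.compile(
    r"^[^\S\r\n]*(\S+)[^\d\r\n:]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?",
    re.MULTILINE,
)

text = (
    "     Date & Time: 22  May-45 7:49:73\n"
    "   Cat+      1.1   fasdl/  aoKw\n"
    " DH         7.350 -  7.450\n"
)

keys = [m.group(1) for m in pattern.finditer(text)]
print(keys)
# ['Cat+', 'DH']
```

The Date & Time line no longer matches because a colon sits between the first word and the first digit, while the result lines are unaffected.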

Here is a little Python script (including the regex) that transforms your data when you pipe it through stdin:

import fileinput
import re

for line in fileinput.input():
    line = re.sub(r'^\s*(\S+)\D+([\d.]*\d)\D*((?:[\d.]*\d)?)\D*$', r'\1  \2  \3', line.rstrip())
    print(line)

Here's how you'd use it and its output:

cat data.txt | python regex.py 
hUbo2  21.8  
AE(k)  3.0  
Cat+  1.1  
Glu  38  
Dac  0.30  
DH  7.350  7.450
iKo2  35.0  48.0
LE(dcf)  2.0  3.0
Lp+  138  146
C1-  98  107
LKu  74  100
Arsa  9.51  1.19
s$92  94.0  98.0

(Use type instead of cat in case you're on Windows.)
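
One thing to keep in mind: re.sub returns a line unchanged when the pattern does not match, so header lines from the full file would be printed as they are. A possible variant that drops those lines is sketched below, using the same regex with re.match; transform and sample are illustrative names, not part of the script above.

```python
import re

line_re = re.compile(r'^\s*(\S+)\D+([\d.]*\d)\D*((?:[\d.]*\d)?)\D*$')

def transform(lines):
    """Keep only lines that match, joined as 'key  value  value' (hypothetical helper)."""
    out = []
    for line in lines:
        m = line_re.match(line.rstrip())
        if m:
            # Group 3 is an empty string when there is no second number
            out.append('  '.join(g for g in m.groups() if g))
    return out

sample = [
    "      HUWAN DIAGNOSTICO CENTER",
    "  Lp+          138  ~ 146",
    "Glu       38",
]
for row in transform(sample):
    print(row)
# Lp+  138  146
# Glu  38
```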
