Name        Miss deks KUMARI                    Booking Date           22/05/2020 
             Gender/Age  male  24 Yrs                        Reporting Date         22/05/2020 
             Lab No.     10203693                              Sample Collected At    Lab 
             Ref. By Dr. I.C.U 
                  ;                                                                          UVLO 
             Test Name                                  Value         Unit            Biological Ref Interval 
                                           COMPLETE   BLOOD   COUNT (CBC) 
             TOTAL LEUCOCYTES    COUNT (TLC)            23160         cells/cmm       4000 - 11000 
             DIFFERENTIAL LEUCOCYTES  COUNT (DLC) 
             NEUTROPHILS                                93.4          %               45.0 - 65.0 
             LYMPHOCYTES                                 3.3          %               20.0 - 45.0 
             MONOCYTES                                   3.1          %               4.0 - 10.0 
             EOSINOPHILS                                0.2           %               0.0 - 5.0 
             BASOPHILS                                   0.0          %               0.0-1.0 
             ABSOLUTE   NEUTROPHILS                      21620.0                      3000.0 - 7000.0 
             ABSOLUTE   LYMPHOCYTES                      750.0                        800.0 - 4000.0 
             ABSOLUTE  MONOCYTES                         730.0                        0.0 - 1200.0 
             ABSOLUTE  EOSINOPHILS                       50.0                         0.0 - 500.0 
             ABSOLUTE  BASOPHILS                         10.0                         0.0 - 100.0 
             RBC  COUNT                                  4.31         Millions/cmm    3.80 - 5.80 

This is a text file, and I want to extract the following kind of output from it using regex.

If I search for NEUTROPHILS, I want its value, 93.4.

If I search for BASOPHILS, I want its value, 0.0, and so on.

Only the first two columns are needed. I tried this regex: ^[^\S\r\n]*(\S+)[^\d\r\n]+(\d+(?:\.\d+)?)[^\d\r\n]*(\d+(?:\.\d+)?)?

but it returns every row.

Could someone please help me get this?

Here is my list:

         `["NEUTROPHILS",
         "LYMPHOCYTES",
         "MONOCYTES",
         "EOSINOPHILS",
         "BASOPHILS"]`

I want to get output like this:

{
 "NEUTROPHILS"  :  93.4,
 "LYMPHOCYTES"  :  3.3,
 "MONOCYTES"    :  3.1,
 "EOSINOPHILS"  :  0.2,
 "BASOPHILS"    :  0.0 }



  • There are a number of ways to do this. What I've done in the past is go through the file line by line, use a regex to find the matching line (if you use re.search, make sure to use the match's .string attribute to get the entire line), call .split() on that string, then index the value you want to extract. Commented Jul 13, 2020 at 17:43

3 Answers


You could use the following expression:

\b(?P<key>[A-Z][A-Z ]+)\b(?P<value>\d+(?:\.\d+)?)

Then we need to clean the keys (collapse runs of whitespace into a single space) and write a function that returns the value for a given key. Optionally, put it all in a class. That said, the code could be:

import re

class Finder:
    def __init__(self, haystack):
        self.db = self.build_db(haystack)

    def build_db(self, haystack):
        rx = re.compile(r'\b(?P<key>[A-Z][A-Z ]+)\b(?P<value>\d+(?:\.\d+)?)')
        ws = re.compile(r'\s+')

        return {ws.sub(' ', m["key"].strip()): m["value"] for m in rx.finditer(haystack)}

    def find_by_key(self, key):
        try:
            value = self.db[key]
        except KeyError:
            value = None
        return value

    def get_selected(self, lst):
        result = {}
        for key in lst:
            value = self.find_by_key(key)
            if value:
                result[key] = value
        return result

    def get_all(self):
        return self.db

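# junk must hold the whole report as one string; "report.txt" is a placeholder path.
with open("report.txt") as fh:
    junk = fh.read()

finder = Finder(junk)
dct = finder.get_selected(["NEUTROPHILS", "LYMPHOCYTES", "MONOCYTES", "EOSINOPHILS", "BASOPHILS"])
print(dct)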

Which would yield

{'NEUTROPHILS': '93.4', 'LYMPHOCYTES': '3.3', 
 'MONOCYTES': '3.1', 'EOSINOPHILS': '0.2', 'BASOPHILS': '0.0'}

See a demo for the expression on regex101.com.
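
The values in that dictionary are strings, while the output requested in the question shows numbers. A minimal sketch of one way to convert them, assuming every captured value is a plain decimal (which the pattern's value group guarantees); `to_numbers` is a hypothetical helper name, not part of the class above:

```python
def to_numbers(selected):
    """Return a copy of the dict with each string value converted to float."""
    return {key: float(value) for key, value in selected.items()}

# Sample input copied from the result printed above.
dct = {'NEUTROPHILS': '93.4', 'LYMPHOCYTES': '3.3',
       'MONOCYTES': '3.1', 'EOSINOPHILS': '0.2', 'BASOPHILS': '0.0'}
numbers = to_numbers(dct)
print(numbers)
# {'NEUTROPHILS': 93.4, 'LYMPHOCYTES': 3.3, 'MONOCYTES': 3.1, 'EOSINOPHILS': 0.2, 'BASOPHILS': 0.0}
```

If exact decimal representation matters for later arithmetic, `decimal.Decimal` could be used in place of `float`.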


3 Comments

Thanks for this. I just updated my question; can you please help me with that?
@Jose: See the updated answer and Finder.get_selected().
Can you please explain how to pass the lines of that .txt file into junk?

You can try this simple regex for that. The whole match (group 0) covers the key together with its value, and the value (your 2nd column) is in capture group 1: [A-Z]+\s+[A-Z]*\s+(\d+.\d*)

Explanation of above regex:

  • It first matches one or more uppercase letters
  • Then matches one or more spaces
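  • Then again matches zero or more uppercase letters (to cover space-separated keys in your text)
  • Then matches one or more spaces again
  • The last part captures the number: one or more digits, any single character (the dot is unescaped), then zero or more digits.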

Here is the demo on regex101.com

Note: This regex can be easily improved to add more restrictions.
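
One possible way to add such restrictions, offered as a sketch rather than the definitive fix: anchor each wanted key at the start of its line, so that NEUTROPHILS does not also match inside ABSOLUTE   NEUTROPHILS, and require a well-formed number. The helper name `lookup` and the shortened sample text are assumptions made for illustration.

```python
import re

# Shortened sample; the rows are copied from the report in the question.
text = """\
             NEUTROPHILS                                93.4          %               45.0 - 65.0
             BASOPHILS                                   0.0          %               0.0-1.0
             ABSOLUTE   NEUTROPHILS                      21620.0                      3000.0 - 7000.0
"""

def lookup(key, haystack):
    """Return the first number after `key` when `key` starts a line, else None."""
    # ^[ \t]* allows only indentation before the key; \. makes the dot literal.
    pattern = rf'^[ \t]*{re.escape(key)}[ \t]+(\d+(?:\.\d+)?)'
    match = re.search(pattern, haystack, flags=re.MULTILINE)
    return match.group(1) if match else None

keys = ["NEUTROPHILS", "BASOPHILS"]
result = {key: lookup(key, text) for key in keys}
print(result)  # {'NEUTROPHILS': '93.4', 'BASOPHILS': '0.0'}
```

Building the pattern per key with `re.escape` keeps the search tied to the exact names in the list, at the cost of one scan of the text per key.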

2 Comments

Change the expression from ...(\d+.\d*) to (\d+(?:\.\d+)?), otherwise things like 123f, 1232323232?, 222!222, etc. are considered valid.
@Jan Yes, you are correct, this regex could match those, but I've just given a high-level hint (instead of spoon-feeding) and also noted that you can add restrictions to this regex to exclude such cases. That should be tried and done by the OP.

I'm sure there are better ways to do this. But this is what I've done in the past:

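import re

# "file.txt" is the path to the report
with open("file.txt") as file:
    for line in file:
        stripped = line.strip()
        # Match lines that start with a single word followed by a number.
        search = re.search(r'^\w+\s+\d+', stripped)
        if search is not None:
            # search.string is the whole stripped line; field 1 is the value.
            extract = search.string.split()
            print(extract[1])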

Granted, you can change the search to the actual word if you'd like. I've written this out in full; with a list comprehension you could reduce the whole thing to about 2 lines.
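
A rough sketch of that shorter form, assuming the lines are already available as a list (here a hard-coded sample rather than a file); the name `values` is only for illustration:

```python
import re

# Sample lines copied from the report in the question; in practice these
# would come from iterating over the open file.
lines = [
    "             NEUTROPHILS                                93.4          %               45.0 - 65.0 ",
    "             LYMPHOCYTES                                 3.3          %               20.0 - 45.0 ",
    "             RBC  COUNT                                  4.31         Millions/cmm    3.80 - 5.80 ",
]

# Keep only lines that look like "<word> <number> ..." and take the second field.
values = [line.split()[1] for line in lines if re.search(r'^\w+\s+\d+', line.strip())]
print(values)  # ['93.4', '3.3']
```

The RBC  COUNT row is skipped because its key has two words, so the pattern, which expects a single word directly followed by a number, does not match it.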
