Python Parse text

Question

Here is the text I need to parse:

JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4422
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-15 11 4422
     gcontent4 2020-06-12 19 422

3. main3:

     hxvnt5 2020-06-15 11 4422
     vcfdet6 2020-06-12 19 422

I need to parse only the numbered sections, each one running until the next numbered heading. Within them I need to find the rows whose 4th column is greater than 1000 and that are older than 12 hours (going by the date and time in the 2nd column), and then send the details by email. I tried parsing with Python's re library but could not get it to work.

So the expected output is:

    1. main1:

         aelo1 2020-06-15 11 4422

    2. main2:

         fdata3 2020-06-15 11 4422

    3. main3:

         hxvnt5 2020-06-15 11 4422

Is it possible via bash or Python?

The "older than 12 hours" requirement needs clarification: do you want to keep the rows with 3rd column values > 12 or ignore them? Also, sharing what you have tried will help others help you.
– Omkar Neogi, Commented Jun 17, 2020 at 4:34

Akshay G Bhardwaj · Accepted Answer · 2020-06-17 04:20:14Z

1

Here is a regex you can use to match the rows (I am not sure about the 12-hour condition).

\d+\.\s\S+\s+\S+\s[0-9-]+\s\d+\s[1-9][0-9]{3,}
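As a rough illustration of how this pattern could be applied, the sketch below runs it over the question's sample text with `re.findall`. The names `text`, `pattern` and `matches` are placeholders, not part of the answer. Two caveats: the pattern only matches when the qualifying row comes directly after the heading, and `[1-9][0-9]{3,}` also accepts exactly 1000, so it tests "at least 1000" rather than "greater than 1000". The 12-hour condition is not checked at all.

```python
import re

# Sample input copied from the question.
text = """JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4422
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-15 11 4422
     gcontent4 2020-06-12 19 422

3. main3:

     hxvnt5 2020-06-15 11 4422
     vcfdet6 2020-06-12 19 422
"""

# The pattern from the answer: a numbered heading followed by one row
# whose last column has at least four digits and no leading zero.
pattern = re.compile(r"\d+\.\s\S+\s+\S+\s[0-9-]+\s\d+\s[1-9][0-9]{3,}")

# Each match spans the heading and the first row beneath it.
matches = pattern.findall(text)
for m in matches:
    print(m)
    print()
```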

Jason Yang · Accepted Answer · 2020-06-17 06:56:41Z

0

Here is a solution for you:

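def parsing(text):
    """Keep headings and blank lines, plus rows with hour <= 12 and value > 1000."""
    if text.strip() == '':
        return ''
    lines = text.split('\n')
    buffer = ''
    for line in lines:
        t = line.strip()
        # Blank lines and lines starting with a digit ("1. main1:") are kept as-is.
        if t == '' or t[0] in '0123456789':
            buffer += line + '\n'
        else:
            lst = t.split()
            # Data rows have four columns: name, date, hour, value.
            if len(lst) >= 4:
                if (len(lst[1].split('-'))==3 and int(lst[2]) <= 12 and
                        int(lst[3]) > 1000):
                    buffer += line + '\n'
    return buffer.strip()

# `text` is the input shown in the question.
print(parsing(text))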
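The function above compares only the hour column and never looks at the date. If "older than 12 hours" is instead read as "the timestamp formed by the date and hour columns lies more than 12 hours in the past", a variant might look like the sketch below. That reading is an assumption, as are the name `filter_sections`, its parameters, and the fixed reference time `now` (chosen so the example is reproducible; in practice `datetime.now()` would be the likely choice).

```python
from datetime import datetime, timedelta

def filter_sections(text, now, min_value=1000, min_age=timedelta(hours=12)):
    """Return numbered headings plus rows with value > min_value whose
    timestamp (date column + hour column) is more than min_age before now."""
    out = []
    in_section = False
    for line in text.splitlines():
        parts = line.split()
        # A numbered heading such as "1. main1:" starts a section.
        if parts and parts[0].endswith('.') and parts[0][:-1].isdigit():
            in_section = True
            out.append(line)
            continue
        # Skip anything before the first heading and anything not 4 columns wide.
        if not in_section or len(parts) != 4:
            continue
        _name, date_str, hour_str, value_str = parts
        try:
            stamp = datetime.strptime(f"{date_str} {hour_str}", "%Y-%m-%d %H")
            value = int(value_str)
        except ValueError:
            continue  # not a data row
        if value > min_value and now - stamp > min_age:
            out.append(line)
    return "\n".join(out)

sample = """JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4422
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-15 11 4422
     gcontent4 2020-06-12 19 422
"""

# Assumed reference time, roughly when the question was asked.
result = filter_sections(sample, now=datetime(2020, 6, 17, 4, 0))
print(result)
```

Unlike the version above, this one drops blank lines and reports only the headings and the rows that pass both checks.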

edited Jun 17, 2020 at 6:56

answered Jun 17, 2020 at 5:41

2 Comments

Jason Yang Over a year ago

That's why I asked for more information about the requirements and situation. Updated.

user13760031 Over a year ago

Updated the expected output for better understanding

apraksim · Accepted Answer · 2021-12-30 12:38:30Z

You can use TTP to parse and filter it in one template:

from ttp import ttp
import pprint

data = """
JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4001
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-16 11 4422
     gcontent4 2020-06-12 19 422

3. main3:

     hxvnt5 2020-06-17 11 4002
     vcfdet6 2020-06-12 19 422
"""
    
template = """
<group contains="value">
1. main1: {{ _start_ }}
     {{ ignore }} {{ date }} {{ hour | lessthan("12") }} {{ value | greaterthan("4000") }}
</group>     
"""
    
parser = ttp(data, template)
parser.parse()
res = parser.result()
pprint.pprint(res)

# prints:
# [[[{'date': '2020-06-15', 'hour': '11', 'value': '4001'},
#    {'date': '2020-06-16', 'hour': '11', 'value': '4422'},
#    {'date': '2020-06-17', 'hour': '11', 'value': '4002'}]]]

Templates can also be tested online if you'd like.

Disclaimer: I am the author of TTP.

Edit: after parsing, you can post-process the results further to compose an email report or whatever the end result needs to look like.
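A minimal sketch of that post-processing step, using only the standard library, might look like this. The function name `build_report`, the addresses, and the subject wording are hypothetical; `rows` is the inner list from the result printed above. Actually sending the message (for example with `smtplib`) depends on the mail setup and is left out.

```python
from email.message import EmailMessage

def build_report(rows, sender, recipient):
    """Turn a list of parsed row dicts into a plain-text email message."""
    body = "\n".join(f"{r['date']} {r['hour']} {r['value']}" for r in rows)
    msg = EmailMessage()
    msg["Subject"] = f"{len(rows)} matching rows"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

# Rows as printed by the TTP example; with a live parser this would be res[0][0].
rows = [
    {'date': '2020-06-15', 'hour': '11', 'value': '4001'},
    {'date': '2020-06-16', 'hour': '11', 'value': '4422'},
    {'date': '2020-06-17', 'hour': '11', 'value': '4002'},
]

# Placeholder addresses, not real ones.
msg = build_report(rows, "reports@example.com", "ops@example.com")
print(msg)
```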
