2

Here is some text that I need to parse:

JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4422
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-15 11 4422
     gcontent4 2020-06-12 19 422

3. main3:

     hxvnt5 2020-06-15 11 4422
     vcfdet6 2020-06-12 19 422

I need to parse only the numbered bullet points, each one up to the next bullet point, find the rows whose 4th column is greater than 1000 and that are older than 12 hours (going by the date and time in the 2nd column), and then send the details by email. I tried parsing with Python's re library but could not get it to work.

So the expected output is:

    1. main1:

         aelo1 2020-06-15 11 4422

    2. main2:

         fdata3 2020-06-15 11 4422

    3. main3:

         hxvnt5 2020-06-15 11 4422

Is it possible with Bash or Python?

3
  • What do you mean by "older than 12h"? Commented Jun 17, 2020 at 4:16
  • The "older than 12 hours" requirement needs clarification - do you want to keep the rows with 3rd column values > 12 or ignore them? Also, sharing what you have tried will help others help you. Commented Jun 17, 2020 at 4:34
  • Add the parse tag to your post. Commented Jun 17, 2020 at 5:55

3 Answers

1

Here is a regex you can use to match the rows (I am not sure about the 12-hour requirement):

\d+\.\s\S+\s+\S+\s[0-9-]+\s\d+\s[1-9][0-9]{3,}
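
As a rough illustration of how that pattern could be applied, here is a minimal Python sketch. The sample text is taken from the question; the variable names are made up for the example. Note two things about the pattern as written: it accepts a fourth column of 1000 or more (not strictly greater than 1000), and it only matches a row that directly follows a numbered heading, so a qualifying second row would be missed.

```python
import re

# Pattern copied from the answer above. The last group, [1-9][0-9]{3,},
# accepts any number with four or more digits and no leading zero.
PATTERN = re.compile(r"\d+\.\s\S+\s+\S+\s[0-9-]+\s\d+\s[1-9][0-9]{3,}")

# Sample input from the question (first two sections only).
sample = """JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4422
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-15 11 4422
     gcontent4 2020-06-12 19 422
"""

# Each match spans the heading, the blank line, and the first row under it.
matches = PATTERN.findall(sample)
for match in matches:
    print(match)
    print("---")
```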

0

Here is a solution for you:

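def parsing(text):
    """Return the numbered headings plus the rows that pass the filter."""
    if text.strip() == '':
        return ''
    lines = text.split('\n')
    buffer = ''
    for line in lines:
        t = line.strip()
        # Keep blank lines and numbered headings such as "1. main1:".
        if t == '' or t[0] in '0123456789':
            buffer += line + '\n'
        else:
            lst = t.split()
            if len(lst) >= 4:
                # Keep a row when column 2 looks like a date, the hour in
                # column 3 is at most 12, and column 4 is greater than 1000.
                if (len(lst[1].split('-'))==3 and int(lst[2]) <= 12 and
                        int(lst[3]) > 1000):
                    buffer += line + '\n'
    return buffer.strip()

# `text` is the input shown in the question.
print(parsing(text))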
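
The "older than 12 hours" requirement was never fully clarified in the question, so the following is only one possible reading, sketched under explicit assumptions: column 2 is a date, column 3 is the hour of day, and "older" means the combined timestamp lies more than 12 hours before a reference time passed in as `now`. The function name `filter_rows` and its parameters are hypothetical.

```python
import re
from datetime import datetime, timedelta

def filter_rows(text, now, min_value=1000, min_age=timedelta(hours=12)):
    """Return (heading, name, timestamp, value) for rows that qualify.

    Assumption: column 2 is a date and column 3 is the hour of day.
    """
    kept = []
    heading = None
    for line in text.splitlines():
        stripped = line.strip()
        # A numbered heading such as "1. main1:" opens a new section.
        if re.match(r"\d+\.\s+\S", stripped):
            heading = stripped
            continue
        fields = stripped.split()
        # Ignore anything before the first heading or not shaped like a row.
        if heading is None or len(fields) != 4:
            continue
        name, date, hour, value = fields
        try:
            stamp = datetime.strptime(f"{date} {hour}", "%Y-%m-%d %H")
            value = int(value)
        except ValueError:
            continue  # not a data row after all
        if value > min_value and now - stamp > min_age:
            kept.append((heading, name, stamp, value))
    return kept

sample = """JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4422
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-15 11 4422
     gcontent4 2020-06-12 19 422

3. main3:

     hxvnt5 2020-06-15 11 4422
     vcfdet6 2020-06-12 19 422
"""

# A fixed reference time keeps the example reproducible;
# in real use this would probably be datetime.now().
kept = filter_rows(sample, now=datetime(2020, 6, 17, 4, 0))
for heading, name, stamp, value in kept:
    print(heading, name, stamp, value)
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the age check easy to test.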

2 Comments

That's why I asked for more information about the requirements and situation. Updated.
Updated the expected output for better understanding
0

You can use TTP to parse and filter it in one template:

from ttp import ttp
import pprint

data = """
JAVA_OPTS=blablalba
lbalbalba

1. main1:

     aelo1 2020-06-15 11 4001
     sddg2 2020-06-12 19 422

2. main2:

     fdata3 2020-06-16 11 4422
     gcontent4 2020-06-12 19 422

3. main3:

     hxvnt5 2020-06-17 11 4002
     vcfdet6 2020-06-12 19 422
"""
    
template = """
<group contains="value">
1. main1: {{ _start_ }}
     {{ ignore }} {{ date }} {{ hour | lessthan("12") }} {{ value | greaterthan("4000") }}
</group>     
"""
    
parser = ttp(data, template)
parser.parse()
res = parser.result()
pprint.pprint(res)

# prints:
# [[[{'date': '2020-06-15', 'hour': '11', 'value': '4001'},
#    {'date': '2020-06-16', 'hour': '11', 'value': '4422'},
#    {'date': '2020-06-17', 'hour': '11', 'value': '4002'}]]]

You can test templates online if you'd like.

Disclaimer: I am the author of TTP.

Edit: after parsing, you can further post-process the results to compose an email report or whatever the end result must look like.
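
One way that post-processing step might look, as a minimal sketch using only the Python standard library. It assumes the nested list shape shown in the printed output above; the function name `build_report` and the example addresses are hypothetical. The message is only built here, not sent; sending it would need something like `smtplib` and a reachable mail server.

```python
from email.message import EmailMessage

def build_report(results, sender, recipient):
    """Build a plain-text email listing the parsed rows."""
    # Assumed shape, taken from the printed output above: [[[{...}, {...}]]]
    rows = results[0][0]
    # Defensive assumption: wrap a lone dict in case a single match
    # comes back without the enclosing list.
    if isinstance(rows, dict):
        rows = [rows]
    lines = [f"{row['date']} {row['hour']}:00  value={row['value']}" for row in rows]
    msg = EmailMessage()
    msg["Subject"] = f"{len(rows)} entries over threshold"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg

# Results copied from the printed output above.
res = [[[{'date': '2020-06-15', 'hour': '11', 'value': '4001'},
         {'date': '2020-06-16', 'hour': '11', 'value': '4422'},
         {'date': '2020-06-17', 'hour': '11', 'value': '4002'}]]]

# The addresses are placeholders for illustration.
msg = build_report(res, "reports@example.com", "ops@example.com")
print(msg)
```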
