For a successful request, log file has been written as below
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
Line beginning with o and has some data (as shown above) and ends with another line beginning with o. This is the pattern.
If there are multiple such requests, log file keeps on appending as shown below
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
There might be some wrong generated data in the file, that can be for any earlier requests or latest requests.
If wrong-data generated is not for latest request, can be ignored.
Example :
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# Should be indication of request i.e., line beginning with o, followed some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
If wrong-data generated is for latest request, it should be caught and highlighted.
Example :
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# No line present i,e., (o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd)
Missing line can be first line beginning with o or last line beginning with o
I need to check, for every request logs are written in this format and how many successful and un-successful requests are captured in a file ?
Approach 1: can read the file contents and then parse as if line beigns with o etc, which i don't feel feasible
Approach 2 : I feel, reg-ex is optimum and best solution.
Which would be best ? and could you please help me to achieve it ?
Tried so far:
reg_ex1 = "o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+-"
reg_ex2 = "o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+[a-zA-Z0-9_]+"
with open(""some_file.log, 'r') as content_file:
content = content_file.read()
pattern1 = re.compile(reg_ex1)
begin_lines = len(pattern1.findall(content))
pattern2 = re.compile(reg_ex2)
end_lines = len(pattern2.findall(content))
if begin_lines == end_lines:
print "File has successful requests captured"
else:
print "File has un-successful requests captured"
# If wrong-data generated is not for latest request, can be ignored.
# If wrong-data generated is for latest request, it should be caught and highlighted.
May be not a good idea though, please let me know.
UPD:
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456===123 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573==123 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456===456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573===456 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456===789 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1 374513
n G-809573===789 1452830400 1 926733
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
For the above text, I would want to extract Packet 1 and 3.

ootherwise its a bad file.