I want to find the email addresses that appear in both of two files and report the date each one was sent. I have two files: 1) maillog.txt (a Postfix mail log) and 2) testmail.txt (email addresses, one per line). I use re to extract the recipient address and sent date from maillog.txt, which looks like this:
Nov 3 10:08:43 server postfix/smtp[150754]: 78FA8209EDEF: to=<[email protected]>, relay=aspmx.l.google.com[74.125.24.26]:25, delay=3.2, delays=0.1/0/1.6/1.5, dsn=2.0.0, status=sent (250 2.0.0 OK 1509718076 m11si5060862pls.447 - gsmtp)
Nov 3 10:10:45 server postfix/smtp[150754]: 7C42A209EDEF: to=<[email protected]>, relay=mxa-000f9e01.gslb.pphosted.com[67.231.152.217]:25, delay=5.4, delays=0.1/0/3.8/1.5, dsn=2.0.0, status=sent (250 2.0.0 2dvkvt5tgc-1 Message accepted for delivery)
Nov 3 10:15:45 server postfix/smtp[150754]: 83533209EDE8: to=<[email protected]>, relay=mxa-000f9e01.gslb.pphosted.com[67.231.144.222]:25, delay=4.8, delays=0.1/0/3.3/1.5, dsn=2.0.0, status=sent (250 2.0.0 2dvm8yww64-1 Message accepted for delivery)
Nov 3 10:16:42 server postfix/smtp[150754]: 83A5E209EDEF: to=<[email protected]>, relay=aspmx.l.google.com[74.125.200.27]:25, delay=1.6, delays=0.1/0/0.82/0.69, dsn=2.0.0, status=sent (250 2.0.0 OK 1509718555 j186si6198120pgc.455 - gsmtp)
Nov 3 10:17:44 server postfix/smtp[150754]: 8636D209EDEF: to=<[email protected]>, relay=mxa-000f9e01.gslb.pphosted.com[67.231.144.222]:25, delay=4.1, delays=0.11/0/2.6/1.4, dsn=2.0.0, status=sent (250 2.0.0 2dvm8ywwdh-1 Message accepted for delivery)
Nov 3 10:18:42 server postfix/smtp[150754]: 8A014209EDEF: to=<[email protected]>, relay=aspmx.l.google.com[74.125.200.27]:25, delay=1.9, delays=0.1/0/0.72/1.1, dsn=2.0.0, status=sent (250 2.0.0 OK 1509718675 o2si6032950pgp.46 - gsmtp)
Here is my other file, testmail.txt:
[email protected]
[email protected]
Below is what I have tried. It works, but I want to know if there is a more efficient way to do this for a large number of mail logs and email addresses:
import re

# Named groups capture the date parts and the recipient address.
pattern=r'(?P<month>[A-Za-z]{3})\s{1,3}(?P<day>\d{1,2})\s{1,2}(?P<ts>\d+:\d+:\d+).*to=<(?P<email>([\w\.-]+)@([\w\.-]+))'

with open("testmail.txt") as fh1:
    for addr in fh1:
        addr=addr.strip()
        if addr:
            # The whole log is re-read for every address in testmail.txt.
            with open("maillog.txt") as fh:
                for line in fh:
                    if line:
                        match=re.finditer(pattern,line)
                        for obj in match:
                            addr2=obj.group('email').strip()
                            if addr == addr2:
                                print(obj.groupdict())
When a match is found, this prints output like:
{'month': 'Nov', 'day': '3', 'ts': '10:08:43', 'email': '[email protected]'}
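For comparison, here is a minimal sketch of one direction that might scale better, assuming the address list fits in memory. The code above re-reads maillog.txt once per address, so the work grows with (number of addresses) x (number of log lines). Loading the addresses into a set and scanning the log a single time reduces that to roughly one pass over each file. The names LOG_PATTERN, load_addresses, find_sent and report are hypothetical, chosen only for this illustration.

```python
import re

# Same named groups as the pattern in the question, compiled once up front.
# The inner unnamed groups are dropped; groupdict() output is unaffected.
LOG_PATTERN = re.compile(
    r'(?P<month>[A-Za-z]{3})\s{1,3}(?P<day>\d{1,2})\s{1,2}'
    r'(?P<ts>\d+:\d+:\d+).*to=<(?P<email>[\w.-]+@[\w.-]+)'
)


def load_addresses(lines):
    """Return the non-empty, stripped lines as a set for O(1) membership tests."""
    return {line.strip() for line in lines if line.strip()}


def find_sent(log_lines, wanted):
    """Yield month/day/ts/email dicts for log lines whose recipient is in `wanted`."""
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match.group('email') in wanted:
            yield match.groupdict()


def report(addr_path, log_path):
    """Print one dict per matching log line; each file is read exactly once."""
    with open(addr_path) as fh:
        wanted = load_addresses(fh)
    with open(log_path) as fh:
        for record in find_sent(fh, wanted):
            print(record)
```

Calling report("testmail.txt", "maillog.txt") should print the same kind of dictionaries as the original loop, though ordered by log line rather than by address. If the address list were too large for memory, this sketch would not apply as written.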