I'm new to scripting and have been reading up on Python for about 6 weeks. The script below is meant to read a log file and send an alert if one of the keywords defined in srchstring is found. It works as expected and doesn't alert on strings it has already found. However, the file it's processing is actively being written to by an application, and the script is too slow on files around 500 MB. Under 200 MB it works fine, i.e. within 20 seconds.
Could someone suggest a more efficient way to search for a string within a file based on a pre-defined list?
    import os
    srchstring = ["Shutdown", "Disconnecting", "Stopping Event Thread"]
    if os.path.isfile(r"\\server\\share\\logfile.txt"):
        with open(r"\\server\\share\\logfile.txt","r") as F:
            for line in F:
                for st in srchstring:
                    if st in line:
                        print line,
                        #do some slicing of string to get dd/mm/yy hh:mm:ss:ms
                        # then create a marker file called file_dd/mm/yy hh:mm:ss:ms
                        if os.path.isfile("file_dd/mm/yy hh:mm:ss:ms"): # check if a file already exists named file_dd/mm/yy hh:mm:ss:ms
                            print "string previously found- ignoring, continuing search" # marker file exists
                        else:
                            open("file_dd/mm/yy hh:mm:ss:ms", 'a') # create file_dd/mm/yy hh:mm:ss:ms
                            print "error string found--creating marker file sending email alert" # no marker file, create it then send email
    else:
        print "file not exist"
F? I assume it is the file you are reading, but the code doesn't reflect that. Also, when you open a file to write, you don't close it. The Pythonic way to write to files is to use a context manager: with open('filename') as f: ....

To your question, I would try using a set instead of a list for srchstring. Then, for each line in the file, make a set of the words in the line (e.g. linset = set(line.split(' '))) and then use set intersection (see docs.python.org/2/library/sets.html). If it's not empty, then there's a match. I'm guessing this could speed things up.

"in" search? It could be that you are reading the whole file into memory, but you don't show that code. Where does F come from?

with open should be indented, and everything under it up until the else should be reindented correspondingly, but as code edits are discouraged, I merely point that out here. In other words, we can probably guess what you mean, but posting Python code with different indentation than you have locally is extremely bad form.
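For what it's worth, here is a minimal sketch of the set-intersection idea from the comment above. The helper names split_keywords and line_matches are made up for illustration and are not from the question. One caveat the comment doesn't mention: intersection only compares whole whitespace-separated tokens, so a multi-word keyword such as "Stopping Event Thread" can never match that way, and "Shutdown." with trailing punctuation won't match "Shutdown" either. The sketch therefore assumes it is acceptable to fall back to a plain substring test for phrases. Whether any of this is actually faster than the original in test on a 500 MB file is untested here, so it would need to be measured.

```python
SRCHSTRING = {"Shutdown", "Disconnecting", "Stopping Event Thread"}


def split_keywords(keywords):
    """Separate single-word keywords from multi-word phrases (hypothetical helper)."""
    words = {k for k in keywords if " " not in k}
    phrases = [k for k in keywords if " " in k]
    return words, phrases


def line_matches(line, words, phrases):
    """Return True if the line contains any keyword (hypothetical helper)."""
    # Set intersection only sees whole whitespace-separated tokens.
    if words & set(line.split()):
        return True
    # Phrases span several tokens, so fall back to a substring test.
    return any(p in line for p in phrases)
```

It would be used by calling split_keywords(SRCHSTRING) once before the loop and then line_matches(line, words, phrases) for each line read from the file, in place of the inner for st in srchstring loop.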