Your Python version is rather inefficient because you're testing for membership in a list rather than in a set or a dict. A list lookup is O(n) because it scans every element. A set or dict lookup is O(1) on average because it hashes the key.
Try using a set of tuples or a set of strings instead. Tuples are the better choice if the two files might use different delimiters, since each line is split into fields before comparison. I don't think you'll see a particularly large performance difference between the two, though. Building tuple('something'.split()) is cheap compared with testing for membership in a very long list.
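A quick way to see the O(n) vs. O(1) difference is to time a miss against a list and against a set that hold the same keys. This is a sketch with made-up keys. Absolute timings will vary by machine, but the list lookup should be far slower because it has to compare against every element. The last line shows why tuples of split fields are delimiter-independent.

```python
import timeit

# Hypothetical key data: 4500 three-field keys, held once in a list
# and once in a set of tuples.
keys_list = [(f"k{i}", "x", "y") for i in range(4500)]
keys_set = set(keys_list)

# An absent key forces the list to compare against every element.
probe = ("missing", "x", "y")

list_time = timeit.timeit(lambda: probe in keys_list, number=1000)
set_time = timeit.timeit(lambda: probe in keys_set, number=1000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")

# Splitting each line into a tuple of fields makes the key independent of
# the delimiter: comma- and whitespace-separated lines give equal keys.
print(tuple("a,b,c".split(",")) == tuple("a b c".split()))  # True
```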
Also, there's no need to call inp.readlines(). You can iterate over the file object directly:

    look_up = set(tuple(line.split()) for line in inp)

You should see a significant speedup without changing any other part of your code, apart from using tuple(line[:3]) instead of [line[0], line[1], line[2]] as the lookup key.
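The switch to tuple(line[:3]) is not optional. Lists are unhashable, so testing a list against a set raises TypeError instead of returning False. The following is a small sketch. It uses io.StringIO as a stand-in for the real open file, and the already-split data line is made up for illustration.

```python
import io

# io.StringIO stands in for the real input file (assumption for this demo).
inp = io.StringIO("a b c\nd e f\n")

# Iterating the file object yields one line at a time; no readlines() needed.
look_up = set(tuple(line.split()) for line in inp)

# Hypothetical already-split data line: three key fields plus extra data.
line = ["a", "b", "c", "extra"]

# A list cannot be hashed, so testing it against a set raises TypeError.
try:
    [line[0], line[1], line[2]] in look_up
except TypeError as err:
    print("list key rejected:", err)

# A tuple of the same fields is hashable and works as a set key.
print(tuple(line[:3]) in look_up)  # True
```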
Actually, grep and bash look pretty much perfect for this... (Untested, but it should work.)

    # For each key line in file1.txt, print every line of file2.txt that contains it.
    while read line
    do
        grep "$line" "file2.txt"
    done < "file1.txt"
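To see which one is faster, we can generate some test data (~4,500 keys in file1.txt and 1,000,000 lines in file2.txt) and benchmark a simple Python version of the same thing. The two are only roughly equivalent. The Python version prints matching lines in file2.txt order, while the grep loop prints them grouped by key. Also, grep matches the key as a pattern anywhere in the line rather than only against the first three fields.

    import sys

    # Build a set of key tuples from file1.txt for O(1) average membership tests.
    with open('file1.txt', 'r') as keyfile:
        lookup = set(tuple(line.split()) for line in keyfile)

    # Stream file2.txt once and print lines whose first three fields are a key.
    with open('file2.txt', 'r') as datafile:
        for line in datafile:
            if tuple(line.split()[:3]) in lookup:
                sys.stdout.write(line)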
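The answer doesn't say how the test data was generated, so here is a hypothetical generator that produces data of the same shape. The key format ("idN xN yN") and the payload field are invented. The data lines are drawn from a pool twice the size of the key set, so roughly half of them should match a key.

```python
import random


def make_test_lines(n_keys=4500, n_lines=1000000, seed=0):
    """Return (key_lines, data_lines) for a hypothetical benchmark.

    Each key is three whitespace-separated fields. Each data line is a
    candidate key plus a payload field.
    """
    rng = random.Random(seed)
    # The first field is unique per pool entry, so all keys are distinct.
    pool = [f"id{i} x{i % 97} y{i % 89}" for i in range(2 * n_keys)]
    # Only the first half of the pool becomes the key file.
    key_lines = [k + "\n" for k in pool[:n_keys]]
    # Data lines draw from the whole pool, so about half match a key.
    data_lines = [f"{rng.choice(pool)} payload{i}\n" for i in range(n_lines)]
    return key_lines, data_lines
```

To produce the files, write each list out with f.writelines(...) into file1.txt and file2.txt respectively.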
The Python version turns out to be roughly 66x faster (1m47.6s vs. 1.6s of wall-clock time):

    jofer@cornbread:~/so> time sh so_temp149.sh > a
    real    1m47.617s
    user    0m51.199s
    sys     0m54.391s

vs.

    jofer@cornbread:~/so> time python so_temp149.py > b
    real    0m1.631s
    user    0m1.558s
    sys     0m0.071s
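Of course, the two examples approach the problem in entirely different ways, so we're really comparing two algorithms, not two implementations. The grep loop rescans all of file2.txt once per key, so its work grows with (number of keys) × (number of lines). The Python version reads file2.txt once and does an O(1) average set lookup per line. If file1.txt only has a couple of key lines, though, the bash/grep solution easily wins.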
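The algorithmic difference can be made concrete by counting lines examined. The sketch below is a simplified model, not a reimplementation of grep. It uses a plain substring test where grep treats each key as a regular expression, and the function names are hypothetical. The nested-loop version scans every data line once per key. With ~4,500 keys and 1,000,000 lines that comes to about 4.5 billion line checks, versus 1 million set lookups for the single-pass version. The sketch also reproduces the ordering difference: the nested loop emits matches grouped by key, and the single pass emits them in file order.

```python
def grep_style(keys, data_lines):
    """Model of the bash loop: for every key, scan all data lines."""
    out, checks = [], 0
    for key in keys:
        for line in data_lines:
            checks += 1
            if key in line:  # substring test, a simplification of grep's regex match
                out.append(line)
    return out, checks


def set_style(keys, data_lines):
    """Model of the Python version: one pass with hashed set lookups."""
    lookup = set(tuple(k.split()) for k in keys)
    out, checks = [], 0
    for line in data_lines:
        checks += 1
        if tuple(line.split()[:3]) in lookup:
            out.append(line)
    return out, checks


keys = ["a b c", "d e f"]
data = ["d e f 1", "a b c 2", "x y z 3", "a b c 4"]
g, g_checks = grep_style(keys, data)
s, s_checks = set_style(keys, data)
print(g)                   # ['a b c 2', 'a b c 4', 'd e f 1']  (grouped by key)
print(s)                   # ['d e f 1', 'a b c 2', 'a b c 4']  (file order)
print(g_checks, s_checks)  # 8 4
```

With only two keys, the nested loop makes just two passes over the data. That is consistent with grep coming out ahead when file1.txt has only a couple of lines.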
(Does bash have a builtin container with O(1) membership lookup? Bash 4 added associative arrays, declared with declare -A, which are implemented as hash tables. It would be interesting to try implementing an algorithm similar to the Python example above in bash as well...)