I'm new to Stack Overflow and relatively new to Python. I have tried searching the site for an answer to this question, but haven't found one related to matching values between CSV and text files.
I'm writing a simple Python script that reads a row from a large CSV file (~600k lines), grabs a value from that row, assigns it to a variable, and then uses that variable to look for a matching value in a large text file (~1.8 million lines). It's not working and I'm not sure why.
Here's a snippet from the source.csv file:
DocNo,Title,DOI
1,"Title One",10.1080/02724634.2016.1269539
2,"Title Two",10.1002/2015ja021888
3,"Title Three",10.1016/j.palaeo.2016.09.019
Here's a snippet from the lookup.txt file (note that the fields are tab-separated):
DOI 10.1016/j.palaeo.2016.09.019 M First
DOI 10.1016/j.radmeas.2015.12.002 M First
DOI 10.1097/SCS.0000000000002859 M First
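To make the two layouts concrete, here is a minimal sketch of where the DOI sits in each file. It uses in-memory copies of the snippets above rather than the real files, and it assumes that every gap in the lookup.txt snippet is a single tab. The helper names `csv_dois` and `lookup_dois` are made up for illustration only.

```python
import csv
import io

# In-memory stand-ins for source.csv and lookup.txt (copied from the snippets above).
SOURCE_CSV = (
    "DocNo,Title,DOI\n"
    '1,"Title One",10.1080/02724634.2016.1269539\n'
    '2,"Title Two",10.1002/2015ja021888\n'
    '3,"Title Three",10.1016/j.palaeo.2016.09.019\n'
)
# Assumption: each gap in the lookup snippet is exactly one tab character.
LOOKUP_TXT = (
    "DOI\t10.1016/j.palaeo.2016.09.019\tM\tFirst\n"
    "DOI\t10.1016/j.radmeas.2015.12.002\tM\tFirst\n"
    "DOI\t10.1097/SCS.0000000000002859\tM\tFirst\n"
)


def csv_dois(text):
    """Return the third column (DOI) of each data row, skipping the header."""
    reader = csv.reader(io.StringIO(text, newline=""), dialect="excel")
    next(reader)  # skip the "DocNo,Title,DOI" header row
    return [row[2] for row in reader]


def lookup_dois(text):
    """Return the second tab-separated field of each lookup line."""
    return [line.split("\t")[1] for line in text.splitlines()]


source_dois = csv_dois(SOURCE_CSV)
found_dois = lookup_dois(LOOKUP_TXT)
print(source_dois)
print(found_dois)
```

In this sample, only the third CSV row has a DOI that also appears in the lookup data.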
Here's the offending code:
import csv
with open('source.csv', newline='', encoding = "ISO-8859-1") as f, open('lookup.txt', 'r') as i:
    reader = csv.reader(f, dialect='excel')
    counter = 0
    for line in i:
        for row in reader:
            doi = row[2]
            doi = str(doi) # I think this might actually be redundant...
            if doi in line:
                # This will eventually do more interesting things, but right now it's just a test
                print(doi)
                break
            else:
                # This will be removed--is also just a test (so I can watch progress)
                print(counter)
                counter += 1
Currently, when it runs, it just counts the lines, even though there's a matching doi in each file.
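One thing that may be relevant here: a `csv.reader` object is an iterator, so it can only be walked through once. The sketch below is not my real script. It uses small in-memory stand-ins for both files (with the tab layout of lookup.txt assumed) and does no matching at all. It only counts how many CSV rows the inner loop gets to see for each lookup line.

```python
import csv
import io

# In-memory stand-in for source.csv (header plus two data rows).
source = io.StringIO(
    "DocNo,Title,DOI\n"
    '1,"Title One",10.1080/02724634.2016.1269539\n'
    '3,"Title Three",10.1016/j.palaeo.2016.09.019\n',
    newline="",
)
# Stand-in for lookup.txt; the matching DOI is deliberately on the second line.
lookup_lines = [
    "DOI\t10.1016/j.radmeas.2015.12.002\tM\tFirst\n",
    "DOI\t10.1016/j.palaeo.2016.09.019\tM\tFirst\n",
]

reader = csv.reader(source, dialect="excel")
rows_seen = []  # number of CSV rows the inner loop saw for each lookup line
for line in lookup_lines:
    seen = 0
    for row in reader:
        seen += 1
    rows_seen.append(seen)

print(rows_seen)  # prints [3, 0]
```

The inner loop sees all three rows (header included) for the first lookup line and none for the second, so with this nesting a match that only appears on a later lookup line is never compared against anything.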
The maddening thing is that when I give doi a hard-coded value, it executes as it should. This makes me think that either the slashes in doi are breaking things somehow, or I need to convert the data type of the doi variable.
For example, this works:
doi = "10.1016/j.palaeo.2016.09.019"
for line in i:
    if doi in line:
        print(doi)
        break
    else:
        print(counter)
        counter += 1
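To check the slash and data-type theories in isolation, a small test along these lines could help. It is only a sketch on in-memory sample data (one CSV row and one lookup line, with the tab layout assumed), not the real files.

```python
import csv
import io

# One header row and one data row, standing in for source.csv.
source = io.StringIO(
    "DocNo,Title,DOI\n"
    '3,"Title Three",10.1016/j.palaeo.2016.09.019\n',
    newline="",
)
# One line standing in for lookup.txt (tab-separated layout assumed).
line = "DOI\t10.1016/j.palaeo.2016.09.019\tM\tFirst\n"

reader = csv.reader(source, dialect="excel")
next(reader)           # skip the header row
doi = next(reader)[2]  # DOI column of the first data row

# repr() shows hidden characters such as stray spaces or quotes.
print(type(doi), repr(doi))

is_str = isinstance(doi, str)
same_as_literal = doi == "10.1016/j.palaeo.2016.09.019"
found_in_line = doi in line
print(is_str, same_as_literal, found_in_line)
```

On this sample the value from the CSV is already a `str`, equals the hard-coded literal, and is found in the line, which suggests that neither the slashes nor the data type is the culprit, at least for clean input like this.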
Thanks in advance for your help. I'm at my wit's end!