
I'm trying to compare two CSV files. The first file (movements.csv) has 14 columns; the second (LCC.csv) has a single column. I want to check whether the entries (strings) in column 8 of movements.csv appear anywhere in column 1 of LCC.csv. If so, 'Yes' should be written in column 14; if not, 'No'. Here is the code I have tried so far (the error message I receive is shown further down):


import csv

f1 = file('LCC.csv', 'rb') 
f2 = file('movements.csv', 'rb')
f3 = ('output.csv', 'wb') 

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

movements = list(c2)

for LCC_row in c1:
    row = 0
    found = False
    for movements_row in movements:
        output_row = movements_row
        if movements_row[7] == LCC_row[0]
            output_row.append('Yes')
            found = True
            break
        row += 1
    if not found:
        output_row.append('No')
    c3.writerow(output_row)

f1.close()
f2.close()
f3.close()


I'm a complete beginner with Python, so any advice is appreciated! Ideally, the comparison between the two columns would also ignore whether the strings are written in capital letters or not.

The error message appears after

c3.writerow(output_row)

and reads:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

LCC.csv (no header):

Air Ab  
Jamb  
Sw  
AIRF  
EURO   

movements.csv (has a header):

ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC  
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,   
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A,   
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu,   
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,   

As mentioned, the last column (LCC) is currently completely empty.

    What does "is not working" mean? Commented Dec 19, 2016 at 13:51
  • I receive an error message after if movements_row[7] == LCC_row[0], namely: File "<stdin>", line 6 if movements_row[7] == LCC_row[0] ^ SyntaxError: invalid syntax Commented Dec 19, 2016 at 13:54
  • Please edit your question with the error message. And clearly mark which line causes it. Commented Dec 19, 2016 at 13:55
  • @AnnaStünzi: would it be OK to use pandas to solve this problem? Commented Dec 19, 2016 at 14:03
  • You just missed a ' : ' after your if statement Commented Dec 19, 2016 at 14:03

3 Answers


Your code has many issues. A few I found after glancing at it:

  1. You have a misplaced quote ' in this line:

    f2 = file('movements.csv', ,rb')
    #                          ^
    

    It should be:

    f2 = file('movements.csv', 'rb')
    
  2. In the code you shared, you have ` (back quotes) in several places instead of single quotes '. For example, your lines should be:

    f1 = file('LCC.csv', 'rb') 
    f3 = file('output.csv', 'wb')    
    #     ^ also missing file here
    
  3. You are missing a colon : after the if condition. It should be:

    if movements_row[7] == LCC_row[0]:
    #                           Here ^
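To see how Python reports the missing colon from item 3, here is a small sketch that compiles the question's if line with and without the colon. `check_syntax` is a hypothetical helper name used only for illustration; `compile()` only parses the code, so the undefined names in the snippet do not matter.

```python
def check_syntax(source):
    """Return None if source compiles, else the SyntaxError's (line number, message)."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


# The comparison from the question, without and with the trailing colon.
broken = "if movements_row[7] == LCC_row[0]\n    found = True\n"
fixed = "if movements_row[7] == LCC_row[0]:\n    found = True\n"

print(check_syntax(broken))  # (1, <message>); the wording varies by Python version
print(check_syntax(fixed))   # None
```

The reported line number points at the if line itself, which is where to start looking.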
    

Also, to assign a string you do not need parentheses. Just assign it like this:

output_row[13] = 'Yes'
#                ^ As simple string
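To illustrate the point about parentheses: wrapping a single value in parentheses only groups it, so ('Yes') is the same plain string as 'Yes'; a one-element tuple needs a trailing comma. The shortened row below is a hypothetical stand-in for output_row.

```python
# Parentheses around a single value are just grouping; they do not create a tuple.
value = ('Yes')
print(type(value).__name__, value == 'Yes')  # str True

# Only a trailing comma makes a one-element tuple.
one_tuple = ('Yes',)
print(type(one_tuple).__name__)  # tuple

row = ['Zue', 'Eu', '']  # hypothetical, shortened row whose last field is empty
row[-1] = 'Yes'          # assign the plain string in place
print(row)               # ['Zue', 'Eu', 'Yes']
```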

8 Comments

` is called a back quote or back tick.
Yes, sorry, this is a copy-paste error here in the forum. In the code I use, I have ' everywhere; I just checked again.
There are still some mistakes.
@iFlo: Yes, that's what I found too. Every time I look at the code, I find a few more, and I am not even checking for logical errors.
@MoinuddinQuadri SyntaxError almost always means missing or incorrect punctuation. I commonly get this when I forget colons and closing parentheses. To track down the problem, start at the line indicated in the error message and work backwards.

There are quite a few bugs in your code. They have been pointed out here: https://stackoverflow.com/a/41224147/3027854

One problem with movements.csv:

ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC 
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu, 
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A, 
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu, 
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,

Except for the header, every line ends with a trailing comma, so its last field (the LCC column) is empty. My code below drops that empty field and appends Yes or No in its place.
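A quick check of that with csv.reader, using the header and the first data line from the sample (copied without trailing spaces): both parse to 14 fields, the 14th header field is LCC, and the matching data field is empty. So dropping the last field and appending a flag gives the same row as filling index 13 in place.

```python
import csv
import io

# Header and first data line copied from the movements.csv sample.
sample = (
    'ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC\n'
    'Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,\n'
)
header, row = list(csv.reader(io.StringIO(sample)))

print(len(header), len(row))      # 14 14
print(header[13], repr(row[13]))  # LCC ''

# Dropping the empty last field and appending gives the same row
# as filling column 14 (index 13) in place.
replaced = row[:-1] + ['Yes']
filled = list(row)
filled[13] = 'Yes'
print(replaced == filled)  # True
```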

import csv

f1 = open('LCC.csv', 'rU') 
f2 = open('movements.csv', 'rU')
f3 = open('output.csv', 'w') 

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

# first we will read all LCC values into a set.
LCC_row_values = set()
for LCC_row in c1:
    if LCC_row:  # skip blank lines, which csv.reader returns as []
        LCC_row_values.add(LCC_row[0].strip())

row = 0
for movements_row in c2:
    row += 1
    if row == 1:
        # movements_row.append('is_present')
        # c3.writerow(movements_row)
        # skip header of movements.csv file
        continue
    if not movements_row:
        continue  # skip blank lines
    # Drop the empty trailing LCC field; Yes/No goes in its place
    output_row = movements_row[:-1]
    if movements_row[7] in LCC_row_values:
        output_row.append('Yes')
    else:
        output_row.append('No')
    c3.writerow(output_row)

f1.close()
f2.close()
f3.close()

Here are the example files:

LCC.csv

Air Ab 
Jamb 
Sw 
AIRF 
EURO

movements.csv

ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC 
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu, 
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A, 
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu, 
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,

output.csv

Zue,LSZH,2005,200501,25,1/1/2005,Dep,EURO,EUJ,Mans C,EG,Gb,Eu,Yes
Zue,LSZH,2005,200501,204,1/1/2005,Arr,Sw,SWR,Dar,HA,Tans,A,Yes
Ba,LSZM,2005,200501,191,1/1/2005,Arr,AIRF,AFR,PG,LG,Fr,Eu,Yes
Zue,LSZH,2005,200501,228,1/1/2005,Dep,THA,THA,Bang,VD,Th,As,No
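For readers on Python 3, here is a sketch of the same approach. The 'U' mode flag in 'rU' above is Python 2 era: it was deprecated in Python 3 and is rejected from Python 3.11 on, and the csv module documentation instead says to open files with newline=''. The function names load_lcc and mark_movements are hypothetical, and the bare '\r' line endings in the demo are an assumption about what triggers the universal-newline error (for example, a file saved with old Mac line endings). Like the answer, it does not yet make the match case-insensitive.

```python
import csv
import io


def load_lcc(lcc_file):
    """Read the single-column LCC file into a set, skipping blank lines."""
    return {row[0].strip() for row in csv.reader(lcc_file) if row}


def mark_movements(movements_file, lcc_values, output_file):
    """Copy movement rows, putting 'Yes'/'No' into the empty last (LCC) column."""
    reader = csv.reader(movements_file)
    writer = csv.writer(output_file)
    next(reader, None)  # skip the header line
    for row in reader:
        if not row:
            continue  # ignore blank lines
        flag = 'Yes' if row[7].strip() in lcc_values else 'No'
        writer.writerow(row[:-1] + [flag])


# With real files, open every file with newline='' as the csv docs advise:
# with open('LCC.csv', newline='') as lcc, \
#         open('movements.csv', newline='') as mov, \
#         open('output.csv', 'w', newline='') as out:
#     mark_movements(mov, load_lcc(lcc), out)

# In-memory demo; the LCC text deliberately uses bare '\r' line endings.
lcc_values = load_lcc(io.StringIO('Air Ab\rJamb\rSw\rAIRF\rEURO\r', newline=''))
movements = io.StringIO(
    'ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC\n'
    'Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,\n'
    'Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,\n'
)
out = io.StringIO(newline='')
mark_movements(movements, lcc_values, out)
print(out.getvalue().splitlines())
```

Opening with newline='' hands line-ending handling to the csv module, which is the Python 3 counterpart of the universal-newline mode the error message asks for.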

11 Comments

Hi, thanks a lot! I adapted your code so that column 8 is compared to column 1. However, I still receive an error message: Traceback (most recent call last): File "<stdin>", line 1, in <module> _csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? >>> What should I do? Also, I do not want to add a new column but fill the last one (column 14) which is currently empty. Thanks!
Please share the first 5 lines of both CSV files. Put them in the original question; it will help me solve your problem.
Please check now. Do you need a header in the output file?
I still receive almost the same error message: Traceback (most recent call last): File "<stdin>", line 7, in <module> _csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? I don't need a header in the output file.
@AnnaStünzi Line 7 seems to have an issue in the CSV file. Can you add that line to the sample you have shared?

You're trying to do too much at the same time. Split this into different tasks. First we'll read the contents of LCC.csv into a set (we could use a list, but sets are better for determining membership). Then we will go through movements.csv to rewrite it.

import csv

with open('LCC.csv', 'rb') as lcc:
    lcc_set = set()
    lcc_r = csv.reader(lcc)
    for l in lcc_r:
        for i in l:
            lcc_set.add(i)

with open('movements.csv', 'rb') as movements:
    mov_r = csv.reader(movements)
    with open('output.csv', 'wb') as output:
        out_w = csv.writer(output)
        for l in mov_r:
            #l.pop()
            if l[7] in lcc_set:
                l.append('Yes')
            else:
                l.append('No')
            out_w.writerow(l)

I'm not sure whether you want to add a column or replace the last one. I've commented out the line (l.pop()) that, when uncommented, causes the last column to be replaced by Yes or No.
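None of the answers here handles the case-insensitive matching the question asks for. One way, sketched below, is to normalize both sides with str.strip() and str.casefold() before building the set and before each lookup. The helper name normalize is hypothetical, and casefold() is Python 3 only (on Python 2, lower() is the closest equivalent).

```python
def normalize(value):
    """Trim surrounding whitespace and fold case so 'Sw  ', 'SW' and 'sw' compare equal."""
    return value.strip().casefold()


# Values as they appear in the LCC.csv sample, including trailing spaces.
lcc_raw = ['Air Ab  ', 'Jamb  ', 'Sw  ', 'AIRF  ', 'EURO   ']
lcc_set = {normalize(v) for v in lcc_raw}

for airline in ['EURO', 'sw', 'Airf', 'THA']:
    print(airline, 'Yes' if normalize(airline) in lcc_set else 'No')
# EURO Yes / sw Yes / Airf Yes / THA No
```

In the answer's loop this would mean adding normalize(i) to lcc_set and testing normalize(l[7]) in lcc_set.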

4 Comments

Hi Patrick, thanks for your help too. Using your code I get the following error message: Traceback (most recent call last): File "<stdin>", line 11, in <module> AttributeError: '_csv.writer' object has no attribute 'writerowl'. Do you see a reason why? Thank you so much!
@AnnaStünzi It looks like you're missing the parentheses: writerowl -> writerow(l)
Sorry, yes. The code no longer gives an error, but there is no classification in output.csv: all rows in column 14 now have the entry 'No'.
It looks like that's because it's Sw in one file and "Sw" in the other. Try if l[7].strip('"') in lcc_set instead.
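For what it's worth, csv.reader already removes the enclosing double quotes, so l[7] is Sw rather than "Sw", and strip('"') leaves it unchanged. A quick check suggests the more likely mismatch is the trailing whitespace shown in the LCC.csv sample, assuming those spaces are really in the file and not an artifact of pasting it here.

```python
import csv
import io

movement_line = 'Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A,\n'
field = next(csv.reader(io.StringIO(movement_line)))[7]
print(repr(field))                 # 'Sw' (quotes already removed by the reader)
print(field.strip('"') == field)   # True: strip('"') is a no-op here

lcc_line = 'Sw  \n'  # an LCC.csv line with trailing spaces, as in the sample
lcc_value = next(csv.reader(io.StringIO(lcc_line)))[0]
print(field in {lcc_value})          # False: 'Sw' != 'Sw  '
print(field in {lcc_value.strip()})  # True
```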
