How to verify that the id columns of two different .csv files match with Python?

Question

I have two different .csv files, but they have the same id column.

file_1.csv:
id, column1, column2
4543DFGD_werwe_23, string
4546476FGH34_wee_24, string
....
45sd234_w32rwe_2342342, string

The other one:

file_1.csv:
id, column3, column4
4543DFGD_werwe_23, bla bla bla
4546476FGH34_wee_24, bla bla bla
....
45sd234_w32rwe_2342342, bla bla bla

How can I verify that these two columns match (have the same ids), using either the csv module or pandas?

EdChum · Accepted Answer · 2015-02-27 19:45:22Z

3

After loading both files, you can call equals on the id columns:

df['id'].equals(df1['id'])

This returns True if the two columns are exactly the same (same length, same dtype, same values in the same order) and False otherwise.
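Because equals compares position by position, two files that contain the same ids in a different order compare as False. A minimal sketch, assuming you only care whether the same ids appear in both files regardless of order (the DataFrames here are made-up sample data, not the question's files):

```python
import pandas as pd

# Hypothetical sample data: the same ids, listed in a different order
df = pd.DataFrame({'id': ['a', 'b', 'c']})
df1 = pd.DataFrame({'id': ['c', 'a', 'b']})

# equals checks shape, index labels, dtype and values position by position
print(df['id'].equals(df1['id']))  # False

# Comparing as sets ignores order (and also ignores duplicate ids)
print(set(df['id']) == set(df1['id']))  # True

# Alternatively, sort both columns and reset the index before using equals
same_sorted = df['id'].sort_values().reset_index(drop=True).equals(
    df1['id'].sort_values().reset_index(drop=True))
print(same_sorted)  # True
```

The reset_index(drop=True) step matters because equals also compares the index, and sorting keeps each value's original index label.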

In [3]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'id':np.arange(10)})
df1 = pd.DataFrame({'id':np.arange(10)})
df.id.equals(df1.id)
Out[3]:
True

In [7]:

df = pd.DataFrame({'id':np.arange(10)})
df1 = pd.DataFrame({'id':[0,1,1,3,4,5,6,7,8,9]})
df.id.equals(df1.id)
Out[7]:
False

In [8]:

df.id == df1.id
Out[8]:
0     True
1     True
2    False
3     True
4     True
5     True
6     True
7     True
8     True
9     True
Name: id, dtype: bool
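To list only the rows where the ids differ, instead of scanning the full boolean output, you can use the comparison as a mask. A sketch reusing the sample data above; note that element-wise == and != between two Series only work when both have identical index labels (in practice, the same number of rows), otherwise pandas raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'id': list(range(10))})
df1 = pd.DataFrame({'id': [0, 1, 1, 3, 4, 5, 6, 7, 8, 9]})

# Boolean mask that is True where the two id columns disagree
mismatch = df['id'] != df1['id']

# Row labels of the mismatching rows
print(df.index[mismatch].tolist())  # [2]

# The differing values side by side
print(pd.DataFrame({'file_1': df['id'][mismatch],
                    'file_2': df1['id'][mismatch]}))
```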

To load the CSV files:

df = pd.read_csv('file_1.csv')
df1 = pd.read_csv('file_2.csv') # I'm assuming your second CSV is not really also named file_1.csv

Then you can perform the same comparison as above:

df.id.equals(df1.id)
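The sample files in the question have a space after each comma. With the default settings, read_csv keeps that space, so the headers become 'id', ' column1', ' column2' and the string values gain a leading space. A self-contained sketch, using io.StringIO as a stand-in for file_1.csv and file_2.csv (with rows copied from the question), that passes skipinitialspace=True to strip it before comparing:

```python
import io

import pandas as pd

# Stand-ins for file_1.csv and file_2.csv, using rows from the question.
# Note the space after each comma in the sample data.
file_1 = io.StringIO(
    "id, column1, column2\n"
    "4543DFGD_werwe_23, string\n"
    "4546476FGH34_wee_24, string\n"
    "45sd234_w32rwe_2342342, string\n"
)
file_2 = io.StringIO(
    "id, column3, column4\n"
    "4543DFGD_werwe_23, bla bla bla\n"
    "4546476FGH34_wee_24, bla bla bla\n"
    "45sd234_w32rwe_2342342, bla bla bla\n"
)

# skipinitialspace=True drops the blank after each delimiter, so the
# header is 'column1' rather than ' column1'; missing fields become NaN
df = pd.read_csv(file_1, skipinitialspace=True)
df1 = pd.read_csv(file_2, skipinitialspace=True)

print(list(df.columns))  # ['id', 'column1', 'column2']
print(df.id.equals(df1.id))  # True
```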

If you only need to compare the id columns, you can load just that column with usecols:

df = pd.read_csv('file_1.csv', usecols=['id'])
df1 = pd.read_csv('file_2.csv', usecols=['id'])
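When the files may hold different ids, equals only tells you that something differs. One alternative, not part of the original answer, is an outer merge on id with indicator=True, which labels each id as 'both', 'left_only' or 'right_only'. A sketch with made-up id columns standing in for the loaded files:

```python
import pandas as pd

# Hypothetical id columns, as if loaded with usecols=['id']
df = pd.DataFrame({'id': ['a', 'b', 'c']})
df1 = pd.DataFrame({'id': ['b', 'c', 'd']})

# The added _merge column says which side(s) each id came from
merged = df.merge(df1, on='id', how='outer', indicator=True)

only_in_1 = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()
only_in_2 = merged.loc[merged['_merge'] == 'right_only', 'id'].tolist()

print(only_in_1)  # ['a']
print(only_in_2)  # ['d']

# True only if every id appears in both files
print(merged['_merge'].eq('both').all())  # False
```

Unlike equals, this ignores row order; note that an id repeated in both files produces one merged row per matching pair.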

edited Feb 27, 2015 at 19:45

answered Feb 27, 2015 at 19:27

EdChum



14 Comments

tumbleweed Over a year ago

Wow... nice. Thanks for the help. How can I change the 'id':np.arange(10) for the length of a large file?

EdChum Over a year ago

You're a little confused: my code just builds sample data. I will update the answer to show how to load the CSV in pandas and perform the same comparison.

EdChum Over a year ago

I can tell you that pandas' read_csv is lightning fast at loading CSV files, faster than Python's standard csv module; see the link: wesmckinney.com/blog/…

EdChum Over a year ago

@ml_guy No, ignore the np.arange portion. Straight after loading the CSVs, just do df.id.equals(df1.id); there is no need to construct new DataFrames.

EdChum Over a year ago

pd.set_option('display.max_rows', None)  # display all rows instead of truncating long output


Vivek Sable · Accepted Answer · 2015-02-27 19:38:30Z

1

Using the csv module:

1. Open both files.
2. Read each file with csv.reader().
3. Build a dictionary whose key is the first item of each row (the id) and whose value is the whole row.
4. Use set intersection to get the keys the two dictionaries have in common.
5. Print the result.

code:

import csv

file1 =  '/home/vivek/Desktop/stackoverflow/fil1.csv'
file2 =  '/home/vivek/Desktop/stackoverflow/fil2.csv'


def rows_by_id(path):
    # Map each row's first field (the id) to the whole row
    rows = {}
    with open(path, newline='') as fp:
        for row in csv.reader(fp):
            if row:  # skip blank lines
                rows[row[0].strip()] = row
    # Drop the header row, whose first field is "id"
    rows.pop("id", None)
    return rows


rows1 = rows_by_id(file1)
rows2 = rows_by_id(file2)

# Ids present in both files
result = set(rows1).intersection(rows2)

print("Same Id :", list(result))

output:

vivek@vivek:~/Desktop/stackoverflow$ python 27.py
Same Id : ['4546476FGH34_wee_24', '4543DFGD_werwe_23', '45sd234_w32rwe_2342342']
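Since only the ids are compared, the dictionaries are not strictly needed: sets of ids are enough, and set differences also show which ids appear in only one file. A sketch assuming, as the answer does, that the id is the first column and the first row is a header; io.StringIO with hypothetical ids stands in for the real files. Stripping whitespace around each id guards against stray spaces, which would otherwise make identical-looking ids compare unequal.

```python
import csv
import io


def read_ids(fp):
    """Return the set of ids (first column) in a CSV file object, minus the header."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the header row
    return {row[0].strip() for row in reader if row}


# Hypothetical stand-ins for the two files; with real files use
# open(path, newline='') instead
fp1 = io.StringIO("id, column1\nA1, x\nB2, y\nC3, z\n")
fp2 = io.StringIO("id, column3\nB2, p\nC3, q\nD4, r\n")

ids1 = read_ids(fp1)
ids2 = read_ids(fp2)

print("Same Id :", sorted(ids1 & ids2))        # Same Id : ['B2', 'C3']
print("Only in file 1:", sorted(ids1 - ids2))  # Only in file 1: ['A1']
print("Only in file 2:", sorted(ids2 - ids1))  # Only in file 2: ['D4']
```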

answered Feb 27, 2015 at 19:38

Vivek Sable


3 Comments

Vivek Sable Over a year ago

Welcome. I am also looking at the pandas implementation above.

tumbleweed Over a year ago

Thanks for the help, but I got this: Same Id : []. Maybe I am doing something wrong; how can I fix it?

Vivek Sable Over a year ago

Send me your .py file along with the input files by email: [email protected]
