
I want to compare two CSVs. They contain nearly identical data, but the second CSV has 2 identical rows that the first does not. I would like the program to output both of those rows, so I can see which row is present in CSV 2 but not in CSV 1 and how many times it occurs.

Here is my current logic:

import pandas as pd

data1 = {"Col1": [0,1,1,2],
         "Col2": [1,2,2,3],
         "Col3": [5,2,1,1],
         "Col4": [1,2,2,3]}

data2 = {"Col1": [0,1,1,2,4,4],
         "Col2": [1,2,2,3,4,4],
         "Col3": [5,2,1,1,4,4],
         "Col4": [1,2,2,3,4,4]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

ds1 = set(tuple(line) for line in df1.values)
ds2 = set(tuple(line) for line in df2.values)
df = pd.DataFrame(list(ds2.difference(ds1)), columns=df2.columns)

print(df)

Here is my current outcome:

   Col1  Col2  Col3  Col4  
0     4     4     4     4

Here is my desired outcome:

   Col1  Col2  Col3  Col4  
0     4     4     4     4
1     4     4     4     4

As of right now, it only outputs the row once even though CSV 2 has the row twice. What can I do so that it shows the missing row once for each time it appears in the second CSV? Thanks in advance!

2 Answers


There is almost always a built-in pandas function meant to do what you want that will be better than trying to re-invent the wheel.

# Keep the rows of df2 that do not match df1 at the same index label
df = df2[~df2.isin(df1).all(axis=1)]
# OR df = df2[df2.ne(df1).any(axis=1)]
print(df)

Output:

   Col1  Col2  Col3  Col4
4     4     4     4     4
5     4     4     4     4
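
One caveat worth noting: isin with a DataFrame argument matches on index and column labels, so the approach above relies on the shared rows sitting at the same index in both frames. If the row order might differ between the two CSVs, a left merge with indicator=True is one possible alternative. This is a minimal sketch, assuming the sample data from the question and that rows should be compared on all columns; the names merged and only_in_df2 are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": [0, 1, 1, 2], "Col2": [1, 2, 2, 3],
                    "Col3": [5, 2, 1, 1], "Col4": [1, 2, 2, 3]})
df2 = pd.DataFrame({"Col1": [0, 1, 1, 2, 4, 4], "Col2": [1, 2, 2, 3, 4, 4],
                    "Col3": [5, 2, 1, 1, 4, 4], "Col4": [1, 2, 2, 3, 4, 4]})

# Left-merge on all shared columns. drop_duplicates() on df1 keeps the merge
# from multiplying df2 rows if df1 itself contains repeated rows.
merged = df2.merge(df1.drop_duplicates(), how="left", indicator=True)

# "left_only" marks rows found in df2 but not in df1; repeats in df2 are kept.
only_in_df2 = (merged[merged["_merge"] == "left_only"]
               .drop(columns="_merge")
               .reset_index(drop=True))
print(only_in_df2)
```

Because the merge matches on values rather than on index labels, shuffling the rows of either frame should not change which rows are reported.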

You can use:

df2[~df2.eq(df1).all(axis=1)]

Result:

   Col1  Col2  Col3  Col4
4     4     4     4     4
5     4     4     4     4

Or (if you want the index to be 0 and 1):

df2[~df2.eq(df1).all(axis=1)].reset_index(drop=True)

Result:

   Col1  Col2  Col3  Col4
0     4     4     4     4
1     4     4     4     4

N.B.

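You can also use df2[df2.ne(df1).any(axis=1)] instead of df2[~df2.eq(df1).all(axis=1)]. (df2[df2.ne(df1).all(axis=1)] gives the same result on this data, but in general it keeps only rows that differ in every column.)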
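
Since the question also asks how many times the extra row occurs, the filtered rows can be grouped and counted. This is a rough sketch, assuming the sample data from the question; the names extra and counts and the "count" column label are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": [0, 1, 1, 2], "Col2": [1, 2, 2, 3],
                    "Col3": [5, 2, 1, 1], "Col4": [1, 2, 2, 3]})
df2 = pd.DataFrame({"Col1": [0, 1, 1, 2, 4, 4], "Col2": [1, 2, 2, 3, 4, 4],
                    "Col3": [5, 2, 1, 1, 4, 4], "Col4": [1, 2, 2, 3, 4, 4]})

# Rows of df2 that differ from df1 at the same index label (as above).
extra = df2[~df2.eq(df1).all(axis=1)]

# Group identical rows together and count how often each one occurs.
counts = extra.groupby(list(extra.columns)).size().reset_index(name="count")
print(counts)
```

Each distinct extra row then appears once, with its number of occurrences in the count column.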
