Get matching rows in two different dataframes in Python

Question

Thanks in advance. The problem is to compare the rows of two separate dataframes read from CSV files, one with column headings and one without. I want to match rows in the second dataframe to rows in the first. I cannot use merge because the two do not have common column names to merge on.

1: The first dataframe has headings.

2: The second dataframe has no headings.

3: I want to get the position of each match.

I have tried this:

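    import pandas as pd

    df1 = pd.read_csv(data1)               # first file has a header row
    df2 = pd.read_csv(data2, header=None)  # second file has no header row

    def test1():
        # Return the index of the first df1 row whose values also form a row of df2
        for index, row in df1.iterrows():
            if (df2.values == row.values).all(axis=1).any():
                return index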

Datasets:

first dataset:

Second dataset:

Don’t use iterrows(); itertuples() is far better. Can you share the actual content of the DataFrames or files? See: minimal reproducible example.
– AMC, Commented Nov 17, 2019 at 7:08
    NH23345   mountain2B                   936     56.870342  -4.199001
    NH136714  A' Chailleach                997     57.6938    -5.128715
    NH681041  A' Chailleach                929.2   57.109564  -4.179285
    NH094147  A' Chraileag (A' Chralaig)   1120    57.184186  -5.154837
    NH008231  A' Ghlas-bheinn              918     57.25509   -5.303687
    NH007749  A' Mhaighdean                967     57.719644  -5.34672
    NN604762  AA                           973.2   56.857002  -4.290668
– happycoder, Commented Nov 17, 2019 at 7:36
    Hill Name                    Height  Latitude   Longitude  Osgrid
    A' Bhuidheanach Bheag        936     56.870342  -4.199001  NN660775
    A' Chailleach                997     57.6938    -5.128715  NH136714
    A' Chailleach                929.2   57.109564  -4.179285  NH681041
    A' Chraileag (A' Chralaig)   1120    57.184186  -5.154837  NH094147
    A' Ghlas-bheinn              918     57.25509   -5.303687  NH008231
    A' Mhaighdean                967     57.719644  -5.34672   NH007749
    A' Mharconaich               973.2   56.857002  -4.290668  NN604762
    Am Basteir                   934     57.247931  -6.202982  NG465253
    Am Bodach                    1031.8  56.741727  -4.983393  NN176650
    Am Faochagach                953     57.771801  -4.853899  NH303793
– happycoder, Commented Nov 17, 2019 at 7:38
The first comment is the dataset without column names, and the second is the dataset with column names. They are stored in separate CSV files.
– happycoder, Commented Nov 17, 2019 at 7:39

Aryerez · Accepted Answer · 2019-11-17 08:23:09Z

1

First, add headers to the second dataframe:

    df2.columns = df1.columns

Or, much better, define them when reading the file in the first place:

    df2 = pd.read_csv(data2, header=None, names=df1.columns.tolist())

Then inner-merge the two to keep only the rows that exist identically in both:

    united_df = df1.merge(df2, how='inner')
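To also recover the position of each match, which the question asks for, one option is to turn each frame's row position into an ordinary column before merging. The following is only a minimal sketch: the in-memory CSV text stands in for the real files, the names `pos_df1` and `pos_df2` are made up for illustration, and it assumes both files store their columns in the same order.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files, using a few rows from the comments.
csv_with_header = io.StringIO(
    "Hill Name,Height,Latitude,Longitude,Osgrid\n"
    "A' Bhuidheanach Bheag,936,56.870342,-4.199001,NN660775\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
    "A' Chailleach,929.2,57.109564,-4.179285,NH681041\n"
)
csv_without_header = io.StringIO(
    "A' Chailleach,929.2,57.109564,-4.179285,NH681041\n"
    "Am Basteir,934,57.247931,-6.202982,NG465253\n"
)

df1 = pd.read_csv(csv_with_header)
# Assumption: the headerless file uses the same column order as df1.
df2 = pd.read_csv(csv_without_header, header=None, names=df1.columns.tolist())

# reset_index() moves each frame's row position into a regular column,
# so the positions are still available after the merge.
left = df1.reset_index().rename(columns={"index": "pos_df1"})
right = df2.reset_index().rename(columns={"index": "pos_df2"})

# Inner merge on all of the original columns keeps only identical rows.
matches = left.merge(right, how="inner", on=df1.columns.tolist())
print(matches[["pos_df1", "pos_df2", "Hill Name"]])
```

Each row of `matches` then pairs a position in `df1` with the position of the identical row in `df2`.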


4 Comments

happycoder Over a year ago

Thanks for your contribution @Aryerez. If the datasets are not identical, assigning df1's columns to df2 and then merging will leave df2 with the wrong column headings.

Aryerez Over a year ago

@happycoder If the datasets are not identical, the entire question is meaningless. What exactly do you want to match with what, if they don't mean the same thing?

happycoder Over a year ago

Sorry @Aryerez, the aim is to read df1 row by row and flag any row that matches a row in df2.

Aryerez Over a year ago

Ok, but if the columns in df2 do not necessarily mean the same as the columns in df1, then it's not a match. Let's say, for example, that the 3rd column in df2 means "Longitude" while the 4th column in df2 means "Latitude" (the opposite of their meaning in df1). How could you determine that they match, if the Longitude in a row of df1 matches the Latitude (and not the Longitude) in df2?
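If the meaning of each df2 column is known but the order differs from df1, the names can be assigned in df2's own order and the merge will then align the columns by name. This is a rough sketch rather than a definitive solution: the order in `df2_columns` is an assumption read off the sample rows in the comments (grid reference first), the CSV text is a stand-in for the real files, and the `in_df2` flag column is a name invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file with headings.
df1 = pd.read_csv(io.StringIO(
    "Hill Name,Height,Latitude,Longitude,Osgrid\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
    "A' Chailleach,929.2,57.109564,-4.179285,NH681041\n"
))

# Assumption: the headerless file stores the grid reference first,
# followed by name, height, latitude and longitude.
df2_columns = ["Osgrid", "Hill Name", "Height", "Latitude", "Longitude"]
df2 = pd.read_csv(io.StringIO(
    "NH136714,A' Chailleach,997,57.6938,-5.128715\n"
    "NN604762,AA,973.2,56.857002,-4.290668\n"
), header=None, names=df2_columns)

# With no `on` argument, merge uses every column name the frames share,
# so the differing column order does not matter.
# indicator=True adds a _merge column saying where each row was found.
flagged = df1.merge(df2, how="left", indicator=True)
flagged["in_df2"] = flagged["_merge"] == "both"
print(flagged[["Hill Name", "Osgrid", "in_df2"]])
```

A left merge keeps every row of `df1`, so `in_df2` flags each one as matched or not. If `df2` contained duplicate rows, a matched `df1` row would appear once per duplicate.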
