
Thanks in advance. The problem is to compare the rows of two separate dataframes read from CSV files, one with column headings and one without. I want to match rows in the second dataframe to rows in the first. I cannot use merge because the two don't have common column names to merge on.

1: The first dataframe has headings.

2: The second dataframe has no headings.

3: Get the position of the match.

I have tried this:

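    import pandas as pd

    df1 = pd.read_csv(data1)
    df2 = pd.read_csv(data2)

    def test1():
        for index, rows in df1.iterrows():
            # `in` on a DataFrame checks column labels, not row values;
            # with a Series on the left it raises TypeError (unhashable).
            if rows in df2:
                return index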
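A minimal sketch of the row-by-row idea that does work, assuming both files hold the same columns in the same order (only the header row differs), reading the second file with `header=None`, and using `itertuples()` as one of the comments suggests. The helper name `find_matches` and the inline sample data are hypothetical.

```python
import io

import pandas as pd


def find_matches(df1: pd.DataFrame, df2: pd.DataFrame) -> dict[int, list[int]]:
    """Map each df1 row position to the df2 row positions with identical values.

    Values are compared by column position, so df2's column names do not matter.
    """
    # Index every df2 row by its tuple of values: tuple -> list of row positions.
    lookup: dict[tuple, list[int]] = {}
    for pos, row in enumerate(df2.itertuples(index=False, name=None)):
        lookup.setdefault(row, []).append(pos)

    # Walk df1 row by row and record where (if anywhere) each row occurs in df2.
    matches: dict[int, list[int]] = {}
    for pos, row in enumerate(df1.itertuples(index=False, name=None)):
        if row in lookup:
            matches[pos] = lookup[row]
    return matches


# Hypothetical inline data standing in for the two CSV files.
csv1 = "Name,Height\nA,936\nB,997\nC,929.2\n"
csv2 = "B,997\nX,1000.5\nA,936\n"
df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2), header=None)  # file has no header row
print(find_matches(df1, df2))  # {0: [2], 1: [0]}
```

Building the lookup once makes each df1 row a single dictionary check instead of a scan over all of df2.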

Datasets:

First dataset: [image]

Second dataset: [image]

  • Don’t use iterrows(); itertuples() is far better. Can you share the actual content of the DataFrames or files? See: minimal reproducible example. Commented Nov 17, 2019 at 7:08
  • Sample rows (without column names):

        NH23345   mountain2B                  936     56.870342  -4.199001
        NH136714  A' Chailleach               997     57.6938    -5.128715
        NH681041  A' Chailleach               929.2   57.109564  -4.179285
        NH094147  A' Chraileag (A' Chralaig)  1120    57.184186  -5.154837
        NH008231  A' Ghlas-bheinn             918     57.25509   -5.303687
        NH007749  A' Mhaighdean               967     57.719644  -5.34672
        NN604762  AA                          973.2   56.857002  -4.290668

    Commented Nov 17, 2019 at 7:36
  • Sample rows (with column names):

        Hill Name                   Height  Latitude   Longitude  Osgrid
        A' Bhuidheanach Bheag       936     56.870342  -4.199001  NN660775
        A' Chailleach               997     57.6938    -5.128715  NH136714
        A' Chailleach               929.2   57.109564  -4.179285  NH681041
        A' Chraileag (A' Chralaig)  1120    57.184186  -5.154837  NH094147
        A' Ghlas-bheinn             918     57.25509   -5.303687  NH008231
        A' Mhaighdean               967     57.719644  -5.34672   NH007749
        A' Mharconaich              973.2   56.857002  -4.290668  NN604762
        Am Basteir                  934     57.247931  -6.202982  NG465253
        Am Bodach                   1031.8  56.741727  -4.983393  NN176650
        Am Faochagach               953     57.771801  -4.853899  NH303793

    Commented Nov 17, 2019 at 7:38
  • The first block is the dataset without column names, and the second is the dataset with column names. They are stored in separate CSV files. Commented Nov 17, 2019 at 7:39
  • Probably better to include that in your post, eh. Commented Nov 17, 2019 at 7:39

1 Answer


First, add a header to the second dataframe:

    df2.columns = df1.columns

Or, better, set them when reading the file in the first place:

    df2 = pd.read_csv(data2, header=None, names=df1.columns.tolist())
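A small sketch of why `header=None` matters: without it, pandas treats the first data row of the headerless file as column names and that row disappears from the data. It uses inline rows copied from the question's sample data and assumes, as this answer does, that the headerless file has the same column order as the headed one; the variable names are hypothetical.

```python
import io

import pandas as pd

# Two rows in the headed file's layout (values from the question's sample).
csv_with_header = (
    "Hill Name,Height,Latitude,Longitude,Osgrid\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
    "A' Mhaighdean,967,57.719644,-5.34672,NH007749\n"
)
# The same kind of rows without a header line (assumed same column order).
csv_without_header = (
    "A' Mhaighdean,967,57.719644,-5.34672,NH007749\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
)

df1 = pd.read_csv(io.StringIO(csv_with_header))

# Default header inference: the first data row is consumed as column names.
df2_wrong = pd.read_csv(io.StringIO(csv_without_header))

# header=None keeps every row as data; names= supplies the labels from df1.
df2 = pd.read_csv(io.StringIO(csv_without_header), header=None,
                  names=df1.columns.tolist())

print(len(df2_wrong), len(df2))  # 1 2
print(df2.columns.tolist())
```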

Then do an inner merge to keep only the rows that exist identically in both:

    united_df = df1.merge(df2, how='inner')
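An inner merge drops the unmatched rows, so it cannot report which df1 rows failed to match or where the matches sit. A variation (a swapped-in technique, not part of the answer above) is a left merge with `indicator=True`, carrying each frame's row position along as an ordinary column. This sketch assumes df2 already has df1's column names in df1's order and uses inline rows from the question's sample data; `pos_df1`, `pos_df2` and `matched` are hypothetical column names.

```python
import io

import pandas as pd

df1 = pd.read_csv(io.StringIO(
    "Hill Name,Height,Latitude,Longitude,Osgrid\n"
    "A' Bhuidheanach Bheag,936,56.870342,-4.199001,NN660775\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
    "Am Basteir,934,57.247931,-6.202982,NG465253\n"
))
# Assumed: the headerless file uses the same column order as df1.
df2 = pd.read_csv(io.StringIO(
    "Am Basteir,934,57.247931,-6.202982,NG465253\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
), header=None, names=df1.columns.tolist())

# Keep each frame's row position as a column so it survives the merge.
left = df1.rename_axis("pos_df1").reset_index()
right = df2.rename_axis("pos_df2").reset_index()

# Left merge on the data columns only; indicator=True adds a "_merge" column
# that is "both" for matched rows and "left_only" for unmatched ones.
flagged = left.merge(right, on=df1.columns.tolist(), how="left", indicator=True)
flagged["matched"] = flagged["_merge"] == "both"
print(flagged[["pos_df1", "pos_df2", "matched"]])
```

Matching is exact, so numeric columns must parse to identical values in both files; unmatched rows get `NaN` in `pos_df2`.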

4 Comments

Thanks for your contribution @Aryerez. If the datasets are not identical, assigning df1's columns to df2 and merging will leave df2 with the wrong column headings.
@happycoder If the datasets are not identical, the entire question is meaningless. What exactly do you want to match with what, if the columns don't mean the same thing?
Sorry @Aryerez, the aim is to read df1 row by row and flag any row that matches a row in df2.
OK, but if the columns in df2 do not necessarily mean the same as the columns in df1, then it's not a match. Say, for example, that the 3rd column in df2 means "Longitude" while the 4th column means "Latitude" (the opposite of their meaning in df1). How could you tell that two rows match if the Longitude in a df1 row matches the Latitude (not the Longitude) in df2?
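Judging from the sample rows posted in the question's comments, the headerless file appears to list the grid reference first, then name, height, latitude and longitude. This is an inference from those rows, not something the asker confirmed. If it holds, copying `df1.columns` onto df2 would mislabel every column. A sketch that names df2's columns in its own (assumed) order and then merges by name; `df2_order` is a hypothetical variable.

```python
import io

import pandas as pd

df1 = pd.read_csv(io.StringIO(
    "Hill Name,Height,Latitude,Longitude,Osgrid\n"
    "A' Chailleach,997,57.6938,-5.128715,NH136714\n"
    "A' Mhaighdean,967,57.719644,-5.34672,NH007749\n"
))

# ASSUMPTION: column order of the headerless file, inferred from the sample rows.
df2_order = ["Osgrid", "Hill Name", "Height", "Latitude", "Longitude"]
df2 = pd.read_csv(io.StringIO(
    "NH23345,mountain2B,936,56.870342,-4.199001\n"
    "NH136714,A' Chailleach,997,57.6938,-5.128715\n"
), header=None, names=df2_order)

# Optional: put df2's columns in df1's order. merge() matches columns by name,
# so this only matters for positional comparisons (e.g. itertuples() tuples).
df2 = df2[df1.columns.tolist()]

# Inner merge on all shared column names keeps rows present in both files.
matches = df1.merge(df2, how="inner")
print(matches)
```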
