I have a loop that creates a dataframe (DF) on each iteration, of the form:

DF

  ID        LCAR        RCAR  ...     LPCA1     LPCA2     RPCA2
0 d0129  312.255859  397.216797  ...  1.098888  1.101905  1.152332

and then adds that dataframe to an existing dataframe (main_exl_df) of this form:

main_exl_df

         ID  Date     ... COGOTH3  COGOTH3X COGOTH3F
0     d0129   NaN    ...     NaN       NaN      NaN
1     d0757   NaN    ...     0.0       NaN      NaN
2     d2430   NaN    ...     NaN       NaN      NaN
3     d3132   NaN    ...     0.0       NaN      NaN
4     d0371   NaN    ...     0.0       NaN      NaN
                 ...   ...       ...  ...     ...       ...      ...
2163  d0620   NaN    ...     0.0       NaN      NaN
2164  d2410   NaN    ...     0.0       NaN      NaN
2165  d0752   NaN    ...     NaN       NaN      NaN
2166  d0407   NaN    ...     0.0       NaN      NaN

At each iteration, main_exl_df is saved and then loaded again for the next iteration.

I tried

main_exl_df = pd.concat([main_exl_df, DF], axis=1)

but this adds the columns to the right side of main_exl_df each time and does not match rows on the 'ID' column.

How can I add the new dataframe (DF) at the row with the correct ID and in the right columns?

  • I have also tried main_exl_df = pd.merge(main_exl_df, DF, on=main_exl_df.columns[0]) to match on the correct ID, but when I save main_exl_df, only one row is saved and the rest of the columns and rows are lost. Commented Aug 20, 2020 at 15:25

2 Answers

Merge is the way to go for combining columns in such cases. pd.merge defaults to an inner join (how='inner'), which keeps only the IDs present in both dataframes, so you should specify the join type explicitly. Assuming that in this case you want to keep all the rows in main_exl_df, merge using:

main_exl_df = main_exl_df.merge(DF, how='left', on='ID')

If you want to keep the rows from both dataframes, use how='outer':

main_exl_df = main_exl_df.merge(DF, how='outer', on='ID')
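
To make the difference between the two options concrete, here is a minimal sketch on toy data. The values and the ID d9999 are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Toy stand-ins for the question's dataframes (values are illustrative).
main_exl_df = pd.DataFrame({"ID": ["d0129", "d0757", "d2430"],
                            "Date": [None, None, None]})
DF = pd.DataFrame({"ID": ["d0129"], "LCAR": [312.255859]})

# how='left': every row of main_exl_df is kept; LCAR is filled only
# where the ID matches and is NaN elsewhere.
left = main_exl_df.merge(DF, how="left", on="ID")

# how='outer': IDs that exist only in the new frame are added as rows.
# 'd9999' is a hypothetical ID that is absent from main_exl_df.
new_row = pd.DataFrame({"ID": ["d9999"], "LCAR": [1.0]})
outer = main_exl_df.merge(new_row, how="outer", on="ID")

print(left)
print(outer)
```

With how='left' the result has the same number of rows as main_exl_df, while how='outer' grows by one row for the unseen ID.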

3 Comments

Thank you. This fixed the problem of merge not saving all the rows, and it now merges the two dataframes on the right 'ID' row. However, how='left' and how='outer' both produced the same output: each merge created new columns (from main_exl_df with the _x suffix and from DF with the _y suffix). To fix it I tried main_exl_df = main_exl_df.merge(DF, how='outer', on=columns_label), where columns_label is the list of all column labels common to both dataframes, but this didn't fix the problem either.
@ReiRei This means that you have other common columns in the dataframes too. To fix this, you can merge on all the common columns and not just on 'ID' column. Also, check out (stackoverflow.com/questions/19125091/…) to remove duplicate columns while merging.
Thank you very much for your answer. I used the link you sent to solve the problem. I upvoted your answer; unfortunately it won't show publicly because my reputation is below 15 right now.

This is what solved the problem in the end (with the help of this answer):

I used the merge function; however, merge created duplicate columns with _x and _y suffixes. To get rid of the columns with the _x suffix I used this function:

    def drop_x(df):
        # list comprehension of the cols that end with '_x'
        to_drop = [x for x in df if x.endswith('_x')]
        df.drop(to_drop, axis=1, inplace=True)

and then merged the two dataframes, replacing the _y suffix with an empty string so that DF's columns keep their original names:

    # keep each column label of DF once (Index.drop_duplicates does not accept a dataframe)
    col_to_use = DF.columns.drop_duplicates()
    main_exl_df = main_exl_df.merge(DF[col_to_use], on='ID', how='outer', suffixes=('_x', ''))
    drop_x(main_exl_df)
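
One caveat worth checking with this approach: for columns that exist in both frames, dropping the _x columns also discards the values main_exl_df already held in those columns for other IDs. A possible alternative, sketched here on made-up values and not taken from the answers above, is to index both frames by ID and use combine_first, which prefers DF's values and falls back to the existing ones.

```python
import pandas as pd

# Toy data: main_exl_df already holds LCAR for d0757 from an earlier
# iteration (5.0 is a made-up value); DF carries the new row for d0129.
main_exl_df = pd.DataFrame({
    "ID": ["d0129", "d0757"],
    "LCAR": [float("nan"), 5.0],
    "COGOTH3": [float("nan"), 0.0],
})
DF = pd.DataFrame({"ID": ["d0129"],
                   "LCAR": [312.255859],
                   "RCAR": [397.216797]})

# Align on ID: take DF's values where present, otherwise keep what
# main_exl_df already had. Rows and columns are the union of both frames.
combined = (
    DF.set_index("ID")
      .combine_first(main_exl_df.set_index("ID"))
      .reset_index()
)

print(combined)
```

Note that combine_first may reorder rows and columns, so reindex afterwards if the original column order matters.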
