I have two DataFrames. One is smaller and has only a subset of the other's columns. I want to update df1 with the values from the columns that are available in df2. How can I do this?
Example:
df1:
| Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8 | 20 | 34 | 123 | 623 | 55 | 46 | 2 | 1 | 22 | 44 |
| 623 | 8 | 76 | 34 | 322 | 0 | 4 | 0 | 7 | 9 | 77 | 23 |
| 1 | 8 | 11 | 34 | 123 | 2 | 3 | 46 | 2 | 1 | 22 | 44 |
df2:
| Jan | Jul | Mar | Oct | May | Dec |
|---|---|---|---|---|---|
| 1 | 55 | 20 | 34 | 123 | 623 |
| abc | d | e | Mon | Tue | Wed |
| 1 | 3 | 11 | 34 | 123 | 2 |
Expected Output:
| Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8 | 20 | 34 | 123 | 623 | 55 | 46 | 2 | 1 | 22 | 44 |
| 623 | 8 | 76 | 34 | 322 | 0 | 4 | 0 | 7 | 9 | 77 | 23 |
| 1 | 8 | 11 | 34 | 123 | 2 | 3 | 46 | 2 | 1 | 22 | 44 |
| 1 | | 20 | | 123 | | 55 | | | 34 | | 623 |
| abc | | e | | Tue | | d | | | Mon | | Wed |
| 1 | | 11 | | 123 | | 3 | | | 34 | | 2 |
I tried the code below, but it didn't do what I need:
import pandas as pd

def update_dataframe(df1, df2, key_column):
    """
    Updates df1 with values from df2 based on a common key column.
    Only columns present in df2 will be updated in df1.

    Parameters:
    - df1 (pd.DataFrame): The larger DataFrame.
    - df2 (pd.DataFrame): The smaller DataFrame with updated values.
    - key_column (str): The column name used as the key for matching rows.

    Returns:
    - pd.DataFrame: Updated copy of df1.
    """
    # Keep one row per key so that update() matches rows unambiguously
    df2_clean = df2.drop_duplicates(subset=key_column)
    # Set the key column as index for both DataFrames
    # (set_index returns new objects, so the caller's frames are not modified)
    df1 = df1.set_index(key_column)
    df2_clean = df2_clean.set_index(key_column)
    # Update df1 with values from df2 (aligned on index and column names)
    df1.update(df2_clean)
    # Reset index to return to original structure
    return df1.reset_index()

# Example usage:
# updated_df = update_dataframe(df1, df2, "ID")
pd.concat([df1, df2])
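
For reference, here is a minimal sketch of what the expected output above seems to describe. It assumes the goal is to append the rows of df2 below df1, with each value placed under the column of the same name and the missing columns left empty. The helper name `append_aligned` is made up for illustration, and the sample frames are rebuilt from the tables in this post.

```python
import pandas as pd

# Sample frames copied from the tables in the question.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df1 = pd.DataFrame(
    [
        [1, 8, 20, 34, 123, 623, 55, 46, 2, 1, 22, 44],
        [623, 8, 76, 34, 322, 0, 4, 0, 7, 9, 77, 23],
        [1, 8, 11, 34, 123, 2, 3, 46, 2, 1, 22, 44],
    ],
    columns=months,
)
df2 = pd.DataFrame(
    [
        [1, 55, 20, 34, 123, 623],
        ["abc", "d", "e", "Mon", "Tue", "Wed"],
        [1, 3, 11, 34, 123, 2],
    ],
    columns=["Jan", "Jul", "Mar", "Oct", "May", "Dec"],
)


def append_aligned(big, small):
    """Append the rows of `small` below `big`, matching values by column name.

    Hypothetical helper; assumes every column of `small` also exists in `big`.
    """
    # concat aligns on column names; columns missing from `small` become NaN.
    # ignore_index renumbers the rows instead of repeating 0, 1, 2.
    stacked = pd.concat([big, small], ignore_index=True)
    # Keep exactly the columns of `big`, in the order of `big`.
    return stacked.reindex(columns=big.columns)


result = append_aligned(df1, df2)
print(result)
```

The empty cells come out as `NaN`. If blank strings are preferred for display, `result.fillna("")` is one option, at the cost of turning the numeric columns into object dtype.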