Replace rows in a Pandas df with rows from another df of different size

Question

I have 2 Pandas dfs, A and B , with some matching columns but different number of rows. I want to copy values of matching columns from B to A based on some conditions. I have tried this:

s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])

df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])

df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print (df1)
   A   B  C
0  5   6  p
1  6   7  q
2  7   8  r
3  8   9  s
4  9  10  t

This works when the number of rows are same, but if B has fewer rows, the index is not aligned and i get NaN in the final df. For example, df2 has one fewer rows and the row indexes are not aligned

df2 = pd.DataFrame([list(s1), list(s2),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]

print (df1)
       A     B    C
    0  5   6.0    p
    1  6   7.0    q
    2  7   8.0    r
    3  8  10.0    t
    4  9   NaN  NaN

How to do this and copy the value if the values in columns A are same ?

ilia timofeev · Accepted Answer · 2018-08-25 10:56:56Z

3

It seems like you are looking for pd.update Example from help,

df = pd.DataFrame({'A': [1, 2, 3],
                'B': [400, 500, 600]})
new_df = pd.DataFrame({'B': [4, np.nan, 6]})
df.update(new_df)

df

result

   A      B
0  1    4.0
1  2  500.0
2  3    6.0

Example from help, but note that index is important. You are using default index which is initial value order, but if you are updating values from dataframe with different length it may be important which rows you want to update.

answered Aug 25, 2018 at 10:56

ilia timofeev

1,1197 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Praveen · Accepted Answer · 2018-09-07 03:25:18Z

0

The only way i found to solve this is as below:

df = df1.merge(df2, how='left', left_on='A', right_on = 'A')

def select(row, field):
    if row['{}_y'.format(field)] is np.nan:
        return row['{}_x'.format(field)]
    else:
        return row['{}_y'.format(field)]

df['C'] = df[['C_x', 'C_y']].apply(select, field='C', axis=1)
df['B'] = df[['B_x', 'B_y']].apply(select, field='B', axis=1)

Seems roundabout but cannot think of a better solution.

answered Sep 7, 2018 at 3:25

Praveen

2,1871 gold badge20 silver badges22 bronze badges

Collectives™ on Stack Overflow

Replace rows in a Pandas df with rows from another df of different size

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related