2

I have 2 Pandas dfs, A and B , with some matching columns but different number of rows. I want to copy values of matching columns from B to A based on some conditions. I have tried this:

s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])

df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])

df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print (df1)
   A   B  C
0  5   6  p
1  6   7  q
2  7   8  r
3  8   9  s
4  9  10  t

This works when the number of rows are same, but if B has fewer rows, the index is not aligned and i get NaN in the final df. For example, df2 has one fewer rows and the row indexes are not aligned

df2 = pd.DataFrame([list(s1), list(s2),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]

print (df1)
       A     B    C
    0  5   6.0    p
    1  6   7.0    q
    2  7   8.0    r
    3  8  10.0    t
    4  9   NaN  NaN

How to do this and copy the value if the values in columns A are same ?

2 Answers 2

3

It seems like you are looking for pd.update Example from help,

df = pd.DataFrame({'A': [1, 2, 3],
                'B': [400, 500, 600]})
new_df = pd.DataFrame({'B': [4, np.nan, 6]})
df.update(new_df)

df

result

   A      B
0  1    4.0
1  2  500.0
2  3    6.0

Example from help, but note that index is important. You are using default index which is initial value order, but if you are updating values from dataframe with different length it may be important which rows you want to update.

Sign up to request clarification or add additional context in comments.

Comments

0

The only way i found to solve this is as below:

df = df1.merge(df2, how='left', left_on='A', right_on = 'A')

def select(row, field):
    if row['{}_y'.format(field)] is np.nan:
        return row['{}_x'.format(field)]
    else:
        return row['{}_y'.format(field)]

df['C'] = df[['C_x', 'C_y']].apply(select, field='C', axis=1)
df['B'] = df[['B_x', 'B_y']].apply(select, field='B', axis=1)

Seems roundabout but cannot think of a better solution.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.