5

Suppose I have two data frame 'df_a' & 'df_b' , both have the same index structure and columns, but some of the inside data elements are different:

>>> df_a
           sales cogs
STK_ID QT           
000876 1   100  100
       2   100  100
       3   100  100
       4   100  100
       5   100  100
       6   100  100
       7   100  100

>>> df_b
           sales cogs
STK_ID QT           
000876 5    50   50
       6    50   50
       7    50   50
       8    50   50
       9    50   50
       10   50   50

And now I want to replace the element of df_a by element of df_b which have the same (index, column) coordinate, and attach df_b's elements whose (index, column) coordinate beyond the scope of df_a . Just like add a patch 'df_b' to 'df_a' :

>>> df_c = patch(df_a,df_b)
           sales cogs
STK_ID QT           
000876 1   100  100
       2   100  100
       3   100  100
       4   100  100
       5    50   50
       6    50   50
       7    50   50
       8    50   50
       9    50   50
       10   50   50

How to write the 'patch(df_a,df_b)' function ?

2
  • This looks like a use case for the not yet implemented df_a.update(df_b, join='outer'), see help(df_a.update) Commented Sep 3, 2012 at 12:10
  • Isn't that just combine_first (per below)? Commented Sep 9, 2012 at 2:43

4 Answers 4

2

Try this:

df_c = df_a.reindex(df_a.index | df_b.index)
df_c.ix[df_b.index] = df_b
Sign up to request clarification or add additional context in comments.

Comments

2

To fill gaps in one dataframe with values (or even full rows) from another, take a look at the df.combine_first() built-in method.

In [34]: df_b.combine_first(df_a)
Out[34]: 
           sales  cogs
STK_ID QT             
000876 1     100   100
       2     100   100
       3     100   100
       4     100   100
       5      50    50
       6      50    50
       7      50    50
       8      50    50
       9      50    50
       10     50    50

Comments

1

Similar to BrenBarn's answer, but with more flexibility:

# reindex both to union of indices
df_ar = df_a.reindex(df_a.index | df_b.index)
df_br = df_b.reindex(df_a.index | df_b.index)

# replacement criteria can be put in this lambda function
combiner = lambda: x, y: np.where(y < x, y, x)
df_c = df_ar.combine(df.br, combiner)

2 Comments

I think on the 2nd line df_a.index already includes df_b.index
@Winand Correct. I fixed it.
0

I was struggling with the same issue, the code in the previous answers didn't work in my dataframes. They have 2 index columns and the reindex operation results in NaN values in strange places (I'll post the dataframe contents if anyone is willing do debug it).

I found an alternate solution. I'm reviving this thread hoping this may be useful to others:

# concatenate df_a and df_b
df_c = concat([dfbd,dfplanilhas])

# clears the indexes (turns the index columns into regular dataframe columns)
df_c.reset_index(inplace='True')

# removes duplicates keeping the last occurence (hence updating df_a with values from df_b)
df_c.drop_duplicates(subset=['df_a','df_b'], take_last='True', inplace='True')

Not a very elegant solution, but seems to work.

I hope df.update gets a join='outer' option soon...

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.