I have two dataframes, A and B. A and B have the same indices and the same column names. However, their entries are different (a jumble of values and NaN).

I want to merge both A and B into another dataframe C with the same indices and columns.

Let's take A.iloc[1,2], the entry in the second row and third column of A (iloc is 0-based), for example. If that entry in A is NaN, but in B it is 99, I'd like C.iloc[1,2] to be 99. If they're both NaN, then the result will be NaN.

If they're both values, say 23 and 99, I'd like the merge to result in the larger number (99), but I need to flag the index as erroneous.

What I've done:

  1. Wrote a for loop over the rows and columns to match entries between both dataframes. If an entry is more than 0 in A and more than 0 in B, for example, then I store the index of the entry in a list and put the larger value in C. This is horribly inefficient and I'd like to use a better method. (Plus it failed because I'm a horrible programmer.)

  2. Tried using pandas.merge. I don't particularly understand the merging process, but I've tried a few ways, for example pd.merge(A, B, left_on = A.index, right_on = B.index, how = 'outer', indicator = True). It returned a dataframe with even more rows and double the columns, with _x and _y appended to the end of their names.

Any ideas?

1 Answer

So, from what I understand, you want to fill the null entries of df1 with the corresponding non-null values from df2.

Take the DataFrames below as an example:

In [1761]: df1
Out[1761]: 
   val1  val2  val3
0   NaN   NaN  0.20
1   NaN   0.2   NaN
2   NaN   NaN  0.13
3   NaN  50.0  0.40

In [1762]: df2
Out[1762]: 
   val1   val2  val3
0    99   0.10   NaN
1    99    NaN  0.10
2    99    NaN  0.13
3    99  50.00  0.40

In this case, the following updates will happen:

1.) All rows of column val1 in df1 will be filled from val1 of df2, since df1 is NaN in every row of this column and df2 has non-null values throughout.

2.) Only the 1st row of column val2 in df1 will be filled from df2, since that is the only row where df1 is NaN and df2 has a non-null value.

3.) Only the 2nd row of column val3 in df1 will be filled from df2, for the same reason.

Note: the 3rd row of column val2 in df1 will not be filled, as it is NaN in df2 as well.

Below is the code to do the above:

df1[~df1.notnull()] = df2[df2.notnull()]

After the update, df1 looks like this:

In [1766]: df1
Out[1766]: 
   val1  val2  val3
0  99.0   0.1  0.20
1  99.0   0.2  0.10
2  99.0   NaN  0.13
3  99.0  50.0  0.40
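
If modifying df1 in place is not wanted, the same fill can probably be expressed with `combine_first`, which returns a new frame instead. This is only a minimal sketch: the frames are rebuilt from the example above, and the name `C` is just borrowed from the question for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Rebuild the example frames shown above.
df1 = pd.DataFrame({"val1": [nan, nan, nan, nan],
                    "val2": [nan, 0.2, nan, 50.0],
                    "val3": [0.20, nan, 0.13, 0.40]})
df2 = pd.DataFrame({"val1": [99, 99, 99, 99],
                    "val2": [0.10, nan, nan, 50.0],
                    "val3": [nan, 0.10, 0.13, 0.40]})

# Keep df1's values where present, otherwise take df2's.
# df1 itself is left untouched; the result is a new frame.
C = df1.combine_first(df2)
print(C)
```

Entries that are NaN in both frames stay NaN in `C`.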

I think this solves your question.
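
For the other half of the question (taking the larger number when both frames have a value, and flagging those entries), something along these lines might work. It is a sketch under a few assumptions: A and B have identical indices and columns in the same order, all columns are numeric, and "flagged" means "non-null in both frames", even if the two values happen to be equal. The small frames and the name `conflicts` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up frames with the same index and columns.
A = pd.DataFrame({"val1": [np.nan, 23.0, np.nan], "val2": [1.0, np.nan, 5.0]})
B = pd.DataFrame({"val1": [99.0, 99.0, np.nan], "val2": [np.nan, np.nan, 7.0]})

# np.fmax is an element-wise maximum that ignores NaN:
# a number beats NaN, and the result is NaN only if both inputs are NaN.
C = pd.DataFrame(np.fmax(A.to_numpy(), B.to_numpy()),
                 index=A.index, columns=A.columns)

# Boolean mask of entries that are non-null in both frames.
both = A.notna() & B.notna()

# Turn the mask into a list of (row label, column label) pairs.
rows, cols = np.where(both.to_numpy())
conflicts = list(zip(A.index[rows], A.columns[cols]))

print(C)
print(conflicts)
```

The mask is computed from the original A and B, so it has to be built before either frame is modified in place.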

8 Comments

It returned a NoneType object. For more background, A and B are both dataframes of 2228 rows and 40 columns. Row indices are all the same, and columns are named the same as well (I checked). Any idea why this might be happening?
Oh, C = A.update(B) won't return anything: update modifies A in place and returns None. After the update command, A itself is updated. You can assign A to a new frame C afterwards if you want. So, the correct command is just A.update(B).
Thanks, I get it now. Do you know if there is any way to show which entries have overlapping values?
By overlapping, do you mean values common in both dataframes?
Rather, which entries are non-empty in both dataframes pre-updating. I wrote a loop to check at first but it took too much time to run.