Python: merging with pandas (same index, same columns) still returns _x and _y columns

Question

I have two dataframes, A and B. A and B have the same indices and the same column names. However, their entries are different (a jumble of values and NaN).

I want to merge both A and B into another dataframe C with the same indices and columns.

Let's take A.iloc[1,2], the entry in the second row and third column of A (iloc is zero-based), for example. If that entry in A is NaN, but in B it is 99, I'd like C.iloc[1,2] to be 99. If they're both NaN, then the result will be NaN.

If they're both values, say 23 and 99, I'd like the merge to result in the larger number (99), but I need to flag the index as erroneous.

What I've done:

Wrote a for loop over the rows and columns to match entries between both dataframes. If an entry is greater than 0 in both A and B, for example, I store the index of the entry in a list and append the larger value to C. This is horribly inefficient and I'd like to use a better method. (Plus it failed because I'm a horrible programmer.)
Tried using pandas.merge. I don't particularly understand the merging process, but I've tried a few ways, for example pd.merge(A, B, left_on = A.index, right_on = B.index, how = 'outer', indicator = True). It returned a dataframe with even more rows and double the columns, with _x and _y appended to the end of their names.
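For reference, the suffixed columns described above can be reproduced on a small pair of frames. This is only a sketch: the frames and their values are made up, and it joins on the index with left_index/right_index rather than the exact call above. It shows that merge is a join that keeps the columns of both inputs, so clashing names get the default _x and _y suffixes. Extra rows would be consistent with duplicate index labels, though that is a guess about the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for A and B: same index, same column names.
A = pd.DataFrame({"val1": [1.0, np.nan], "val2": [np.nan, 4.0]}, index=["r0", "r1"])
B = pd.DataFrame({"val1": [np.nan, 2.0], "val2": [3.0, 9.0]}, index=["r0", "r1"])

# merge lines rows up by key and keeps the columns of both inputs;
# column names present on both sides are disambiguated with _x / _y.
joined = pd.merge(A, B, left_index=True, right_index=True, how="outer")
print(list(joined.columns))
```

So merge never combines two same-named columns into one; it places them side by side.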

Any ideas?

Mayank Porwal · Accepted Answer · 2018-10-29 14:31:48Z

So, from what I understand, you want to fill the null entries of df1 with the corresponding non-null values from df2.

Take the DataFrames below as an example:

In [1761]: df1
Out[1761]: 
   val1  val2  val3
0   NaN   NaN  0.20
1   NaN   0.2   NaN
2   NaN   NaN  0.13
3   NaN  50.0  0.40

In [1762]: df2
Out[1762]: 
   val1   val2  val3
0    99   0.10   NaN
1    99    NaN  0.10
2    99    NaN  0.13
3    99  50.00  0.40

In this case, the following updates will happen:

1.) All rows of column val1 in df1 will be updated from val1 of df2, as df2 has non-null values in every row of this column.

2.) Only the 1st row of column val2 in df1 will be updated from val2 of df2, as that is the only row where df1 is null and df2 has a non-null value.

3.) Only the 2nd row of column val3 in df1 will be updated from val3 of df2, as that is the only row where df1 is null and df2 has a non-null value.

Note: the 3rd row of column val2 in df1 will not be updated, as it is null in df2 as well.

Below is the code to do the above:

df1[~df1.notnull()] = df2[df2.notnull()]

After the update, df1 looks like this:

In [1766]: df1
Out[1766]: 
   val1  val2  val3
0  99.0   0.1  0.20
1  99.0   0.2  0.10
2  99.0   NaN  0.13
3  99.0  50.0  0.40

I think this solves your question.
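If modifying df1 in place is not wanted, the same fill can be sketched with DataFrame.fillna, which accepts another DataFrame and returns a new frame. This is an alternative to the assignment above, not the method the answer uses; the frames are rebuilt from the example values shown earlier.

```python
import numpy as np
import pandas as pd

# The example frames from above, rebuilt so the snippet is self-contained.
df1 = pd.DataFrame({
    "val1": [np.nan, np.nan, np.nan, np.nan],
    "val2": [np.nan, 0.2, np.nan, 50.0],
    "val3": [0.20, np.nan, 0.13, 0.40],
})
df2 = pd.DataFrame({
    "val1": [99, 99, 99, 99],
    "val2": [0.10, np.nan, np.nan, 50.0],
    "val3": [np.nan, 0.10, 0.13, 0.40],
})

# Fill only the holes in df1 with the matching cells of df2.
# df1 itself is left unchanged; the result is a new frame.
filled = df1.fillna(df2)
print(filled)
```

Cells that are null in both frames stay NaN, as in row 2 of val2.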

edited Oct 29, 2018 at 14:31

answered Oct 29, 2018 at 11:34


8 Comments

Qonl Over a year ago

It returned a NoneType object. For more background, A and B are both dataframes of 2228 rows and 40 columns. Row indices are all the same, and columns are named the same as well (I checked). Any idea why this might be happening?

Mayank Porwal Over a year ago

Oh, C = A.update(B) won't return anything; update modifies A in place and returns None. After the update command, A itself is updated. You can assign A to a new frame C afterwards if you want. So the correct command is A.update(B).
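A minimal sketch of the behaviour discussed in this comment, using small made-up frames in place of the real A and B. It also shows that, with its default settings, update overwrites existing values in A wherever B is non-null, which differs from filling only A's NaN cells.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames standing in for A and B.
A = pd.DataFrame({"val1": [np.nan, 23.0], "val2": [5.0, np.nan]})
B = pd.DataFrame({"val1": [99.0, 99.0], "val2": [np.nan, np.nan]})

result = A.copy().update(B)  # update works in place, so this is None

C = A.copy()   # copy first if A should stay untouched
C.update(B)    # overwrites C wherever B is non-null, including the 23.0

print(result)
print(C)
```

Copying before the call keeps the original A available for later comparison.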

Qonl Over a year ago

Thanks, I get it now. Do you know if there is any way to show which entries have overlapping values?

Mayank Porwal Over a year ago

By overlapping, do you mean values common in both dataframes?

Qonl Over a year ago

Rather, which entries are non-empty in both dataframes pre-updating. I wrote a loop to check at first but it took too much time to run.
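One loop-free way to find those entries, sketched on made-up frames, is to combine boolean masks. The sketch assumes A and B share exactly the same index and columns in the same order, as stated in the question. The names both, conflict, and flagged are illustrative, and np.fmax is used here as one possible way to take the larger value while ignoring NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; assumed to share the same index and column order.
A = pd.DataFrame({"val1": [np.nan, 23.0, np.nan], "val2": [0.2, np.nan, 50.0]})
B = pd.DataFrame({"val1": [99.0, 99.0, np.nan], "val2": [0.1, np.nan, 50.0]})

# True where both frames hold a value before any update.
both = A.notna() & B.notna()

# Optionally narrow this to cells where the two values disagree.
conflict = both & A.ne(B)

# (row label, column name) pairs of the flagged cells, in row-major order.
rows, cols = np.where(conflict.to_numpy())
flagged = list(zip(A.index[rows], A.columns[cols]))

# Element-wise maximum that ignores NaN: np.fmax returns the non-NaN operand
# when only one side is NaN, and NaN when both sides are NaN.
C = pd.DataFrame(np.fmax(A.to_numpy(), B.to_numpy()), index=A.index, columns=A.columns)

print(flagged)
print(C)
```

Both masks are computed from the original frames, so they should be built before A is updated.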
