4

We have a dataframe:

import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

which looks like this

A  | B  | C   | D
-------------------
A1 | B1 | NaN | NaN
A2 | B2 | 1   | 2
A3 | B3 | 3   | 4
A4 | B4 | NaN | NaN
-------------------

Columns C and D are always either both populated or both NaN (there won't be a row where C is NaN and D has a value, or vice versa).

My goal is to transform the dataframe to look like this:

A  | B  | C   | D
-------------------
A1 | B1 | NaN | NaN
1  | 2  | 1   | 2
3  | 4  | 3   | 4
A4 | B4 | NaN | NaN
-------------------

I have tried

df.loc[df['C'].notna(), ['A', 'B']] = df.loc[df['C'].notna(), ['C', 'D']]
# the above just assigns back NaN values instead of 1,2,3,4 

and

m = df['C'].notna()
df[['A', 'B']] = np.where(m, df[['C', 'D']], df[['A', 'B']])
# the above errors with operands could not be broadcast together with shapes (4,) (4,2) (4,2) 
df[['X', 'Y']] = pd.DataFrame(np.where(m, df[['C', 'D']]), df[['A', 'B']])
# the above errors with ValueError: either both or neither of X and Y should be given

I have looked into this question here and tried a few ways of converting df[['C', 'D']] to lists and assigning it back as a new dataframe, but I still have not been able to make it work.

I know I can assign the columns (A-C, B-D) individually but I am dealing with a large number of such pairs and want to avoid looping through them. Is there a clean way to do this operation in a single pass?

I am using pandas version 0.25.3.

Thanks for the help!

2 Answers 2

6

Using DataFrame.loc[...]:

df.loc[~df['C'].isna(), 'A'] = df.loc[~df['C'].isna(), 'C']
df.loc[~df['D'].isna(), 'B'] = df.loc[~df['D'].isna(), 'D']

Using np.where(...):

import numpy as np

df[['A', 'B']] = np.where(df['C'].notna().to_numpy().reshape(-1, 1), df[['C', 'D']], df[['A', 'B']])

Output:

    A   B    C    D
0  A1  B1  NaN  NaN
1   1   2  1.0  2.0
2   3   4  3.0  4.0
3  A4  B4  NaN  NaN
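
As a rough illustration of why the reshape is needed in the np.where(...) version: the mask taken from a single column is one-dimensional, with shape (4,), and NumPy cannot broadcast that against the two (4, 2) blocks. Reshaping it to (4, 1) lets each row's flag apply across both columns. The names mask_1d, mask_2d and result below are only for this sketch.

```python
import numpy as np
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# A mask built from one column is 1-D: one flag per row.
mask_1d = df['C'].notna().to_numpy()
print(mask_1d.shape)   # (4,)

# reshape(-1, 1) turns it into a single column; -1 means
# "infer this dimension from the length of the array".
mask_2d = mask_1d.reshape(-1, 1)
print(mask_2d.shape)   # (4, 1)

# (4, 1) broadcasts against (4, 2), so each row's flag selects
# either the whole [C, D] pair or the whole [A, B] pair.
result = np.where(mask_2d,
                  df[['C', 'D']].to_numpy(),
                  df[['A', 'B']].to_numpy())
print(result.shape)    # (4, 2)
```

Passing mask_1d directly is what triggers the "operands could not be broadcast together with shapes (4,) (4,2) (4,2)" error quoted in the question.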

1 Comment

Could you please explain the .to_numpy().reshape(-1, 1) part for me?
3

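Try assigning the underlying values rather than a DataFrame. When the right-hand side is a DataFrame, .loc aligns on column labels, and since C and D do not match A and B, NaN gets assigned. Converting with .to_numpy() bypasses that alignment: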

import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A','B','C','D'])

df.loc[df['C'].notna(), ['A','B']] = df.loc[df['C'].notna(), ['C','D']].to_numpy()

df

    A   B    C    D
0  A1  B1  NaN  NaN
1   1   2  1.0  2.0
2   3   4  3.0  4.0
3  A4  B4  NaN  NaN
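
Since the question mentions a large number of such column pairs, here is a minimal sketch of how the same assignment might be extended without looping. The pairs dict is a hypothetical mapping introduced only for this example, and the sketch assumes, as stated in the question, that all source columns are populated or NaN together, so one mask can be reused.

```python
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Hypothetical mapping of target column -> source column.
pairs = {'A': 'C', 'B': 'D'}
targets = list(pairs)
sources = list(pairs.values())

# Assumption: newer pandas versions may store text columns in a dedicated
# string dtype that rejects numbers; casting to object avoids that and is
# a no-op when the columns are already object dtype.
df[targets] = df[targets].astype(object)

# Assumption from the question: the source columns are NaN together,
# so the first source column is enough to build the row mask.
mask = df[sources[0]].notna()

# .to_numpy() drops the column labels, so values are assigned by position
# (C -> A, D -> B) instead of being aligned by name.
df.loc[mask, targets] = df.loc[mask, sources].to_numpy()

print(df)
```

The order of targets and sources has to match, because the assignment is purely positional once the labels are stripped.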
