I want to add a new column based on a row condition that depends on two different columns of the same dataframe.

I have the dataframe below:

import pandas as pd

df1_data = {'e_id': {0:'101',1:'',2:'103',3:'',4:'105',5:'',6:''},
        'r_id': {0:'',1:'502',2:'',3:'504',4:'',5:'506',6:''}}
df = pd.DataFrame(df1_data)
print(df)

I want to add a new column named "sym".

Conditions:

  1. If the 'e_id' value is not null, sym takes the 'e_id' value.
  2. If the 'r_id' value is not null, sym takes the 'r_id' value.
  3. If both 'e_id' and 'r_id' are null, remove that row from the dataframe.

I tried the code below:

df1_data = {'e_id': {0:'101',1:'',2:'103',3:'',4:'105',5:''},
        'r_id': {0:'',1:'502',2:'',3:'504',4:'',5:'506'}}

df = pd.DataFrame(df1_data)
print(df)

if df['e_id'].any():
    df['sym'] = df['e_id']
print(df)

if df['r_id'].any():
    df['sym'] = df['r_id']
print(df)

But it gives me the wrong output.

Expected output -

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

3 Answers

pandas
Using mask + fillna + assign

d1 = df.mask(df == '')
df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

How It Works

  • Mask the '' values as NaN, on the assumption that you meant them to be null.
  • fillna then takes e_id where it is not null and falls back to r_id otherwise.
  • dropna with subset=['sym'] drops a row only when the new column is null, which happens only when both e_id and r_id were null.
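
The three steps above can be written out one at a time to inspect the intermediate values. This is only a sketch: it rebuilds the question's dataframe from plain lists, and the names `d1`, `sym` and `result` are chosen here for illustration.

```python
import pandas as pd

# The question's data, including the last row where both ids are empty.
df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Step 1: turn empty strings into NaN so pandas treats them as missing.
d1 = df.mask(df == '')

# Step 2: take e_id where present, otherwise fall back to r_id.
# The last row stays NaN because both ids are missing there.
sym = d1['e_id'].fillna(d1['r_id'])

# Step 3: attach the new column and drop rows where sym is still missing.
result = df.assign(sym=sym).dropna(subset=['sym'])
print(result)
```

Note that `assign` is called on the original `df`, not on `d1`, so the e_id and r_id columns keep their empty strings in the result.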

numpy
Using np.where + assign

e = df.e_id.values
r = df.r_id.values
df.assign(
    sym=np.where(
        e != '', e,
        np.where(r != '', r, np.nan)
    )
).dropna(subset=['sym'])

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

numpy v2
Reconstruct the dataframe from values

v = df.values
m = (v != '').any(1)
v = v[m]
c1 = v[:, 0]
c2 = v[:, 1]
pd.DataFrame(
    np.column_stack([v, np.where(c1 != '', c1, c2)]),
    df.index[m], df.columns.tolist() + ['sym']
)

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

Timing

%%timeit
e = df.e_id.values
r = df.r_id.values
df.assign(sym=np.where(e != '', e, np.where(r != '', r, np.nan))).dropna(subset=['sym'])
1000 loops, best of 3: 1.23 ms per loop

%%timeit
d1 = df.mask(df == '')
df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])
100 loops, best of 3: 2.44 ms per loop

%%timeit
v = df.values
m = (v != '').any(1)
v = v[m]
c1 = v[:, 0]
c2 = v[:, 1]
pd.DataFrame(
    np.column_stack([v, np.where(c1 != '', c1, c2)]),
    df.index[m], df.columns.tolist() + ['sym']
)
1000 loops, best of 3: 204 µs per loop
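
The figures above come from the 7-row example, where fixed per-call overhead dominates. Below is a rough sketch for repeating the comparison on a larger frame outside IPython. The 1000x replication, the function names and the repeat count are assumptions made for illustration, and the numbers will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})
# Replicate the sample rows to get a frame large enough to time meaningfully.
big = pd.concat([base] * 1000, ignore_index=True)


def with_mask(df):
    # pandas: mask + fillna + assign
    d1 = df.mask(df == '')
    return df.assign(sym=d1['e_id'].fillna(d1['r_id'])).dropna(subset=['sym'])


def with_where(df):
    # numpy: nested np.where + assign
    e = df['e_id'].to_numpy()
    r = df['r_id'].to_numpy()
    sym = np.where(e != '', e, np.where(r != '', r, np.nan))
    return df.assign(sym=sym).dropna(subset=['sym'])


def with_rebuild(df):
    # numpy v2: filter the raw values, then rebuild the dataframe
    v = df.to_numpy()
    m = (v != '').any(axis=1)
    v = v[m]
    sym = np.where(v[:, 0] != '', v[:, 0], v[:, 1])
    return pd.DataFrame(
        np.column_stack([v, sym]),
        index=df.index[m],
        columns=df.columns.tolist() + ['sym'],
    )


repeats = 20
for fn in (with_mask, with_where, with_rebuild):
    total = timeit.timeit(lambda: fn(big), number=repeats)
    print(f"{fn.__name__}: {total / repeats * 1e3:.2f} ms per call")
```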

First filter out the rows where both columns are empty, using boolean indexing with any:

df = df[(df != '').any(axis=1)].copy()
#alternatively
#df = df[(df['e_id'] != '') | (df['r_id'] != '')].copy()

Then use mask with combine_first:

df['sym'] = df['e_id'].mask(df['e_id'] == '').combine_first(df['r_id'])
print (df)

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

Numpy solution with filtering and numpy.where:

# .copy() avoids a SettingWithCopyWarning when the new column is assigned
df = df[(df['e_id'] != '') | (df['r_id'] != '')].copy()
e_id = df.e_id.values
r_id = df.r_id.values
df['sym'] = np.where(e_id != '', e_id, r_id)
print(df)
  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

2 Comments

@jezrael- I'm facing TypeError: Could not compare [''] with block values error in df = df[(df != '').any(1)] line but your alternate solution works fine. ;-)
Ok, no problem. Btw, alternative solution is faster ;)

You can start with column 'e_id' and replace its values with the 'r_id' values wherever 'e_id' is empty, using pandas.Series.mask and its 'other' parameter:

df['sym'] = df['e_id'].mask(df['e_id'] == '', other=df['r_id'], axis=0)

Then you just need to remove the rows where sym is empty:

df = df[df.sym!='']
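
Put together as a runnable sketch on the question's data (the dataframe is rebuilt from plain lists here, which is an assumption for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Keep e_id unless it is empty; in that case take r_id from the same row.
df['sym'] = df['e_id'].mask(df['e_id'] == '', other=df['r_id'])

# Rows where both ids were empty still have an empty sym, so filter them out.
df = df[df['sym'] != '']
print(df)
```

Because the empty strings are never converted to NaN, the final filter compares against '' instead of calling dropna.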
