
Hi, I have Excel data with multiple columns, and I need to find a specific word and return it in a new column. The table looks like this:

ID   col0  col1  col2  col3  col4  col5
1    jack  a/h   t/m   w/n   y/h    56
2    sam   z/n   b/w   null  null   93
3    john  b/i   y/d   p/d   null   33

I want to look for 'b' in columns col1, col2, col3, and col4 and create a new column called "b" that holds the value of the cell in which 'b' was found.

The result would look like this:

ID   col0  col1  col2  col3  col4  col5  b
1    jack  a/h   t/m   w/n   y/h    56   -
2    sam   z/n   b/w   null  null   93   b/w
3    john  b/i   y/d   p/d   null   33   b/i
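To make the examples reproducible, here is a minimal sketch that rebuilds the sample table as a pandas DataFrame. It assumes the literal "null" cells are missing values (NaN), which is what pd.read_excel typically produces for empty Excel cells; the values are copied from the table above.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "null" cells are assumed to be missing values (NaN)
df = pd.DataFrame({
    'ID':   [1, 2, 3],
    'col0': ['jack', 'sam', 'john'],
    'col1': ['a/h', 'z/n', 'b/i'],
    'col2': ['t/m', 'b/w', 'y/d'],
    'col3': ['w/n', np.nan, 'p/d'],
    'col4': ['y/h', np.nan, np.nan],
    'col5': [56, 93, 33],
})
print(df)
```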

I need an efficient way to do this. I tried the following:

df1 = df[['col1', 'col2', 'col3', 'col4']]

df1['b']==[x for x in df1.values[0] if any(b for b in lst if b in str(x))]

I got this from this answer https://stackoverflow.com/a/50250103/3105140

However, it does not work for me, since I have null values and rows where the condition is never met.
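A row-wise sketch of the list-comprehension idea, assuming the goal is the first matching cell in each row. As posted, the attempt only inspects the first row (df1.values[0]), uses == (a comparison) instead of = (an assignment), and relies on lst, which is not defined in the snippet. The helper first_match below is hypothetical: it skips non-string cells such as NaN and returns '-' when nothing in the row matches. Row-wise apply is easy to read but is usually slower than the vectorized answers on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 2, 3],
    'col1': ['a/h', 'z/n', 'b/i'],
    'col2': ['t/m', 'b/w', 'y/d'],
    'col3': ['w/n', np.nan, 'p/d'],
    'col4': ['y/h', np.nan, np.nan],
})
cols = ['col1', 'col2', 'col3', 'col4']

def first_match(row, word='b', default='-'):
    # Hypothetical helper: skip NaN/None cells (non-strings) instead of calling str() on them,
    # and fall back to `default` when no cell in the row contains `word`
    return next((x for x in row if isinstance(x, str) and word in x), default)

# axis=1 passes each row of the selected columns to first_match
df['b'] = df[cols].apply(first_match, axis=1)
print(df)
```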

3 Answers


Here is a way using stack and str.contains with df.where:

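cols = ['col1', 'col2', 'col3', 'col4']
# mask is True where a cell contains 'b'; na=False treats missing cells as "no match"
df['b'] = (df[cols].where(df[cols].stack().str.contains('b', na=False)
         .unstack(fill_value=False)).ffill(axis=1).iloc[:, -1])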

print(df)

   ID  col0 col1 col2 col3 col4  col5    b
0   1  jack  a/h  t/m  w/n  y/h    56  NaN
1   2   sam  z/n  b/w  NaN  NaN    93  b/w
2   3  john  b/i  y/d  p/d  NaN    33  b/i
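A self-contained sketch of the same where/ffill idea, with a hypothetical fourth row that contains 'b' twice to show which match is kept. Here the mask is built column by column with str.contains(..., na=False) instead of stack/unstack (a swapped-in variant, not the answer's exact code). ffill(axis=1) followed by iloc[:, -1] keeps the last match in each row, while bfill(axis=1) with iloc[:, 0] keeps the first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 2, 3, 4],
    'col1': ['a/h', 'z/n', 'b/i', 'b/a'],   # row 4 is hypothetical: two cells contain 'b'
    'col2': ['t/m', 'b/w', 'y/d', 'x/b'],
    'col3': ['w/n', np.nan, 'p/d', np.nan],
    'col4': ['y/h', np.nan, np.nan, np.nan],
})
cols = ['col1', 'col2', 'col3', 'col4']

# Boolean mask, one column at a time; missing cells count as "no match"
mask = df[cols].apply(lambda s: s.str.contains('b', na=False))
kept = df[cols].where(mask)  # non-matching cells become NaN

df['b_last'] = kept.ffill(axis=1).iloc[:, -1]  # last match in the row, as in the answer
df['b_first'] = kept.bfill(axis=1).iloc[:, 0]  # first match in the row instead
print(df)
```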

I would use DataFrame.stack with a callable passed to .loc:

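cols = ['col1', 'col2', 'col3', 'col4']
df['b'] = (df[cols].stack()
                   .loc[lambda x: x.str.contains('b', na=False)]  # keep only cells containing 'b'
                   .reset_index(level=1, drop=True)  # drop the column level so the index aligns with df
                   #.reindex(df.index, fill_value='-')  # for the expected output ('-' instead of NaN)
          )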

Output

   ID  col0 col1 col2 col3 col4  col5    b
0   1  jack  a/h  t/m  w/n  y/h    56  NaN
1   2   sam  z/n  b/w  NaN  NaN    93  b/w
2   3  john  b/i  y/d  p/d  NaN    33  b/i
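One caveat, assuming a row can contain 'b' in more than one column: the stacked Series then has duplicate index labels, and assigning it directly to df['b'] raises a ValueError about duplicate labels. A minimal sketch that collapses the matches per row with groupby(level=0) before assigning; the fourth row is hypothetical test data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 2, 3, 4],
    'col1': ['a/h', 'z/n', 'b/i', 'b/a'],   # row 4 is hypothetical: two cells contain 'b'
    'col2': ['t/m', 'b/w', 'y/d', 'x/b'],
    'col3': ['w/n', np.nan, 'p/d', np.nan],
    'col4': ['y/h', np.nan, np.nan, np.nan],
})
cols = ['col1', 'col2', 'col3', 'col4']

matches = (df[cols].stack()
                   .loc[lambda x: x.str.contains('b', na=False)]
                   .reset_index(level=1, drop=True))

# Row 4 appears twice in `matches`; collapse duplicates before assigning
df['b'] = matches.groupby(level=0).first()              # first match per row
df['b_all'] = matches.groupby(level=0).agg(', '.join)   # or every match, comma-separated
print(df)
```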


In a bid to avoid selecting the columns, I used melt:

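import pandas as pd

M = (df.melt(id_vars='ID')  # long format: one row per (ID, column, value)
       .loc[lambda x: x['value'].astype('str').str.contains('b', na=False)]  # keep cells containing 'b'
       .drop('variable', axis=1))

pd.merge(df, M, how='left', on='ID').rename({'value': 'b'}, axis=1)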

    ID  col0    col1    col2    col3    col4    col5     b
0   1   jack    a/h     t/m     w/n     y/h      56     NaN
1   2   sam     z/n     b/w     NaN     NaN      93     b/w
2   3   john    b/i     y/d     p/d     NaN      33     b/i
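Melting every column also searches col0 and col5, so a name that happens to contain 'b' would be picked up, and a row with several matches would be duplicated by the merge. A sketch, assuming only col1 to col4 should be searched, that passes value_vars to melt and keeps one match per ID; the 'bob' row is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 2, 3, 4],
    'col0': ['jack', 'sam', 'john', 'bob'],   # 'bob' is hypothetical: the name itself contains 'b'
    'col1': ['a/h', 'z/n', 'b/i', 'a/c'],
    'col2': ['t/m', 'b/w', 'y/d', 'd/e'],
    'col3': ['w/n', np.nan, 'p/d', np.nan],
    'col4': ['y/h', np.nan, np.nan, np.nan],
    'col5': [56, 93, 33, 71],
})
cols = ['col1', 'col2', 'col3', 'col4']

M = (df.melt(id_vars='ID', value_vars=cols)   # melt only the columns to search
       .loc[lambda x: x['value'].str.contains('b', na=False)]
       .drop_duplicates('ID')                  # one match per ID (leftmost column) so the merge adds no rows
       .drop('variable', axis=1)
       .rename(columns={'value': 'b'}))

out = df.merge(M, how='left', on='ID')
print(out)
```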
