I was trying to add a new column based on several substring conditions, using str.contains() and np.where(). This approach does give me the final result I want.

But the code is very lengthy. Is there a better way to implement this with pandas functions?

df5['new_column'] = np.where(df5['sr_description'].str.contains('gross to net', case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross up', case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net to gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross-to-net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross-up',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net-to-gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross 2 net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net 2 gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('memo code',case=False).fillna(False),1,0)))))))))))

The result: if 'sr_description' contains any of those strings, new_column gets 1; otherwise it gets 0.

Maybe I could store the string conditions in a list and then apply them with a function?
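A minimal sketch of that list-based idea, assuming a hypothetical `df5` built from the sample data: keep the phrases in a list, build one literal-substring mask per phrase with `str.contains(..., regex=False)`, and OR the masks together with `functools.reduce`.

```python
from functools import reduce
import operator

import pandas as pd

# Phrases from the question; a match on any of them should give 1
check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net',
              'gross-up', 'net-to-gross', 'gross 2 net', 'net 2 gross',
              'gross net', 'net gross', 'memo code']

# Hypothetical frame modeled on the sample data (None stands in for a missing value)
df5 = pd.DataFrame({'sr_description': ['something with gross up.',
                                       'without those words.',
                                       None,
                                       'or with Net to gross']})

col = df5['sr_description']
# One boolean mask per phrase: literal, case-insensitive, missing values -> False
masks = [col.str.contains(s, case=False, regex=False, na=False) for s in check_strs]

# Element-wise OR across all masks, then cast True/False to 1/0
df5['new_column'] = reduce(operator.or_, masks).astype(int)
print(df5)
```

This replaces the nested np.where calls with a flat list, so adding a phrase means adding one list entry.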

Edit:

Sample Data:

sr_description                  new_column
something with gross up.        1
without those words.            0
or with Net to gross            1
if not then we give a '0'       0
  • You can use a regex that matches all your cases with the 'or' (|) operator. Commented Nov 4, 2019 at 21:02
  • You could also look at numpy.select. Commented Nov 4, 2019 at 21:05
  • Is it 1 whenever 'gross' appears? Commented Nov 4, 2019 at 21:07
  • It’s probably better to use booleans than “0” and “1”. I think I know how to solve this, I will try to post an answer tomorrow. Commented Nov 5, 2019 at 7:05
  • Could you explain what the code is meant to do? Commented Nov 5, 2019 at 21:56
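For the numpy.select suggestion in the comments, a sketch under the same assumptions (a hypothetical `df5` from the sample data): `np.select` takes a list of boolean conditions and a matching list of choices, and falls back to `default` where no condition holds.

```python
import numpy as np
import pandas as pd

check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net',
              'gross-up', 'net-to-gross', 'gross 2 net', 'net 2 gross',
              'gross net', 'net gross', 'memo code']

# Hypothetical frame modeled on the sample data
df5 = pd.DataFrame({'sr_description': ['something with gross up.',
                                       'without those words.',
                                       None,
                                       'or with Net to gross']})

col = df5['sr_description']
# One condition per phrase (literal, case-insensitive, missing -> False)
conditions = [col.str.contains(s, case=False, regex=False, na=False) for s in check_strs]
# Every phrase maps to the same value here
choices = [1] * len(conditions)

# First matching condition wins; rows matching nothing get the default 0
df5['new_column'] = np.select(conditions, choices, default=0)
print(df5)
```

Since every phrase maps to 1, np.select is more machinery than needed here; it pays off when different phrases should produce different values.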

1 Answer

Here is what I came up with.

Code:

import re
import pandas as pd
import numpy as np

# list of the strings we want to check for
check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net', 'gross-up', 'net-to-gross', 'gross 2 net',
             'net 2 gross', 'gross net', 'net gross', 'memo code']

# From the re.escape() docs: Escape special characters in pattern. 
# This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
check_strs_esc = [re.escape(curr_val) for curr_val in check_strs]

# join all the escaped strings as a single regex
check_strs_re = '|'.join(check_strs_esc)

test_col_1 = ['something with gross up.', 'without those words.', np.nan, 'or with Net to gross', 'if not then we give a "0"']
df_1 = pd.DataFrame(data=test_col_1, columns=['sr_description'])

df_1['contains_str'] = df_1['sr_description'].str.contains(check_strs_re, case=False, na=False)

print(df_1)

Result:

              sr_description  contains_str
0   something with gross up.          True
1       without those words.         False
2                        NaN         False
3       or with Net to gross          True
4  if not then we give a "0"         False

Note that numpy isn't required for the solution itself; I'm only using it to create a NaN test value. The na=False argument makes missing descriptions count as non-matches.
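If you need the exact 0/1 new_column from the question rather than booleans, a sketch assuming the same check_strs list and a hypothetical df5: build the escaped alternation as above and cast the boolean result with astype(int).

```python
import re

import pandas as pd

check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net',
              'gross-up', 'net-to-gross', 'gross 2 net', 'net 2 gross',
              'gross net', 'net gross', 'memo code']

# Escape each phrase so '-' and spaces are matched literally, then OR them together
pattern = '|'.join(re.escape(s) for s in check_strs)

# Hypothetical frame mirroring the test data in the answer
df5 = pd.DataFrame({'sr_description': ['something with gross up.',
                                       'without those words.',
                                       None,
                                       'or with Net to gross',
                                       'if not then we give a "0"']})

# True/False per row, then True -> 1 and False -> 0
df5['new_column'] = (
    df5['sr_description']
    .str.contains(pattern, case=False, na=False)
    .astype(int)
)
print(df5)
```

If the column is only used as a filter mask, keeping the booleans (dropping astype(int)) is simpler, as one of the comments suggests.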

Let me know if anything is unclear or you have any questions! :)
