Filtering by Strings in dataframe and adding separate values

Question

Given this dataframe, is it possible to find the rows whose 'Country' value contains one of the strings in the countries list? (For example, the first row's 'Country' contains the word Japan, and its corresponding 'Employees' value is 1.) Is it then possible to sum the values that correspond to each country? (Expected result: Japan: 1+3=4, USA: 2, Europe: 4)

import pandas as pd

countries = ["Europe", "USA", "Japan"]
df = pd.DataFrame({'Employees': [1, 2, 3, 4],
                   'Country': ['Japan;Security', 'USA;Google', 'Japan;Sega', 'Europe;Google']})
print(df)

Thanks

I'm on mobile, but here it goes. I would make a new column with .str.split(';').str[0], then do a groupby on the new column and use .agg({'Employees':'sum'}). This is a classic use case of groupby; I strongly encourage you to read the docs.
– likethevegetable, Commented May 1, 2021 at 13:32
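The approach described in the comment can be sketched roughly as follows. This is a minimal illustration, not the commenter's exact code: the column name `CountryName` is a hypothetical choice, and the element-wise `.str[0]` accessor is used because plain `[0]` on the split result would select the first row rather than the first piece of each string.

```python
import pandas as pd

df = pd.DataFrame({
    'Employees': [1, 2, 3, 4],
    'Country': ['Japan;Security', 'USA;Google', 'Japan;Sega', 'Europe;Google'],
})

# Keep the text before the first ';' as the country name.
# 'CountryName' is a hypothetical column name chosen for this sketch.
df['CountryName'] = df['Country'].str.split(';').str[0]

# Sum the employees for each country name.
totals = df.groupby('CountryName').agg({'Employees': 'sum'})
print(totals)
```

Note that this groups by whatever appears before the `;`, so it does not consult the `countries` list at all.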

Nk03 · Accepted Answer · 2021-05-01 13:50:36Z

2

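If you want to use only the values specified in the countries list, you can do something like this:

# Build a pattern such as '(Europe|USA|Japan)' from the list
patt = '(' + '|'.join(countries) + ')'
# Extract the first listed country found in each row (NaN if none matches)
grp = df.Country.str.extract(pat=patt, expand=False).values
new_df = df.groupby(grp).agg({'Employees': 'sum'})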

For example, if the initial countries list is missing 'Japan', the rows that match nothing can be kept and labelled 'other':

countries = ["Europe", "USA"]
patt = '(' + '|'.join(countries) + ')'
grp = df.Country.str.extract(pat=patt, expand=False).values
# dropna=False keeps the NaN group (rows with no listed country)
new_df = df.groupby(grp, dropna=False).agg({'Employees': 'sum'}).reset_index().rename(
    columns={'index': 'Country'}).fillna('other')

Output:

  Country  Employees
0  Europe          4
1     USA          2
2   other          4 # the Japan rows (1 + 3) now fall under 'other'
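One possible refinement, offered as a sketch rather than as part of the answer: if a name in the list could contain regex metacharacters (for example a dot or a plus sign), passing each name through `re.escape` before joining keeps the pattern literal. The variable names below are illustrative, and unmatched rows are labelled 'other' before grouping instead of afterwards.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Employees': [1, 2, 3, 4],
    'Country': ['Japan;Security', 'USA;Google', 'Japan;Sega', 'Europe;Google'],
})
countries = ["Europe", "USA"]

# Escape each name so characters like '.' or '+' are matched literally.
patt = '(' + '|'.join(re.escape(c) for c in countries) + ')'

# str.extract returns NaN where nothing matches; label those rows 'other'.
grp = df['Country'].str.extract(patt, expand=False).fillna('other')

totals = df.groupby(grp)['Employees'].sum()
print(totals)
```

Filling the missing labels first means `dropna=False` and the later `rename`/`fillna` steps are not needed in this variant.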

edited May 1, 2021 at 13:50

answered May 1, 2021 at 13:44

Nk03


1 Comment

anky Over a year ago

This is the correct and generic way of including the words in the list IMO

Code Different · Accepted Answer · 2021-05-01 13:39:36Z

1

Try this:

c = df['Country'].str.split(';', expand=True)[0].to_numpy()
df.groupby(c)['Employees'].sum()
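Splitting on `;` groups by every prefix that occurs, whether or not it is in the `countries` list. If only listed countries should be counted, one option (an assumption about the intent, not something stated in this answer) is to filter the prefixes with `isin` first. The names `prefix` and `mask` are illustrative, and the shortened list is used only to make the filtering visible.

```python
import pandas as pd

df = pd.DataFrame({
    'Employees': [1, 2, 3, 4],
    'Country': ['Japan;Security', 'USA;Google', 'Japan;Sega', 'Europe;Google'],
})
countries = ["Europe", "USA"]  # 'Japan' deliberately left out

# Text before the first ';' for each row.
prefix = df['Country'].str.split(';', expand=True)[0].rename('Country')

# Keep only the rows whose prefix is one of the listed countries.
mask = prefix.isin(countries)

totals = df.loc[mask, 'Employees'].groupby(prefix[mask]).sum()
print(totals)
```

Unlike the regex approach, this matches only on the exact prefix, so a country name appearing elsewhere in the string would not be picked up.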

answered May 1, 2021 at 13:39

Code Different

