How to eliminate certain rows from a dataframe

Question

Hello, I have a dataframe that looks like this:

df= {'country_name':['Albania', 'Algeria', 'Andorra', 'Angola'],'commodity_code':['55','55','55','55'],'year':[2000,2000,2000,2000],'trade_value':[10000,12000,'NaN',105]}

Essentially, this is a long dataframe with many countries from 2000 to 2020 and the trade value for commodity "55" and commodity "73". I need to eliminate the countries that never exported commodity 55 and commodity 73, that is, the countries whose trade value equals 0 (not NaN) for every year and for each commodity.

Thanks.

Can you provide an example with countries that would be removed, along with the matching expected output?
– mozway, Commented Apr 8, 2022 at 20:45

Işık Kaplan · Accepted Answer · 2022-04-08 20:31:10Z

In [1]: import pandas as pd
   ...:
   ...: data = {
   ...:     'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
   ...:     'commodity_code': ['55', '55', '55', '55'],
   ...:     'year': [2000, 2000, 2000, 2000],
   ...:     'trade_value': [10000, 12000, 'NaN', 0],
   ...:     }
   ...: df = pd.DataFrame(data)

In [2]: cond = df["commodity_code"].isin(["55", "73"]) & df["trade_value"].isin([0, "NaN"])

In [3]: df.drop(df[cond].index)
Out[3]:
  country_name commodity_code  year trade_value
0      Albania             55  2000       10000
1      Algeria             55  2000       12000


mozway · Accepted Answer · 2022-04-08 20:52:13Z

IIUC, you could do something like:

# ensure real NaN
df = df.replace('NaN', float('nan'))

mask = (((df['commodity_code'].isin(['55', '73'])
        &df['trade_value'].gt(0))
        |df['trade_value'].isna()
       ).groupby(df['country_name'])
        .transform('sum').gt(0)
        )

df[mask]

NB. The condition on the year is unclear, but you could add 'year' to the grouping if needed: groupby([df['country_name'], df['year']])
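
For illustration, here is a minimal self-contained sketch of the same per-country idea on made-up data. The rows and values below are hypothetical, not from the question. It assumes that a country "never exported" when every one of its rows has a trade_value of exactly 0, and that NaN means unknown rather than zero; that reading of the requirement is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Angola is all zeros, Andorra has a zero and a NaN.
sample = pd.DataFrame({
    'country_name': ['Albania', 'Albania', 'Andorra', 'Andorra', 'Angola', 'Angola'],
    'commodity_code': ['55', '73', '55', '73', '55', '73'],
    'year': [2000, 2000, 2000, 2000, 2000, 2000],
    'trade_value': [10000, 0, 0, np.nan, 0, 0],
})

# True only on rows holding an explicit zero (NaN compares as False).
is_zero = sample['trade_value'].eq(0)

# Flag every row of a country whose rows are all explicit zeros.
all_zero = is_zero.groupby(sample['country_name']).transform('all')

# Keep the countries that have at least one non-zero or NaN row.
kept = sample[~all_zero]
print(kept['country_name'].unique())
```

Here only Angola is removed: Albania has a positive value, and Andorra has a NaN, which this sketch does not count as zero.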


MUsman · Accepted Answer · 2022-04-08 20:57:41Z

I hope I got the requirements right. As I understand it, you need to drop the rows where the commodity is 55 or 73 and, at the same time, the trade value is either 0 or NaN.

The following code might work.

import pandas as pd
df = {'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
      'commodity_code': ['56', '55', '55', '55'],
      'year': [2000, 2000, 2000, 2000],
      'trade_value': [10000, 12000, 'NaN', 105]}

df = pd.DataFrame(df)

# If you want to change your original data (drops the rows in place)
df.drop(df[df.commodity_code.isin(['55', '73']) & df.trade_value.isin(['NaN', 0])].index, inplace=True)

# In case you want the changes in a separate dataframe
# (use this instead of the in-place version above)
new = df.drop(df[df.commodity_code.isin(['55', '73']) & df.trade_value.isin(['NaN', 0])].index)

# Just convert it back into a dictionary
df.to_dict("list")  # OR new.to_dict("list")
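
Note that isin(['NaN', 0]) only matches the literal string 'NaN', not a real missing value. As a hedged sketch of the same row-level filter that handles both, the column can first be coerced to numeric. The data below is a hypothetical variant of the question's sample, and the names values, drop_rows, and new are illustrative only.

```python
import pandas as pd

# Hypothetical variant of the sample data, with one 'NaN' string and one zero.
df = pd.DataFrame({
    'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
    'commodity_code': ['55', '55', '55', '55'],
    'year': [2000, 2000, 2000, 2000],
    'trade_value': [10000, 12000, 'NaN', 0],
})

# Coerce anything non-numeric (such as the string 'NaN') to a real NaN.
values = pd.to_numeric(df['trade_value'], errors='coerce')

# Rows for commodity 55 or 73 whose value is missing or exactly zero.
drop_rows = df['commodity_code'].isin(['55', '73']) & (values.isna() | values.eq(0))

# Keep everything else in a separate dataframe.
new = df[~drop_rows]
print(new)
```

This still removes individual rows rather than whole countries, so it only fits the row-level reading of the question.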
