
Hello, I have a dataframe that looks like this:

import pandas as pd
df = pd.DataFrame({'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'], 'commodity_code': ['55', '55', '55', '55'], 'year': [2000, 2000, 2000, 2000], 'trade_value': [10000, 12000, 'NaN', 105]})

Essentially, this is a long-format dataframe covering many countries from 2000 to 2020, with the trade value for commodity "55" and commodity "73". I need to remove the countries that never exported commodity 55 or commodity 73, that is, the countries whose trade value is 0 (not NaN) in every year for each commodity.

Thanks.
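Under one reading of the requirement (look only at the rows for commodities 55 and 73, and drop a country when all of those trade values are exactly 0, treating NaN as unknown rather than 0), a minimal pandas sketch could look like the following. The Angola values and the commodity-73 row are hypothetical toy data, added only so that one country actually gets removed.

```python
import pandas as pd

# Toy data based on the question; Angola's zeros and the commodity-73 row
# are hypothetical, chosen so that one country gets removed.
df = pd.DataFrame({
    'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola', 'Angola'],
    'commodity_code': ['55', '55', '55', '55', '73'],
    'year': [2000, 2000, 2000, 2000, 2001],
    'trade_value': [10000, 12000, 'NaN', 0, 0],
})

# Turn the 'NaN' strings into real missing values and make the column numeric.
df['trade_value'] = pd.to_numeric(df['trade_value'], errors='coerce')

# Only commodities 55 and 73 matter.
relevant = df['commodity_code'].isin(['55', '73'])
# ne(0) is True for NaN, so a NaN row counts as "not known to be zero".
not_zero = df['trade_value'].ne(0)

# Keep a country if at least one of its 55/73 rows is non-zero (or NaN).
keep = (relevant & not_zero).groupby(df['country_name']).transform('any')

result = df[keep]
print(result)
```

A country with no rows at all for 55 or 73 is also dropped here, since it has no evidence of exporting either commodity.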

  • Can you provide an example with countries that would be removed, and the matching expected output? Commented Apr 8, 2022 at 20:45

3 Answers

In [1]: import pandas as pd
   ...:
   ...: data = {
   ...:     'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
   ...:     'commodity_code': ['55', '55', '55', '55'],
   ...:     'year': [2000, 2000, 2000, 2000],
   ...:     'trade_value': [10000, 12000, 'NaN', 0],
   ...:     }
   ...: df = pd.DataFrame(data)

In [2]: cond = df["commodity_code"].isin(["55", "73"]) & df["trade_value"].isin([0, "NaN"])

In [3]: df.drop(df[cond].index)
Out[3]:
  country_name commodity_code  year trade_value
0      Albania             55  2000       10000
1      Algeria             55  2000       12000


IIUC, you could do something like:

import pandas as pd

# ensure real NaN (and a numeric dtype) instead of the string 'NaN'
df['trade_value'] = pd.to_numeric(df['trade_value'], errors='coerce')

mask = (((df['commodity_code'].isin(['55', '73'])
          & df['trade_value'].gt(0))
         | df['trade_value'].isna()
        ).groupby(df['country_name'])
         .transform('sum').gt(0)
       )

df[mask]

NB. The condition on the year is unclear, but you could add 'year' if needed: groupby([df['country_name'], df['year']])
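If the year does matter, the grouping key has to be a list of Series (or of column names); `df['country_name', 'year']` would look up a single column named by the tuple and raise a `KeyError`. A sketch that keeps a country-year only when it has some non-zero trade for 55 or 73 in that year, assuming `trade_value` is already numeric and using hypothetical toy rows. `transform('any')` stands in for `transform('sum').gt(0)`; for a boolean mask the two are equivalent.

```python
import pandas as pd

# Hypothetical toy data: Angola exports in 2000 but reports 0 for both codes in 2001.
df = pd.DataFrame({
    'country_name': ['Albania', 'Angola', 'Angola', 'Angola'],
    'commodity_code': ['55', '55', '55', '73'],
    'year': [2000, 2000, 2001, 2001],
    'trade_value': [10000.0, 105.0, 0.0, 0.0],
})

# Row-level flag: commodity 55/73 with a non-zero trade value.
exported = df['commodity_code'].isin(['55', '73']) & df['trade_value'].ne(0)

# Group by country *and* year by passing a list of keys.
mask = exported.groupby([df['country_name'], df['year']]).transform('any')

print(df[mask])
```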


I hope I got the requirements right. Here is what I understood: you need to drop the rows where the commodity is 55 or 73 and, at the same time, the trade value is either 0 or NaN.

The following code might work:

import pandas as pd

df = {'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
      'commodity_code': ['56', '55', '55', '55'],
      'year': [2000, 2000, 2000, 2000],
      'trade_value': [10000, 12000, 'NaN', 105]}

df = pd.DataFrame(df)

# rows to drop: commodity 55 or 73 AND a trade value of 0 or the string 'NaN'
cond = df.commodity_code.isin(['55', '73']) & df.trade_value.isin(['NaN', 0])

# Option 1: put the result in a separate dataframe, leaving df unchanged
new = df.drop(df[cond].index)

# Option 2: change your original data in place
df.drop(df[cond].index, inplace=True)

# just convert it back into a dictionary
df.to_dict("list")  # OR new.to_dict("list")
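`df.drop(df[cond].index)` removes rows by index label, so if the index contains duplicate labels (for example after `pd.concat` without `ignore_index=True`), it can also remove rows the mask did not select. Selecting with the negated mask avoids that. A sketch on the same data, with a deliberately duplicated index (a hypothetical setup) to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
    'commodity_code': ['56', '55', '55', '55'],
    'year': [2000, 2000, 2000, 2000],
    'trade_value': [10000, 12000, 'NaN', 105],
})

cond = df['commodity_code'].isin(['55', '73']) & df['trade_value'].isin(['NaN', 0])

# Keep every row that does NOT match, without going through index labels.
new = df[~cond]

# Hypothetical duplicated index: Algeria and Andorra now share label 1.
dup = df.copy()
dup.index = [0, 1, 1, 2]
dup_cond = dup['commodity_code'].isin(['55', '73']) & dup['trade_value'].isin(['NaN', 0])

by_label = dup.drop(dup[dup_cond].index)  # drops label 1: Algeria AND Andorra
by_mask = dup[~dup_cond]                  # drops only the matching Andorra row

print(len(by_label), len(by_mask))
```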
