I have a World Indicators dataset in this format:

country     year    indicatorName       value
USA         1970    Agricultural Land   ...
USA         1970    Crop production     ...
...
USA         2000    Agricultural Land   ...
USA         2000    Crop production     ...
...
Mexico      1970    Agricultural Land   ...
Mexico      1970    Crop production     ...
...
Mexico      2000    Agricultural Land   ...
Mexico      2000    Crop production     ...

The dataset contains other indicators that I did not include here, but these two are the ones I'm interested in. For each country and year, I want to divide the Crop production value by the Agricultural Land value. Let's call the result crop_prod_density.

I do not know how to proceed from

df.groupby(['country', 'year'])

How can I get from here to each of the following outputs?

  1. Add a new row for the new indicator in each (country, year) group

country     year    indicatorName       value
USA         1970    Agricultural Land   ...
USA         1970    Crop production     ...
USA         1970    crop_prod_density   ...

  2. Add a new column with the same value for all rows in each (country, year) group

country     year    indicatorName       value   crop_prod_density
USA         1970    Agricultural Land   ...     us_value_1970
USA         1970    Crop production     ...     us_value_1970
...
Mexico      2000    Agricultural Land   ...     mx_value_2000
Mexico      2000    Crop production     ...     mx_value_2000

  3. Create a new dataframe with only this column for values

country     year    crop_prod_density
USA         1970    us_value_1970
...
USA         2000    us_value_2000
...
Mexico      1970    mx_value_1970
...
Mexico      2000    mx_value_2000

1 Answer

First reshape to wide format with set_index and unstack, then divide one column by the other with div:

print (df)
  country  year      indicatorName  value
0     USA  1970  Agricultural Land     10
1     USA  1970    Crop production      2
2     USA  2000  Agricultural Land     10
3     USA  2000    Crop production      3
4  Mexico  1970  Agricultural Land     10
5  Mexico  1970    Crop production      5
6  Mexico  2000  Agricultural Land     10
7  Mexico  2000    Crop production      4  

df = (df.set_index(['country','year','indicatorName'])['value']
       .unstack()
       .assign(crop_prod_density=lambda x: x['Crop production'].div(x['Agricultural Land'])))
print (df)
indicatorName  Agricultural Land  Crop production  crop_prod_density
country year                                                        
Mexico  1970                  10                5                0.5
        2000                  10                4                0.4
USA     1970                  10                2                0.2
        2000                  10                3                0.3
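The reshape above relies on each (country, year, indicatorName) combination appearing only once, since unstack raises an error on duplicate index entries, and it creates one column for every indicator in the data. A minimal sketch of a variant for the full dataset follows, assuming you want to keep only the two indicators and average any duplicates. The extra "Population" indicator, the duplicate row, and the names raw, wanted and wide are hypothetical, added only for illustration:

```python
import pandas as pd

# Hypothetical sample: the answer's data shape plus one unrelated indicator
# ("Population") and one duplicated USA/1970 "Crop production" row.
raw = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA", "Mexico", "Mexico"],
    "year": [1970, 1970, 1970, 1970, 1970, 1970],
    "indicatorName": ["Agricultural Land", "Crop production", "Population",
                      "Crop production", "Agricultural Land", "Crop production"],
    "value": [10, 2, 200, 2, 10, 5],
})

# Keep only the two indicators needed for the ratio.
wanted = ["Agricultural Land", "Crop production"]

# pivot_table aggregates duplicate entries (here with the mean) instead of
# failing, which is an assumption about how duplicates should be handled.
wide = (
    raw[raw["indicatorName"].isin(wanted)]
    .pivot_table(index=["country", "year"], columns="indicatorName",
                 values="value", aggfunc="mean")
)
wide["crop_prod_density"] = wide["Crop production"].div(wide["Agricultural Land"])
print(wide)
```

If your data has no duplicates, the set_index and unstack version is enough; the filter with isin is still useful to avoid creating a column per indicator.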

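Then reshape back to long format with stack (the values become floats because the integer and float columns are combined into a single Series):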

df1 = df.stack().reset_index(name='value')
print (df1)
   country  year      indicatorName  value
0   Mexico  1970  Agricultural Land   10.0
1   Mexico  1970    Crop production    5.0
2   Mexico  1970  crop_prod_density    0.5
3   Mexico  2000  Agricultural Land   10.0
4   Mexico  2000    Crop production    4.0
5   Mexico  2000  crop_prod_density    0.4
6      USA  1970  Agricultural Land   10.0
7      USA  1970    Crop production    2.0
8      USA  1970  crop_prod_density    0.2
9      USA  2000  Agricultural Land   10.0
10     USA  2000    Crop production    3.0
11     USA  2000  crop_prod_density    0.3

To add the new column alongside the original rows, append crop_prod_density to the index before stacking. Afterwards the column order has to be restored with reindex:

df2 =(df.set_index(['crop_prod_density'], append=True)
        .stack()
        .reset_index(name='value')
        .reindex(columns=['country','year','indicatorName','value','crop_prod_density']))
print (df2)
  country  year      indicatorName  value  crop_prod_density
0  Mexico  1970  Agricultural Land     10                0.5
1  Mexico  1970    Crop production      5                0.5
2  Mexico  2000  Agricultural Land     10                0.4
3  Mexico  2000    Crop production      4                0.4
4     USA  1970  Agricultural Land     10                0.2
5     USA  1970    Crop production      2                0.2
6     USA  2000  Agricultural Land     10                0.3
7     USA  2000    Crop production      3                0.3
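The stacked result above comes back sorted by the index (Mexico before USA). If the original row order matters, one alternative is to compute the ratio separately and merge it back onto the long frame. This is a sketch of that idea rather than the method used in this answer; the names long_df, density and with_density are made up for the example:

```python
import pandas as pd

# Same sample data as in the answer, in its original long format.
long_df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA", "Mexico", "Mexico", "Mexico", "Mexico"],
    "year": [1970, 1970, 2000, 2000, 1970, 1970, 2000, 2000],
    "indicatorName": ["Agricultural Land", "Crop production"] * 4,
    "value": [10, 2, 10, 3, 10, 5, 10, 4],
})

# Wide view: one row per (country, year), one column per indicator.
wide = long_df.set_index(["country", "year", "indicatorName"])["value"].unstack()

# One density value per (country, year), as a small lookup frame.
density = (
    wide["Crop production"].div(wide["Agricultural Land"])
    .rename("crop_prod_density")
    .reset_index()
)

# A left merge keeps the rows of long_df in their original order.
with_density = long_df.merge(density, on=["country", "year"], how="left")
print(with_density)
```

The density frame built along the way is also the third requested output (one row per country and year with only crop_prod_density).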

Finally, drop the columns that are no longer needed and turn the MultiIndex levels into columns:

df3 = (df.drop(['Crop production','Agricultural Land'], axis=1)
        .reset_index()
        .rename_axis(None, axis=1))  # axis must be passed by keyword in recent pandas
print (df3)
  country  year  crop_prod_density
0  Mexico  1970                0.5
1  Mexico  2000                0.4
2     USA  1970                0.2
3     USA  2000                0.3