So have I some code that creates a new dataframe that counts the number of unique states with at least 1 non null value and the total number of non null values grouped by year. Code works fine, but I want to modify it to include a new rows that lists the unique states.
This is my data:
year state var1 var2
0 2018 CA NaN 2
1 2018 TX 1 NaN
2 2018 FL NaN NaN
3 2018 AL 1 2
4 2018 AL NaN 1
6 2019 CA NaN NaN
7 2019 TX 1 1
8 2019 FL NaN NaN
9 2019 AL 2 1
10 2019 AK 2 NaN
This my current output:
2018 2019
var1
Number of unique states with at least 1 non-null: 2 3
Number of respondents with non-null var: 2 3
Average: 1 1
var2
Number of unique states with at least 1 non-null: 2 2
Number of respondents with non-null var: 3 2
Average: 1.5 1
This is the code that I'm working with:
c = df.groupby(['year', 'state']).count()
res = c.groupby('YEAR').agg([np.count_nonzero, sum]).T
res.index = res.index.set_levels(['Number of unique states with at least 1 non-null:',
'Number of respondents with non-null var:'], level=1)
z = res.swaplevel().T
res4 = pd.concat([z, pd.concat([z['Number of respondents with non-null var:'] /
z['Number of unique states with at least 1 non-null:']],
keys=['Average:'], axis=1),],
axis=1).T.swaplevel().sort_index()
And this is what I want the new output to look like:
2018 2019
var1
Number of unique states with at least 1 non-null: 2 3
Unique states with at least 1 non-null: [TX, AL] [TX, AL, AK]
Number of respondents with non-null var: 2 3
Average: 1 1
var2
Number of unique states with at least 1 non-null: 2 2
Unique states with at least 1 non-null: [AL, CA] [TX, AL]
Number of respondents with non-null var: 3 2
Average: 1.5 1
Basically I want this row included "Unique states with at least 1 non-null:", listing the names of the states. How can I do this?