Situation: I have a DataFrame with NaN values. I'm going to make a forecast for the next year, so I guess I don't need very old data. I want to check the 'structure' of the NaNs to see whether there are lots of them in the old data and not so many in the new data.
```python
import pandas as pd

df = pd.DataFrame(columns=['year', 'water_consumption', 'some_index'],
                  data=[[1, float('nan'), 3], [2, 26, 7], [5, float('nan'), 6],
                        [1, float('nan'), 42], [1, float('nan'), 13]])
```
|   | year | water_consumption | some_index |
|---|------|-------------------|------------|
| 0 | 1 | NaN | 3 |
| 1 | 2 | 26.0 | 7 |
| 2 | 5 | NaN | 6 |
| 3 | 1 | NaN | 42 |
| 4 | 1 | NaN | 13 |
The question is: how can I easily count the NaN values in one column grouped by the values of another column? (I know I could make a list, loop over every value of the grouping column, and count the NaNs myself, but I'd like to know if there is an easier, more elegant way.) Groupby has no 'isna()' method.
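To be concrete, this is the loop-based workaround I'd like to avoid (using the column names from my example above):

```python
import pandas as pd

df = pd.DataFrame(columns=['year', 'water_consumption', 'some_index'],
                  data=[[1, float('nan'), 3], [2, 26, 7], [5, float('nan'), 6],
                        [1, float('nan'), 42], [1, float('nan'), 13]])

# Count NaNs in 'water_consumption' separately for each distinct 'year',
# one filtering pass per group value.
counts = {}
for year in df['year'].unique():
    counts[year] = df.loc[df['year'] == year, 'water_consumption'].isna().sum()
```

This works, but it rescans the whole DataFrame once per group, which is exactly the kind of manual cycling I'm hoping pandas can replace.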
In the end I want to see a table like:
|   | year | water_consumption_nan_count |
|---|------|-----------------------------|
| 0 | 1 | 3 |
| 1 | 2 | 0 |
| 2 | 5 | 1 |
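For context, the closest I've gotten is a chained expression that calls `isna()` on the column first and then groups the resulting boolean Series by the other column; it seems to produce the table above on my toy data, but I don't know whether it's the idiomatic approach:

```python
import pandas as pd

df = pd.DataFrame(columns=['year', 'water_consumption', 'some_index'],
                  data=[[1, float('nan'), 3], [2, 26, 7], [5, float('nan'), 6],
                        [1, float('nan'), 42], [1, float('nan'), 13]])

# isna() gives a boolean Series; grouping it by df['year'] and summing
# counts the True values (i.e. the NaNs) per year.
result = (df['water_consumption'].isna()
          .groupby(df['year'])
          .sum()
          .reset_index(name='water_consumption_nan_count'))
```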