I have a DataFrame like this:
Date    ID  Age  Gender  Fruits
1.1.19  1   50   F       Apple
2.1.19  1   50   F       Mango
2.1.19  1   50   F       Orange
1.1.19  2   75   M       Grapes
4.1.19  3   20   M       Apple
4.1.19  3   20   M       Grapes
For example, I have two lists:
fruits_list = ['Apple', 'Mango', 'Orange', 'Grapes', 'Banana', 'Guava']
date_list = ['1.1.19', '2.1.19', '3.1.19', '4.1.19', '5.1.19', '6.1.19']
I want to convert the Fruits column into separate columns, one per fruit, that give binary yes/no (1/0) information for each person. Dates that are missing from the data should show NaN. Using this:
pd.get_dummies(df, columns=['Fruits'], prefix='', prefix_sep='').groupby('Date').max()
I get the following, but I need every element of fruits_list and date_list to appear:
Date    ID   Age  Gender  Apple  Mango  Orange  Grapes
1.1.19  1    50   F       1      0      0       0
1.1.19  2    75   M       0      0      0       1
2.1.19  1    50   F       0      1      1       0
3.1.19  NaN  NaN  NaN     NaN    NaN    NaN     NaN
4.1.19  3    20   M       1      0      0       1
The desired output would look like this:
Date    ID   Age  Gender  Apple  Mango  Orange  Grapes  Banana  Guava  sum
1.1.19  1    50   F       1      0      0       0       0       0      1
1.1.19  2    75   M       0      0      0       1       0       0      1
2.1.19  1    50   F       0      1      1       0       0       0      2
3.1.19  NaN  NaN  NaN     0      0      0       0       0       0      0
4.1.19  3    20   M       1      0      0       1       0       0      2
5.1.19  NaN  NaN  NaN     0      0      0       0       0       0      0
6.1.19  NaN  NaN  NaN     0      0      0       0       0       0      0
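One possible way to get from the sample data to the desired table is sketched below. It is not a definitive solution and rests on a few assumptions. It groups on Date, ID, Age and Gender together rather than on Date alone, because the desired output keeps two separate rows for 1.1.19. It also treats a date absent from the data as "no fruit", so those fruit cells are 0 instead of NaN. The names keys, per_person and out are only illustrative. Note that ID and Age come out as floats, since those columns have to hold NaN for the missing dates.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["1.1.19", "2.1.19", "2.1.19", "1.1.19", "4.1.19", "4.1.19"],
    "ID": [1, 1, 1, 2, 3, 3],
    "Age": [50, 50, 50, 75, 20, 20],
    "Gender": ["F", "F", "F", "M", "M", "M"],
    "Fruits": ["Apple", "Mango", "Orange", "Grapes", "Apple", "Grapes"],
})
fruits_list = ["Apple", "Mango", "Orange", "Grapes", "Banana", "Guava"]
date_list = ["1.1.19", "2.1.19", "3.1.19", "4.1.19", "5.1.19", "6.1.19"]

# Assumption: one output row per (Date, ID, Age, Gender) combination.
keys = ["Date", "ID", "Age", "Gender"]

# One 0/1 indicator column per fruit that occurs in the data.
dummies = pd.get_dummies(df, columns=["Fruits"], prefix="", prefix_sep="", dtype=int)

# Collapse several rows of the same person on the same date into one row.
per_person = dummies.groupby(keys, as_index=False).max()

# Add fruits that never occur (Banana, Guava) as all-zero columns,
# and put the fruit columns in the order given by fruits_list.
per_person = per_person.reindex(columns=keys + fruits_list, fill_value=0)

# A left merge on the full date list adds rows for the missing dates;
# ID, Age and Gender are NaN on those rows.
out = pd.DataFrame({"Date": date_list}).merge(per_person, on="Date", how="left")

# Missing dates have no fruit, so fill the indicators with 0 and count them.
out[fruits_list] = out[fruits_list].fillna(0).astype(int)
out["sum"] = out[fruits_list].sum(axis=1)

print(out)
```

If the rows should follow a different order than date_list, the merge result can be sorted afterwards. Converting Date with pd.to_datetime first would avoid sorting the dates as plain strings.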