I would like to create a new column in a pandas data frame. The value of this column is determined by the values of other columns in the same row. Here are a few sample rows from the data frame:
code_module code_presentation id_student id_site date sum_click
0 AAA 2013J 28400 546652 -10 4
1 AAA 2013J 28400 546652 -10 1
2 AAA 2013J 28400 546652 -10 1
3 AAA 2013J 28400 546614 -10 11
4 AAA 2013J 28400 546714 -10 1
......
I would like to add a column called date_cat, which is determined by the date column shown above. The date values fall in the range [-24, 269]. I have written a function that maps a date to one of several categories.
def add_date_cat_col(date):
    # Labels for negative dates and for 10-day buckets from 0 to 269.
    neg_date_range = ["<-20", "-19~-10", "-9~-1"]
    date_range = ["0~9", "10~19", "20~29", "30~39", "40~49", "50~59", "60~69", "70~79",
                  "80~89", "90~99", "100~109", "110~119", "120~129", "130~139", "140~149",
                  "150~159", "160~169", "170~179", "180~189", "190~199", "200~209",
                  "210~219", "220~229", "230~239", "240~249", "250~259", "260~269"]
    if date <= -20:
        return neg_date_range[0]
    elif date <= -10:
        return neg_date_range[1]
    elif date <= -1:
        return neg_date_range[2]
    else:
        # Floor division: in Python 3, date / 10 is a float and cannot index a list.
        return date_range[int(date) // 10]
Here is the expected result after adding the column:
code_module code_presentation id_student id_site date sum_click date_cat
0 AAA 2013J 28400 546652 -10 4 -19~-10
1 AAA 2013J 28400 546652 -10 1 -19~-10
2 AAA 2013J 28400 546652 -10 1 -19~-10
3 AAA 2013J 28400 546614 -10 11 -19~-10
4 AAA 2013J 28400 546714 -10 1 -19~-10
...
20 AAA 2013J 28948 573847 20 20 20~29
What is a good and easy way to complete this task? Any help is appreciated!
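Because date_cat depends only on the date column, the function can be applied to that column directly. The axis=1 argument belongs to DataFrame.apply (row-wise application) and is not needed for Series.apply:
df['date_cat'] = df['date'].apply(add_date_cat_col)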
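As an alternative to calling a Python function once per row, the same binning can be sketched with pd.cut, which assigns every value to a labeled interval in one vectorized call. This is only a sketch under a few assumptions: the dates are integers, the sample values in df are made up to cover the bucket boundaries (they are not from the real data), and the names edges and labels are chosen here for illustration. The bin edges are picked so that the results match the categories of add_date_cat_col.

```python
import pandas as pd

# Hypothetical sample dates covering the bucket boundaries (not the real data).
df = pd.DataFrame({"date": [-24, -20, -19, -10, -9, -1, 0, 9, 10, 20, 269]})

# Right-closed bin edges: (-inf, -20], (-20, -10], (-10, -1], (-1, 9], (9, 19], ..., (259, 269].
# With integer dates this reproduces the <= comparisons and the 10-day buckets.
edges = [float("-inf"), -20, -10, -1] + list(range(9, 270, 10))

# One label per bin: three negative buckets, then "0~9", "10~19", ..., "260~269".
labels = ["<-20", "-19~-10", "-9~-1"] + [f"{lo}~{lo + 9}" for lo in range(0, 270, 10)]

# pd.cut returns a categorical column; use .astype(str) if plain strings are preferred.
df["date_cat"] = pd.cut(df["date"], bins=edges, labels=labels, right=True)

print(df)
```

The result is a categorical column whose categories keep the bucket order, which can be convenient for sorting or grouping by date_cat later. Dates above 269 would fall outside the last bin and come back as NaN, so the edges would need extending if the range grows.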