
I would like to create a new column in a Pandas DataFrame whose value is determined by the values of other columns in the same row. Here are a few rows from the data frame:

    code_module code_presentation   id_student  id_site date    sum_click
0   AAA         2013J               28400       546652  -10        4
1   AAA         2013J               28400       546652  -10        1
2   AAA         2013J               28400       546652  -10        1
3   AAA         2013J               28400       546614  -10        11
4   AAA         2013J               28400       546714  -10        1
......

I would like to add a column called date_cat, which is determined by the date column shown above. The date values range over [-24, 269]. I have written a function that sorts each date into a category:

def add_date_cat_col(date):
    neg_date_range = ["<-20", "-19~-10", "-9~-1"]
    date_range = ["0~9", "10~19", "20~29", "30~39", "40~49", "50~59", "60~69", "70~79",
                 "80~89","90~99","100~109","110~119","120~129","130~139","140~149","150~159","160~169","170~179",
                "180~189","190~199","200~209","210~219","220~229","230~239","240~249","250~259","260~269"]
    if date <= -20:
        return neg_date_range[0]
    elif date <= -10:
        return neg_date_range[1]
    elif date <= -1:
        return neg_date_range[2]
    else:
        return date_range[date // 10]  # // keeps the index an int in Python 3

Here is the desired result:

    code_module code_presentation   id_student  id_site date    sum_click    date_cat
0   AAA         2013J               28400       546652  -10        4         -19~-10
1   AAA         2013J               28400       546652  -10        1         -19~-10
2   AAA         2013J               28400       546652  -10        1         -19~-10
3   AAA         2013J               28400       546614  -10        11        -19~-10
4   AAA         2013J               28400       546714  -10        1         -19~-10
...
20  AAA         2013J               28948       573847  20         20        20~29

What is a good and easy way to complete this task? Any help is appreciated!

  • Looks like you want pandas.DataFrame.apply() by row, which you can do by specifying axis=1 (the function then receives a whole row, so it would need to read row['date']). Commented Apr 5, 2021 at 5:12
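  • df['date_cat'] = df['date'].apply(add_date_cat_col), without axis=1: Series.apply forwards extra keyword arguments to the function, so axis=1 would raise a TypeError. Commented Apr 5, 2021 at 5:15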
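A minimal sketch of both suggestions from the comments, assuming Python 3 and a small hypothetical DataFrame that holds only a date column. The label list is generated with a comprehension instead of being typed out (it yields the same 27 labels, "0~9" through "260~269"), and the index uses // because / returns a float in Python 3, which cannot index a list.

```python
import pandas as pd


def add_date_cat_col(date):
    neg_date_range = ["<-20", "-19~-10", "-9~-1"]
    # Same labels as the hand-typed list: "0~9", "10~19", ..., "260~269"
    date_range = [f"{low}~{low + 9}" for low in range(0, 270, 10)]
    if date <= -20:
        return neg_date_range[0]
    elif date <= -10:
        return neg_date_range[1]
    elif date <= -1:
        return neg_date_range[2]
    else:
        # Floor division keeps the index an int in Python 3
        return date_range[date // 10]


# Hypothetical sample frame containing only the column the function needs
df = pd.DataFrame({"date": [-24, -10, -1, 0, 20, 269]})

# Series.apply passes each scalar date to the function; no axis argument
df["date_cat"] = df["date"].apply(add_date_cat_col)

# DataFrame.apply with axis=1 passes each row as a Series instead
df["date_cat_rowwise"] = df.apply(lambda row: add_date_cat_col(row["date"]), axis=1)
```

Both columns end up identical; the Series.apply form is simpler here because the function only needs the date value.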

2 Answers


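You can use numpy.select for better performance: it evaluates the conditions on whole columns at once instead of calling a Python function for every row.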

In [1]: import numpy as np

In [2]: conds = [df['date'].le(-20), df['date'].le(-10), df['date'].le(-1)]

In [3]: choices = ["<-20", "-19~-10", "-9~-1"]

In [4]: low = df['date'] // 10 * 10  # round each date down to a multiple of 10

In [5]: df['date_cat'] = np.select(conds, choices, default=low.astype(str) + '~' + (low + 9).astype(str))

In [6]: df
Out[6]: 
  code_module code_presentation  id_student  id_site  date  sum_click date_cat
0         AAA             2013J       28400   546652   -10          4  -19~-10
1         AAA             2013J       28400   546652   -10          1  -19~-10
2         AAA             2013J       28400   546652   -10          1  -19~-10
3         AAA             2013J       28400   546614   -10         11  -19~-10
4         AAA             2013J       28400   546714   -10          1  -19~-10
5         AAA             2013J       28948   573847    20         20    20~29
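A self-contained sketch of the numpy.select approach on a hypothetical sample frame. It rounds each date down to a multiple of 10 before building the label, so a date such as 25 lands in 20~29; building the label directly from the raw date would give 25~34 instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 25 is included because it is not a multiple of 10
df = pd.DataFrame({"date": [-24, -10, -5, 0, 25, 269]})

conds = [df["date"].le(-20), df["date"].le(-10), df["date"].le(-1)]
choices = ["<-20", "-19~-10", "-9~-1"]

# Round each date down to a multiple of 10, then build "low~high" labels
low = df["date"] // 10 * 10
default = low.astype(str) + "~" + (low + 9).astype(str)

# For each row, np.select takes the first condition that is True,
# falling back to the default label when none match
df["date_cat"] = np.select(conds, choices, default=default)
```

Because np.select checks the conditions in order, -24 matches le(-20) before it can match le(-10), which is what lets the conditions use simple upper bounds.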

If you mean how to simplify the date-conversion logic, you can replace the hand-typed list of ranges with a little arithmetic for dates of zero or more.

int(date // 10.0) * 10 rounds a non-negative date down to the nearest multiple of 10 (27 becomes 20), so each label can be built as low~low+9.
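A quick check of that rounding, using a hypothetical helper name round_down_to_ten. Floor division rounds toward negative infinity, so the formula would not reproduce the question's negative labels (such as -9~-1); that is why the function keeps explicit branches for negative dates.

```python
def round_down_to_ten(date):
    # date // 10.0 floors to a float (27 // 10.0 == 2.0); int() drops the .0
    return int(date // 10.0) * 10


for date in [0, 9, 10, 27, 269, -5]:
    low = round_down_to_ten(date)
    # -5 floors to -10, giving -10~-1 rather than the question's -9~-1
    print(f"{date} -> {low}~{low + 9}")
```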

def add_date_cat_col(date):
    neg_date_range = ["<-20", "-19~-10", "-9~-1"]

    if date <= -20:
        return neg_date_range[0]
    elif date <= -10:
        return neg_date_range[1]
    elif date <= -1:
        return neg_date_range[2]
    else:
        low = int(date // 10.0) * 10
        return f'{low}~{low+9}'

df['date_cat'] = df['date'].apply(add_date_cat_col)
