I have this sample dataframe:

      x_mean    x_min    x_max     y_mean     y_min     y_max
 1      85.6        3      264       75.7         3       240
 2     105.5        6      243       76.4         3       191
 3      95.8       19      287       48.4         8       134
 4      85.5       50      166       64.8        32       103
 5      55.9       24      117       46.7        19        77 


x_range = [list(range(0,50)),list(range(51,100)),list(range(101,250)),list(range(251,350)),list(range(351,430)),list(range(431,1000))]
y_range = [list(range(0,30)),list(range(31,60)),list(range(61,90)),list(range(91,120)),list(range(121,250)),list(range(251,2000))]


#here x = Any column with mean value (eg. x_mean or y_mean)
# y = x_range / y_range 

def min_max_range(x,y):
    for a in y:
        if int(x) in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val

def min_range(x,y):
    for a in y:
        if int(x) in a:
            min_val = min(a)
            return min_val

Now I want to apply the functions min_max_range() and min_range() to the columns x_mean and y_mean to get new columns.

For example, min_max_range() takes column x_mean and the range list x_range as input to create column x_min_max_val; similarly, column y_mean and the range list y_range are used for column y_min_max_val:

I can create the columns one at a time with these one-liners, but I want to handle both x_mean and y_mean in one go with a single one-liner.

df['x_min_max_val'] = df['x_mean'].apply(lambda x: min_max_range(x,x_range))
df['y_min_max_val'] = df['y_mean'].apply(lambda x: min_max_range(x,y_range))  

The resultant dataframe should look like this:

      x_mean    x_min    x_max     y_mean     y_min     y_max    x_min_max_val   y_min_max_val        x_min_val   y_min_val
1      85.6        3      264       75.7         3       240                49              29               51          61
2     105.5        6      243       76.4         3       191               149              29              101          61
3      95.8       19      287       48.4         8       134                49              29               51          31
4      85.5       50      166       64.8        32       103                49              29               51          61
5      55.9       24      117       46.7        19        77                49              29               51          31

I want to create these columns in one go, instead of creating one column at a time. How can I do this? Any suggestions? Or could something like this work?

df.filter(regex='mean').apply(lambda x: min_max_range(x,x+'_range'))
  • are these the functions you are using or are they just for example? Commented Jan 12, 2020 at 0:50
  • I'm using them. @Datanovice Commented Jan 12, 2020 at 0:52
  • @astroluv it is hard to understand what you are after. so you want your function min_max_range to take input x_mean and x_range and spit out the columns specified? Commented Jan 12, 2020 at 0:58
  • currently, your min_max_range would return None as no values in y_mean are in x_mean ? also passing in columns results in an error Commented Jan 12, 2020 at 0:58
  • Also, how do you currently do it and how are you hoping to make it 'in one go' ? Commented Jan 12, 2020 at 0:59

1 Answer

Here is the general idea. First, store your ranges in a dictionary so that each one can be looked up by name.

range_dict = {}
range_dict['x_range'] = x_range
range_dict['y_range'] = y_range

Next, put the columns you want to run the calculation on in a list (or select them with a regex if their names follow a pattern):

mean_cols_list = ['x_mean', 'y_mean']
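If the column names follow a pattern, the list does not have to be typed by hand. A minimal sketch, assuming every column of interest ends in `_mean`; the small `df` below is only a stand-in for the dataframe in the question:

```python
import pandas as pd

# Stand-in for the dataframe from the question (first two rows only).
df = pd.DataFrame({
    'x_mean': [85.6, 105.5], 'x_min': [3, 6], 'x_max': [264, 243],
    'y_mean': [75.7, 76.4], 'y_min': [3, 3], 'y_max': [240, 191],
})

# Keep only the columns whose name ends in '_mean', in their original order.
mean_cols_list = df.filter(regex='_mean$').columns.tolist()
print(mean_cols_list)
```

Anchoring the pattern with `$` avoids picking up columns that merely contain the word "mean" somewhere in the middle of the name.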

Now, to apply your function over all of those columns, define a function like this:

def min_max_calculator(df, range_dictionary, mean_columns_list):
    for i in range(len(mean_columns_list)):
        # e.g. 'x_mean'
        current_column = mean_columns_list[i]
        # e.g. 'x_min_max_val'
        output_col_name = current_column.replace('mean','min_max_val')
        # e.g. 'x_range'
        range_name = current_column.replace('mean','range')
        # the list of ranges stored under that name
        range_list = range_dictionary[range_name]
        # add the calculated column to the dataframe
        df[output_col_name] = df[current_column].apply(lambda x: min_max_range(x,range_list))
    return df

df_output = min_max_calculator(df, range_dict, mean_cols_list)
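If you would rather avoid the explicit loop, `DataFrame.apply` passes each column to the function as a Series whose `.name` is the column label, so the matching range list can be looked up from it. This is only a sketch under a few assumptions: the ranges live in a dict keyed `'x_range'` and `'y_range'`, `apply_ranges` is a hypothetical helper name, and `range` objects stand in for the lists from the question (they support `in`, `min()` and `max()` the same way).

```python
import pandas as pd

df = pd.DataFrame({
    'x_mean': [85.6, 105.5, 95.8, 85.5, 55.9],
    'y_mean': [75.7, 76.4, 48.4, 64.8, 46.7],
}, index=[1, 2, 3, 4, 5])

range_dict = {
    'x_range': [range(0, 50), range(51, 100), range(101, 250),
                range(251, 350), range(351, 430), range(431, 1000)],
    'y_range': [range(0, 30), range(31, 60), range(61, 90),
                range(91, 120), range(121, 250), range(251, 2000)],
}

def min_max_range(x, y):
    for a in y:
        if int(x) in a:
            return max(a) + 1 - min(a)

def min_range(x, y):
    for a in y:
        if int(x) in a:
            return min(a)

def apply_ranges(frame, func, suffix):
    # Hypothetical helper: run func on every '*_mean' column, using the
    # range list whose key is derived from the column name.
    cols = frame.filter(regex='_mean$')
    out = cols.apply(
        lambda col: col.apply(
            lambda v: func(v, range_dict[col.name.replace('mean', 'range')])
        )
    )
    # 'x_mean' -> 'x_min_max_val', 'x_min_val', ...
    return out.rename(columns=lambda name: name.replace('mean', suffix))

df = pd.concat(
    [df,
     apply_ranges(df, min_max_range, 'min_max_val'),
     apply_ranges(df, min_range, 'min_val')],
    axis=1,
)
print(df)
```

Note that the ranges as written leave gaps (50 and 100 are in no x range, for instance), so a mean whose integer part falls in a gap makes both functions return None.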

1 Comment

How can I add another column that is computed from the other columns? For example: x_new = df.x_min_max_val / ( df.x_max - df.x_min ) * (df.x_mean - df.x_min) + df.x_min_max_val
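One possible way to do this is to build the column names from a prefix and loop over the prefixes. This is a sketch only: the comment gives the formula for x, and applying the same formula to y is an assumption, as is the small stand-in `df` with the `*_min_max_val` columns already filled in.

```python
import pandas as pd

# Stand-in data: first two rows of the question's dataframe plus the
# min_max_val columns computed earlier.
df = pd.DataFrame({
    'x_mean': [85.6, 105.5], 'x_min': [3, 6], 'x_max': [264, 243],
    'x_min_max_val': [49, 149],
    'y_mean': [75.7, 76.4], 'y_min': [3, 3], 'y_max': [240, 191],
    'y_min_max_val': [29, 29],
})

for prefix in ['x', 'y']:
    width = df[f'{prefix}_min_max_val']
    span = df[f'{prefix}_max'] - df[f'{prefix}_min']
    offset = df[f'{prefix}_mean'] - df[f'{prefix}_min']
    # Same formula as in the comment, written once for every prefix.
    df[f'{prefix}_new'] = width / span * offset + width

print(df[['x_new', 'y_new']])
```

Because these are plain column operations, no `apply` is needed here; pandas evaluates the arithmetic for all rows at once.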
