Scaling data subsetwise using R

Question

I have the following dataset for a digital marketing problem. It gives the audience size, the number of inactive emails, the number of read emails, and the priority (importance) of each target segment:

AudienceSize  inactiveemails  Readmails  Importanceoftargetgroup
      246238           63581       1015  Low
      402042             609       2089  Medium
        2395               4         12  Medium
       10958              76        105  High
      120291            1237        707  Medium
       65199               0        544  Low
      106341            1506       1171  Medium
      496986            8501       3139  Medium
      293509            4805       2059  Medium
       93218              97        814  Medium
      246238           63581       1015  Low
      402042             609       2089  Medium
        2395               4         12  Medium
       10958              76        105  High
      120291            1237        707  Medium
       65199               0        544  Low
      106341            1506       1171  Medium
      496986            8501       3139  Medium
      293509            4805       2059  Medium
       93218              97        814  Medium
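To make the example reproducible, the first ten rows can be entered directly as an R data frame (rows 11 to 20 above repeat them exactly). Naming the object df is an assumption, chosen to match the code in the answer.

```r
# First ten rows of the question's data; rows 11-20 repeat these exactly.
# The object name `df` is assumed so it matches the answer's code.
df <- data.frame(
  AudienceSize   = c(246238, 402042, 2395, 10958, 120291,
                     65199, 106341, 496986, 293509, 93218),
  inactiveemails = c(63581, 609, 4, 76, 1237, 0, 1506, 8501, 4805, 97),
  Readmails      = c(1015, 2089, 12, 105, 707, 544, 1171, 3139, 2059, 814),
  Importanceoftargetgroup = c("Low", "Medium", "Medium", "High", "Medium",
                              "Low", "Medium", "Medium", "Medium", "Medium")
)
```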

I need to scale the data. The Low-priority emails should be scaled only among the members of the Low category. Similarly, the Medium and High categories should each be scaled using only their own rows. Is there any way to achieve this? The output should look something like this:

Importanceoftargetgroup  AudienceSize  Readmails  inactiveemails
Low                      .03444        .5366      .7437
Low                      .03664        .7500      .8000
Medium                   .7665         .4333      .6543
Medium                   .7965         .5533      .7543

Note: dplyr has helped me subset the data and compute the means, but I need the scaled values.
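Since the comments below specify min-max scaling within each priority group, a base-R sketch (no dplyr needed) can use ave(), which applies a function separately to each group and returns the results in the original row order. The helper rescale01 is a hypothetical name, mapping a constant group (max equal to min) to 0 instead of NaN is an assumption, and the small data frame holds only a few rows taken from the question.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1].
# Assumption: a group whose values are all equal (zero range) maps to 0, not NaN.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# A few rows from the question, for illustration only
df <- data.frame(
  AudienceSize = c(246238, 65199, 402042, 2395, 120291, 10958),
  Readmails    = c(1015, 544, 2089, 12, 707, 105),
  Importanceoftargetgroup = c("Low", "Low", "Medium", "Medium", "Medium", "High")
)

# ave() runs rescale01 once per priority group and keeps the original row order
cols <- c("AudienceSize", "Readmails")
df[cols] <- lapply(df[cols], function(col)
  ave(col, df$Importanceoftargetgroup, FUN = rescale01))
print(df)
```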

What is the scaling factor for AudienceSize, Readmails and Inactivemails? – Dhiraj, commented Nov 1, 2017 at 6:20
Each column should be scaled to the range 0 to 1, but separately for each group: the Low-priority rows between 0 and 1, the Medium-priority rows between 0 and 1, and the High-priority rows between 0 and 1. – Vishnu Raghavan, commented Nov 1, 2017 at 6:22
Try library(dplyr); df %>% group_by(Importanceoftargetgroup) %>% mutate_each(funs(scale), AudienceSize, inactiveemails, Readmails) – Prem, commented Nov 1, 2017 at 6:23
It is a min-max normalization problem, although a z-score would also work. The point is that the Low-priority email columns are compared only with other Low-priority values, and likewise the Medium-priority data is compared only with itself. – Vishnu Raghavan, commented Nov 1, 2017 at 6:23
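Written out for a value x_i in row i whose priority group is g, the two options mentioned in these comments are group-wise min-max scaling (range 0 to 1) and the group-wise z-score that R's scale() computes. Here the minimum and maximum are taken only over rows j in the same group g, and x̄_g and s_g are the mean and sample standard deviation of that column within g.

```latex
x'_i = \frac{x_i - \min_{j \in g} x_j}{\max_{j \in g} x_j - \min_{j \in g} x_j},
\qquad
z_i = \frac{x_i - \bar{x}_g}{s_g}
```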

Prem · Accepted Answer · 2017-11-01 06:36:18Z

You should get the desired result using

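library(dplyr)
# scale() converts each column to z-scores within its priority group
df %>%
  group_by(Importanceoftargetgroup) %>%
  mutate(across(c(AudienceSize, inactiveemails, Readmails),
                ~ as.numeric(scale(.x)))) %>%
  ungroup()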
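scale() produces z-scores (mean 0 and standard deviation 1 within each group) rather than the 0 to 1 range described in the comments. If the 0 to 1 range is what you need, the sketch below keeps the same grouped pipeline but swaps in a min-max function. The helper name minmax is hypothetical, and mapping a constant group to 0 is an assumption: both scale() and a plain min-max formula return NaN when every value in a group is identical, as with the two High rows in the question.

```r
library(dplyr)

# Hypothetical helper: min-max rescaling to [0, 1].
# Assumption: a constant group (max == min) maps to 0 rather than NaN.
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# A few rows from the question, including the two identical High rows
df <- data.frame(
  AudienceSize   = c(246238, 65199, 402042, 2395, 10958, 10958),
  inactiveemails = c(63581, 0, 609, 4, 76, 76),
  Readmails      = c(1015, 544, 2089, 12, 105, 105),
  Importanceoftargetgroup = c("Low", "Low", "Medium", "Medium", "High", "High")
)

# Grouped mutate applies minmax separately within each priority group
scaled <- df %>%
  group_by(Importanceoftargetgroup) %>%
  mutate(across(c(AudienceSize, inactiveemails, Readmails), minmax)) %>%
  ungroup()
print(scaled)
```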

Done. Sorry it took a while to get back. – Vishnu Raghavan
