I have the following dataset for a digital marketing problem. It gives the audience size, the number of inactive emails, the number of read emails, and the priority of each segment.

AudienceSize  inactiveemails  Readmails  Importanceoftargetgroup
      246238           63581       1015  Low
      402042             609       2089  Medium
        2395               4         12  Medium
       10958              76        105  High
      120291            1237        707  Medium
       65199               0        544  Low
      106341            1506       1171  Medium
      496986            8501       3139  Medium
      293509            4805       2059  Medium
       93218              97        814  Medium
      246238           63581       1015  Low
      402042             609       2089  Medium
        2395               4         12  Medium
       10958              76        105  High
      120291            1237        707  Medium
       65199               0        544  Low
      106341            1506       1171  Medium
      496986            8501       3139  Medium
      293509            4805       2059  Medium
       93218              97        814  Medium

I need to scale the data. Rows in the low-priority category should be scaled using only the members of the low category. Similarly, the medium and high categories should each be scaled using only their own rows. Is there any way to achieve this? The desired output would look something like this:

Importanceoftargetgroup  AudienceSize  Readmails  Inactivemails
Low                            .03444      .5366          .7437
Low                            .03664      .7500          .8000
Medium                          .7665      .4333          .6543
Medium                          .7965      .5533          .7543

Note: dplyr has helped me subset the data and compute group means, but I need the scaled values.

  • What is the scaling factor for AudienceSize, Readmails and Inactivemails? Commented Nov 1, 2017 at 6:20
  • Each column should be scaled to between 0 and 1 individually, but within its own group: low-priority rows scaled between 0 and 1 among themselves, and likewise for medium priority and high priority. Commented Nov 1, 2017 at 6:22
  • Try library(dplyr); df %>% group_by(Importanceoftargetgroup) %>% mutate_each(funs(scale), AudienceSize, inactiveemails, Readmails) Commented Nov 1, 2017 at 6:23
  • It is a min-max normalization problem. A z-score would also work. The objective is that low-priority email columns are compared only with other low-priority values; similarly, medium-priority data should be compared only within its own group. Commented Nov 1, 2017 at 6:23
  • It seems to work, sir. Thank you. Commented Nov 1, 2017 at 6:29

1 Answer

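You should get the desired result by grouping on the priority column and scaling within each group:

library(dplyr)  # across() needs dplyr >= 1.0.0; mutate_each() and funs() are deprecated

df %>%
  group_by(Importanceoftargetgroup) %>%
  # scale() standardizes (z-score) within each group; as.numeric() drops its matrix attributes
  mutate(across(c(AudienceSize, inactiveemails, Readmails), ~ as.numeric(scale(.x))))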
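
Note that `scale()` standardizes to z-scores (mean 0, standard deviation 1) rather than to the 0 to 1 range asked for in the question. If min-max scaling within each group is what is wanted, a minimal sketch might look like the following. It assumes dplyr 1.0.0 or later (for `across()`). `minmax` is a hypothetical helper written for this example, not a dplyr function, and mapping a group with no spread to 0 is an assumption you may want to change.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): rescale a numeric vector to [0, 1].
# Assumption: a group with no spread (max == min) is mapped to 0 instead of NaN.
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# The first six rows of the question's data
df <- data.frame(
  AudienceSize   = c(246238, 402042, 2395, 10958, 120291, 65199),
  inactiveemails = c(63581, 609, 4, 76, 1237, 0),
  Readmails      = c(1015, 2089, 12, 105, 707, 544),
  Importanceoftargetgroup = c("Low", "Medium", "Medium", "High", "Medium", "Low")
)

scaled <- df %>%
  group_by(Importanceoftargetgroup) %>%   # min and max are computed per priority level
  mutate(across(c(AudienceSize, inactiveemails, Readmails), minmax)) %>%
  ungroup()

print(scaled)
```

With these six rows, the two Low rows come out as 1 and 0 in every column, because each is the group maximum or minimum, and the single High row falls back to 0. In the full data the High group contains only identical rows, so both min-max and z-score scaling would otherwise produce NaN there, which is why the helper guards against a zero range.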

1 Comment

Done. Sorry it took a while to get back.
