1

I don't know how to cut dynamic variables into groups within a for loop.

df is the data frame that contains clm_april - clm_sept variables that are double. Also I would like to put different names for new columns.

Groups which are different for every month looks like this:

> groups_april
 [1] "0"       "500"     "1000"    "1500"    "2000"    "3500"   
 [7] "4500"    "5000"    "9500"    "2000000"

The following does not work:

vector <- c("april", "may", "june", "july", "aug", "sept")

for (i in vector) {
  varname <- paste0("clm_", i)
  df <- df %>%
    mutate_(.dots = 
              cut(list(varname),
                  breaks = groups[i],
                  include.lowest = T,
                  dig.lab = 10))
}

I have also tried:

for (i in vector) {
  varname <- paste0("clm_", i)
  df <- df %>%
    mutate_(.dots = interp(~cut(list(varname),
                           breaks = groups[i],
                           include.lowest = T,
                           dig.lab = 10,  varname=as.name(varname))))
}

R throws the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: 'x' must be numeric.

Can you help me solve this problem?

EDIT:

# A tibble: 10 x 6
   clm_april clm_may clm_june clm_july clm_aug clm_sept
       <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
 1         0       0        0        0     689        0
 2      2000       0     1000     1000    1000        0
 3      5000    1000     1000     1000    1500     1518
 4      1000    1069     1100     1200    2019     2000
 5       679     689     9000    10000   36681     2000
 6       800    1000     1000      657    1815     2500
7
  • 2
    try to give a piece of your data as an example, by using dput() or head() and pasting the result. Commented Feb 27, 2018 at 19:46
  • 1
    When asking for help, you should include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions Commented Feb 27, 2018 at 19:52
  • I edited the question. Anyhow, I think I have already included all necessary information you need. If you need something else, just tell me. Thank you` Commented Feb 27, 2018 at 19:59
  • df %>% mutate_all(cut, breaks = group, include.lowest = T, dig.lab = 10)? Commented Feb 27, 2018 at 20:07
  • 1
    Am I the only person who does not really understand this question?? I feel very stupid :( Commented Feb 28, 2018 at 0:57

1 Answer 1

1

One option could be using map2_df from purrr package. The map2_df will allow use to pass column-wise data of actual data.frame. And along with pass the name of the column as well. A custom function find_breaks is used the column name to find corresponding breaks and then call cut.

Suppose different Groups are defined for each month is defined as:

groups_april <- c("0", "500", "1000",  "1500", "2000", "3500", "4500", "5000", "9500", "2000000")
groups_may <- c("0", "500", "1000",  "1500", "2000", "3500", "4500", "5000", "9500", "1000000")
groups_june <- c("0", "500", "1000",  "1500", "2000", "3500", "4500", "5000", "20000", "2000000")
groups_july <- c("0", "500", "1000",  "1500", "2000", "3500", "4500", "7000", "9500", "2000000")
groups_aug <- c("0", "500", "1000",  "1500", "2000", "3500", "4500", "6000", "9500", "2000000")
groups_sept <- c("0", "500", "1000",  "1500", "2000", "3500", "4500", "5000", "9500", "20000")


#Lets create a data.frame for Groups:
groups_df <- data.frame(groups_april, groups_may, groups_june, groups_july,
       groups_aug, groups_sept, stringsAsFactors = FALSE)


# Create a function to find breaks and apply cut
find_breaks <- function(x, y){
  #get the corresponding column name
  breaks_group_col <- grep(gsub("clm_", "", y), names(groups_df), value = TRUE)

  #apply cut using corresponding column from groups_df
  cut(x, breaks = groups_df[,breaks_group_col], include.lowest = TRUE, dig.lab = 10)
}

library(purrr)

# Pass df and column names of df to map2. 
map2_df(df, grep("clm_*", names(df), value = TRUE), function(.x, .y) find_breaks(.x, .y))

#Result
  clm_april   clm_may     clm_june     clm_july       clm_aug        clm_sept   
  <fctr>      <fctr>      <fctr>       <fctr>         <fctr>         <fctr>     
1 [0,500]     [0,500]     [0,500]      [0,500]        (500,1000]     [0,500]    
2 (1500,2000] [0,500]     (500,1000]   (500,1000]     (500,1000]     [0,500]    
3 (4500,5000] (500,1000]  (500,1000]   (500,1000]     (1000,1500]    (1500,2000]
4 (500,1000]  (1000,1500] (1000,1500]  (1000,1500]    (2000,3500]    (1500,2000]
5 (500,1000]  (500,1000]  (5000,20000] (9500,2000000] (9500,2000000] (1500,2000]
6 (500,1000]  (500,1000]  (500,1000]   (500,1000]     (1500,2000]    (2000,3500]

Data

df <- read.table(text = "clm_april clm_may clm_june clm_july clm_aug clm_sept
1         0       0        0        0     689        0
2      2000       0     1000     1000    1000        0
3      5000    1000     1000     1000    1500     1518
4      1000    1069     1100     1200    2019     2000
5       679     689     9000    10000   36681     2000
6       800    1000     1000      657    1815     2500", header = T, stringsAsFactors = F)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.