I have a dataframe with several columns. One of them is the column participant, where different participant codes are listed. These are all either in the 100 range, the 200 range or the 500 range. For example: 101, 203, 209, 504, 103, 512 and so on.

I want to create an extra column in the dataframe called group with 3 possible values: 100, 200 and 500. Thus, depending on the number a participant code starts with, it will be assigned one of these 3 labels.

I have tried using a combination of startsWith() and ifelse statements, but I can't make it work.

data$group = ifelse(startsWith(as.character(data$participant), "1"), "100", 
                    ((ifelse(startsWith(as.character(data$participant), "2"), "200",
                           (ifelse(startsWith(as.character(data$participant), "5"), "500")), NULL)))

7 Answers

2

Based on your examples and comments it looks like you want to divide a numeric value into ranges and assign a character label.

case_when provides a straightforward option. It takes longer to type, but it may be more readable for people unfamiliar with cut or more mathematical approaches.

library(dplyr)

tibble(old = c(101, 203, 209, 504, 103, 512)) %>%
    mutate(
        # conditions are checked top to bottom; the first match wins
        new = case_when(
            old < 100 ~ NA_character_,
            old < 200 ~ "100",
            old < 300 ~ "200",
            old < 400 ~ "300",
            old < 500 ~ "400",
            old < 600 ~ "500",
            TRUE ~ NA_character_
        )
    )

Result

# A tibble: 6 x 2
    old new  
  <dbl> <chr>
1   101 100  
2   203 200  
3   209 200  
4   504 500  
5   103 100  
6   512 500 

That said, the cut function was designed to do precisely what you described, and has an option to specify the output labels.

old <- c(101, 203, 209, 504, 103, 512)

new <- cut(
    x = old, 
    breaks = seq(from = 100, to = 600, by = 100), 
    labels = seq(from = 100, to = 500, by = 100),
    right = FALSE  # intervals are [100, 200), [200, 300), ... so 200 is labelled "200"
)

as.character(new)

Result

[1] "100" "200" "200" "500" "100" "500"
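
One thing to watch with cut is which side of each interval is closed. By default the intervals are closed on the right, so a code that sits exactly on a break falls into the lower group. The sketch below compares both settings; the boundary codes are hypothetical and not taken from the question.

```r
# Hypothetical codes, chosen to sit on or near the breaks
codes  <- c(100, 150, 200, 599)
breaks <- seq(from = 100, to = 600, by = 100)
labels <- seq(from = 100, to = 500, by = 100)

# Default (right = TRUE): intervals are (100,200], (200,300], ...
right_closed <- as.character(cut(codes, breaks = breaks, labels = labels))

# right = FALSE: intervals are [100,200), [200,300), ...
left_closed <- as.character(cut(codes, breaks = breaks, labels = labels, right = FALSE))

right_closed
# [1] NA    "100" "100" "500"
left_closed
# [1] "100" "100" "200" "500"
```

If codes such as 100 or 200 can occur, right = FALSE is probably what you want.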

2

A simple tidyverse solution (similar to s__'s solution):

library(tibble)

tibble(
  participant = c(101, 203, 209, 504, 103, 512),
  group = round(participant, -2)  # digits = -2 rounds to the nearest hundred
)

# A tibble: 6 x 2
  participant group
        <dbl> <dbl>
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

1

Maybe this can be done more easily with integer division:

(data$participant %/% 100) * 100
#[1] 100 200 200 500 100 500

In the OP's code, the final 'no' argument should be NA_character_ rather than NULL: NULL has length 0, so ifelse() has nothing to fill the result with. For example:

 v1 <- c(10, 20, 5, 2, 40)
 ifelse(v1 > 50, 3, NULL)

Error in ans[npos] <- rep(no, length.out = len)[npos] : 
  replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL

ifelse(v1 > 50, 3, NA)
#[1] NA NA NA NA NA

data

data <- structure(list(participant = c(101, 203, 209, 504, 103, 512)), 
     class = "data.frame", row.names = c(NA, -6L))
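
For completeness, here is a sketch of the OP's nested ifelse with that change applied and the parentheses balanced. The data frame name demo and the extra code 305 are my own additions, included only to show the NA fallback.

```r
# demo is a hypothetical data frame; 305 is an extra code that matches no group
demo <- data.frame(participant = c(101, 203, 209, 504, 103, 512, 305))

# Convert once instead of repeating as.character() in every test
code <- as.character(demo$participant)

demo$group <- ifelse(startsWith(code, "1"), "100",
              ifelse(startsWith(code, "2"), "200",
              ifelse(startsWith(code, "5"), "500", NA_character_)))

demo$group
# [1] "100" "200" "200" "500" "100" "500" NA
```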

1

You can also do this with round():

x <- c(101, 203, 209, 504, 103, 512)
round(x, -2)
[1] 100 200 200 500 100 500

In your case:

data$group <- round(data$participant, -2)
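
A caveat, in case your real codes go higher than the examples: round() goes to the nearest hundred, so it only matches the leading digit while the last two digits are below 50. The sketch below uses hypothetical codes to compare it with flooring, which always rounds down.

```r
# Hypothetical codes; 151, 199 and 251 have last two digits above 50
codes <- c(101, 151, 199, 251)

rounded <- round(codes, -2)           # nearest hundred
floored <- floor(codes / 100) * 100   # always down to the hundred

rounded
# [1] 100 200 200 300
floored
# [1] 100 100 100 200
```

With the codes shown in the question (101, 203, 209, 504, 103, 512) both approaches give the same groups.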

1

Using ifelse:

# Lower bounds are inclusive, so codes 100 and 200 land in their own groups
data$group <- ifelse(data$participant >= 100 & data$participant < 200, 100,
                     ifelse(data$participant >= 200 & data$participant < 300, 200, 500))

Result:

data
  participant group
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

1

It's rather verbose but it's just another way:

library(dplyr)

participant <- c(101, 203, 209, 504, 103, 512)

df <- tibble(participant)

df %>%
  mutate(group = case_when(
    participant %in% 100:199 ~ 100,
    participant %in% 200:299 ~ 200,
    participant %in% 500:599 ~ 500
  ))

# A tibble: 6 x 2
  participant group
        <dbl> <dbl>
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

1

Another option, using data.table:

library(data.table)
df <- data.table(participants = c(101, 203, 209, 504, 103, 512))
# subtract the remainder to get the hundred; the trailing [] prints the result
df[, groups := participants - participants %% 100][]
   participants groups
1:          101    100
2:          203    200
3:          209    200
4:          504    500
5:          103    100
6:          512    500

This is not exactly what you asked for, but you can also use the cut function. In data.table it might look like this:

library(data.table)

df <- data.table(participants = c(101, 203, 209, 504, 103, 512))
df[, groups:=cut(participants, seq(100,600,100))]

   participants    groups
1:          101 (100,200]
2:          203 (200,300]
3:          209 (200,300]
4:          504 (500,600]
5:          103 (100,200]
6:          512 (500,600]
