Create new column based on another variable

Question

I have a dataframe with several columns. One of them is the column participant, where different participant codes are listed. These are all either in the 100 range, the 200 range or the 500 range. For example: 101, 203, 209, 504, 103, 512 and so on.

I want to create an extra column in the dataframe called group with 3 possible values: 100, 200 and 500. Thus, depending on the number a participant code starts with, it will be assigned one of these 3 labels.

I have tried using a combination of startsWith() and ifelse statements, but I can't make it work.

data$group = ifelse(startsWith(as.character(data$participant), "1"), "100", 
                    ((ifelse(startsWith(as.character(data$participant), "2"), "200",
                           (ifelse(startsWith(as.character(data$participant), "5"), "500")), NULL)))

Damian · Accepted Answer · 2021-04-02 20:02:42Z

Based on your examples and comments it looks like you want to divide a numeric value into ranges and assign a character label.

case_when provides a straightforward option. It takes longer to type, but it may be more readable for people unfamiliar with cut or more mathematical approaches.

library(dplyr)

tibble(old = c(101, 203, 209, 504, 103, 512)) %>%
    mutate(
        # Conditions are checked in order; the first one that is TRUE wins
        new = case_when(
            old < 100 ~ NA_character_,
            old < 200 ~ "100",
            old < 300 ~ "200",
            old < 400 ~ "300",
            old < 500 ~ "400",
            old < 600 ~ "500",
            TRUE ~ NA_character_
        )
    )

Result

# A tibble: 6 x 2
    old new  
  <dbl> <chr>
1   101 100  
2   203 200  
3   209 200  
4   504 500  
5   103 100  
6   512 500

That said, the cut function was designed to do precisely what you described, and has an option to specify the output labels.

old <- c(101, 203, 209, 504, 103, 512)

new <- cut(
    x = old, 
    breaks = seq(from = 100, to = 600, by = 100), 
    labels = seq(from = 100, to = 500, by = 100),
    right = FALSE  # intervals are [100,200), [200,300), ... so 200 itself is labelled "200"
)

as.character(new)

Result

[1] "100" "200" "200" "500" "100" "500"
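One detail worth checking with cut is which side of each interval is closed. By default (right = TRUE) the intervals are (100,200], (200,300] and so on, so a code of exactly 200 would be labelled 100 and a code of exactly 100 would become NA. Setting right = FALSE gives [100,200), [200,300), which matches "codes in the 100 range". A small sketch, where edge, breaks, labels, default_side and left_closed are names made up for illustration and the boundary codes are hypothetical:

```r
# Hypothetical codes sitting exactly on, or just below, a break
edge <- c(100, 200, 299)
breaks <- seq(from = 100, to = 600, by = 100)
labels <- seq(from = 100, to = 500, by = 100)

# Default (right = TRUE): intervals are (100,200], (200,300], ...
default_side <- as.character(cut(edge, breaks = breaks, labels = labels))

# right = FALSE: intervals are [100,200), [200,300), ...
left_closed <- as.character(cut(edge, breaks = breaks, labels = labels, right = FALSE))

default_side  # NA    "100" "200"
left_closed   # "100" "200" "200"
```

The sample codes in the question never land exactly on a break, so both settings give the same result for them.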

Alvaro Morales · Accepted Answer · 2021-04-04 03:58:16Z

A simple tidyverse solution (similar to s__'s solution):

library(tibble)

tibble(
  participant = c(101, 203, 209, 504, 103, 512),
  group = round(participant, -2)  # round to the nearest hundred
)

# A tibble: 6 x 2
  participant group
        <dbl> <dbl>
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

edited Apr 4, 2021 at 3:58

answered Apr 2, 2021 at 18:52

akrun · Accepted Answer · 2021-04-02 18:07:24Z

Maybe this can be done more easily:

(data$participant %/% 100) * 100
#[1] 100 200 200 500 100 500

In the OP's code, the last 'no' argument should be NA_character_ rather than NULL, because NULL has length 0, e.g.

 v1 <- c(10, 20, 5, 2, 40)
 ifelse(v1 > 50, 3, NULL)

Error in ans[npos] <- rep(no, length.out = len)[npos] : 
  replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL

ifelse(v1 > 50, 3, NA)
#[1] NA NA NA NA NA
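Applied to the OP's startsWith() attempt, the fix might look like the sketch below. Besides swapping NULL for NA_character_, it moves the fallback inside the innermost ifelse(); in the original, the misplaced parentheses left that call without a 'no' argument and passed NULL as a fourth argument to the middle one. The helper variable code is introduced here only to avoid repeating as.character().

```r
# Sample data matching the codes in the question
data <- data.frame(participant = c(101, 203, 209, 504, 103, 512))

# Convert once instead of repeating as.character() in every test
code <- as.character(data$participant)

# Each ifelse() gets exactly three arguments; the final fallback is
# NA_character_ (length 1), not NULL (length 0)
data$group <- ifelse(startsWith(code, "1"), "100",
              ifelse(startsWith(code, "2"), "200",
              ifelse(startsWith(code, "5"), "500", NA_character_)))

data$group
# [1] "100" "200" "200" "500" "100" "500"
```

Any code that starts with a different digit ends up as NA rather than raising an error.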

data

data <- structure(list(participant = c(101, 203, 209, 504, 103, 512)), 
     class = "data.frame", row.names = c(NA, -6L))

edited Apr 2, 2021 at 18:07

answered Apr 2, 2021 at 17:53

s__ · Accepted Answer · 2021-04-02 17:56:31Z

You can manage it also with round():

x <- c(101, 203, 209, 504, 103, 512)
round(x, -2)
[1] 100 200 200 500 100 500

In your case:

data$group <- round(data$participant, -2)
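One caveat: round(x, -2) rounds to the nearest hundred, so it only matches "the range the code starts with" while the last two digits stay below 50, as they do in the sample data. A quick sketch with hypothetical codes in the upper half of the 100 range, compared with rounding down via floor():

```r
# Hypothetical codes in the upper half of the 100 range
x <- c(149, 151, 199)

nearest <- round(x, -2)          # nearest hundred
down    <- floor(x / 100) * 100  # always rounds down to the hundred

nearest  # 100 200 200
down     # 100 100 100
```

If participant codes can go above x50, the floor() form (or the integer-division approach in another answer) is probably the safer choice.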

answered Apr 2, 2021 at 17:56

Chris Ruehlemann · Accepted Answer · 2021-04-02 17:59:52Z

Using ifelse:

data$group <- ifelse(data$participant >= 100 & data$participant < 200, 100,
                     ifelse(data$participant >= 200 & data$participant < 300, 200, 500))

Result:

data
  participant group
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

answered Apr 2, 2021 at 17:59

Anoushiravan R · Accepted Answer · 2021-04-02 18:27:00Z

It's rather verbose but it's just another way:

library(dplyr)

participant <- c(101, 203, 209, 504, 103, 512)

df <- tibble(participant)

df %>%
  mutate(group = case_when(
    participant %in% 100:199 ~ 100,
    participant %in% 200:299 ~ 200,
    participant %in% 500:599 ~ 500
  ))

# A tibble: 6 x 2
  participant group
        <dbl> <dbl>
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

answered Apr 2, 2021 at 18:27

Chriss Paul · Accepted Answer · 2021-04-02 18:27:02Z

Another option you can try, using data.table:

library(data.table)
df <- data.table(participants=c(101, 203, 209, 504, 103, 512))
df[,groups:= (participants - participants%%100)]
   participants groups
1:          101    100
2:          203    200
3:          209    200
4:          504    500
5:          103    100
6:          512    500

Not exactly what you asked for, but you can use the cut function too. In data.table it may look like this:

library(data.table)

df <- data.table(participants = c(101, 203, 209, 504, 103, 512))
df[, groups:=cut(participants, seq(100,600,100))]

   participants    groups
1:          101 (100,200]
2:          203 (200,300]
3:          209 (200,300]
4:          504 (500,600]
5:          103 (100,200]
6:          512 (500,600]

edited Apr 2, 2021 at 18:27

answered Apr 2, 2021 at 18:21
