Looking up a particular column in R depending on another column

Question

In an experiment, people had four candidates to choose from; each candidate is either male or female. In the dataframe below, C1 means Candidate 1, C2 means Candidate 2, and so on. F denotes female and M denotes male. A response of 1 indicates the person chose C1, a response of 2 indicates the person chose C2, and so on.

C1    C2    C3    C4    response
F     F     M     M     2
M     M     F     M     1

I want a new column "ChooseFemale" that equals 1 if the person chose a female candidate, and 0 otherwise. So the first row should have ChooseFemale equal to 1, while the second row should have ChooseFemale equal to 0.

This would require me to look up a particular column depending on the value of the "response" column.

How can I do this?


– Roman, commented Dec 31, 2018 at 11:24

tyluRp · Accepted Answer · 2018-08-24 15:46:13Z

2

Another base R solution:

x <- df[["response"]]

df$ChooseFemale <- as.integer(df[cbind(seq_along(x), x)] == "F")

  C1 C2 C3 C4 response ChooseFemale
1  F  F  M  M        2            1
2  M  M  F  M        1            0

Data:

Lines <- "C1    C2    C3    C4    response
F     F     M     M     2
M     M     F     M     1"

df <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)
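The key step here is R's matrix indexing: indexing with a two-column matrix returns one element per (row, column) pair. A small illustration on a plain character matrix; the names `m`, `idx`, `picked` and `choose_female` are hypothetical and only serve this example:

```r
# A plain character matrix standing in for columns C1..C4 (hypothetical `m`)
m <- matrix(c("F", "M",   # C1
              "F", "M",   # C2
              "M", "F",   # C3
              "M", "M"),  # C4
            nrow = 2,
            dimnames = list(NULL, c("C1", "C2", "C3", "C4")))
response <- c(2, 1)

# Each row of idx is one (row, column) pair: (1, 2) and (2, 1)
idx <- cbind(seq_along(response), response)

# Matrix indexing returns one element per pair: m[1, 2] and m[2, 1]
picked <- m[idx]
choose_female <- as.integer(picked == "F")
print(picked)
print(choose_female)
```

On a data frame, `df[cbind(...)]` behaves the same way because `[.data.frame` converts the data frame to a matrix before applying the index matrix.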

answered Aug 24, 2018 at 15:46

tyluRp




bobbel · Accepted Answer · 2018-08-24 14:54:03Z

0

# create dataframe
my.df <- data.frame(c1=c('f','m'),
                    c2=c('f','m'),
                    c3=c('m','f'),
                    c4=c('m','m'),
                    resp=c(2, 1))

# add column
my.df$ChooseFemale <- NA

# loop over rows
for (row in 1:nrow(my.df)){

  # extract the column to check from response column
  col <- paste0('c', my.df$resp[row])

  # fill in new column
  my.df$ChooseFemale[row] <- ifelse(my.df[row, col]=='f', 1, 0)
}
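The same row-by-row logic can also be written without modifying the data frame inside the loop: compute all the flags with `vapply()` and assign the column once. A sketch reusing this answer's lowercase column names (`c1`..`c4`, `resp`); `stringsAsFactors = FALSE` is added here only to keep the columns as plain character vectors:

```r
my.df <- data.frame(c1 = c('f', 'm'),
                    c2 = c('f', 'm'),
                    c3 = c('m', 'f'),
                    c4 = c('m', 'm'),
                    resp = c(2, 1),
                    stringsAsFactors = FALSE)

# For each row index i, build the column name from resp and test that cell
my.df$ChooseFemale <- vapply(seq_len(nrow(my.df)), function(i) {
  col <- paste0("c", my.df$resp[i])
  as.numeric(my.df[i, col] == "f")
}, numeric(1))

print(my.df)
```

`vapply()` also checks that every call returns exactly one number, which catches a malformed `resp` value early.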

answered Aug 24, 2018 at 14:54

bobbel


2 Comments

bobbel Over a year ago

Maybe, but the solution works for any number of candidates and does not rely on other packages.

moodymudskipper Over a year ago

It's more or less equivalent to @A. Suliman's answer. Both are slower than tyluRp's solution, but when speed is not a big concern, for loops are convenient if you like verbose comments.

A. Suliman · Accepted Answer · 2018-08-24 15:34:12Z

0

apply(df, 1, function(x) ifelse(x[as.numeric(x['response'])] == 'F', 1, 0))
[1] 1 0

Here is the basic idea: for each row, select the column whose number is given by response. Then use apply with MARGIN = 1 to apply this function row by row.

df[1,'response']
[1] 2

df[1,df[1,'response']]
[1] F
Levels: F M

data

df <- read.table(text = "
  C1    C2    C3    C4    response
   F     F     M     M     2
   M     M     F     M     1
",header=T)

edited Aug 24, 2018 at 15:34

answered Aug 24, 2018 at 15:22

A. Suliman



HAVB · Accepted Answer · 2018-08-24 18:43:09Z

You can create a simple function that checks whether the candidate picked by the response number is "F", and then apply it to every row.

A tidyverse approach:

library(tidyverse)

mydata <- data.frame(C1=sample(c("F","M"),10,replace = T),
                     C2=sample(c("F","M"),10,replace = T),
                     C3=sample(c("F","M"),10,replace = T),
                     C4=sample(c("F","M"),10,replace = T),
                     response=sample(c(1:4),10,replace = T),
                     stringsAsFactors = FALSE)

   C1 C2 C3 C4 response
1   M  M  M  M        1
2   F  F  F  M        4
3   M  F  M  M        2
4   F  M  M  F        2
5   M  M  M  F        1
6   M  F  M  F        4
7   M  M  M  F        3
8   M  M  M  M        2
9   M  F  M  M        3
10  F  F  M  F        4

Custom function to check whether the chosen candidate is "F":

female_choice <- function(C1, C2, C3, C4, response) {

    c(C1, C2, C3, C4)[response] == "F"

}

Then use mutate() to modify your dataframe, and pmap() to pass its rows, one by one, as the set of arguments to female_choice():

mydata %>% 
    mutate(ChooseFemale = pmap_lgl(., female_choice))

   C1 C2 C3 C4 response ChooseFemale
1   M  M  M  M        1        FALSE
2   F  F  F  M        4        FALSE
3   M  F  M  M        2         TRUE
4   F  M  M  F        2        FALSE
5   M  M  M  F        1        FALSE
6   M  F  M  F        4         TRUE
7   M  M  M  F        3        FALSE
8   M  M  M  M        2        FALSE
9   M  F  M  M        3        FALSE
10  F  F  M  F        4         TRUE
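For readers who prefer base R, the same row-wise call of `female_choice()` can be sketched with `mapply()`; wrapping the result in `as.integer()` gives the 1/0 coding the question asks for. The question's two rows replace the random sample so the result is predictable, and `chosen_f` is a hypothetical helper name:

```r
female_choice <- function(C1, C2, C3, C4, response) {
  c(C1, C2, C3, C4)[response] == "F"
}

# Small fixed data set (the question's two rows) instead of random sampling
mydata <- data.frame(C1 = c("F", "M"),
                     C2 = c("F", "M"),
                     C3 = c("M", "F"),
                     C4 = c("M", "M"),
                     response = c(2L, 1L),
                     stringsAsFactors = FALSE)

# mapply() walks the columns in parallel, calling female_choice() once per row
chosen_f <- mapply(female_choice,
                   mydata$C1, mydata$C2, mydata$C3, mydata$C4, mydata$response,
                   USE.NAMES = FALSE)

# Convert TRUE/FALSE to the 1/0 coding requested in the question
mydata$ChooseFemale <- as.integer(chosen_f)
print(mydata)
```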

Indrajeet Patil · Accepted Answer · 2018-08-24 19:07:59Z

Here is one way to do it using tidyverse packages. As specified in the question, this takes into account both which candidate was chosen (C1-C4) and the sex of the candidate (F/M):

# loading needed libraries
library(tidyverse)

# data
df <- utils::read.table(text = "C1    C2    C3    C4    response
                 F     F     M     M     2
                 M     M     F     M     1", header = TRUE) %>%
  tibble::as_tibble(x = .) %>%
  tibble::rowid_to_column(.)

# manipulation
dplyr::full_join(
# creating dataframe with the new chooseFemale variable
  x = df %>%
    tidyr::gather(
      data = .,
      key = "candidate",
      value = "choice",
      C1:C4
    ) %>%
    dplyr::mutate(choice_new = paste("C", response, sep = "")) %>%
# creating the needed column by checking both the candidate chosen and 
# the sex of the candidate
    dplyr::mutate(chooseFemale = dplyr::case_when((choice_new == candidate) &
                                                    (choice == "F") ~ 1,
                                                  (choice_new == candidate) &
                                                    (choice == "M") ~ 0
    )) %>%
    dplyr::select(.data = ., -choice_new) %>%
    tidyr::spread(data = ., key = candidate, value = choice) %>%
    dplyr::filter(.data = ., !is.na(chooseFemale)) %>%
    dplyr::select(.data = ., -c(C1:C4)),
# original dataframe
  y = df,
  by = c("rowid", "response")
) %>% # removing the redundant row id
  dplyr::select(.data = ., -rowid) %>% # rearranging the columns 
  dplyr::select(.data = ., C1:C4, response, chooseFemale)

#> # A tibble: 2 x 6
#>   C1    C2    C3    C4    response chooseFemale
#>   <fct> <fct> <fct> <fct>    <int>        <dbl>
#> 1 F     F     M     M            2            1
#> 2 M     M     F     M            1            0

Created on 2018-08-24 by the reprex package (v0.2.0.9000).

Adam Sampson · Accepted Answer · 2018-08-24 15:16:51Z

-1

I'll provide an answer using tidyr. Your data is in a "wide" format. This makes it very human-readable, but not necessarily machine-readable. The first step to making it tidy is to convert the data to long format. In other words, let's transform the data so that we don't have to do calculations across multiple columns in a single row.

The tidy (long) format allows you to use grouping variables, create summaries, etc.

library(dplyr)
library(tidyr)

df <- data.frame(C1 = c("F","M"),
           C2 = c("F","M"),
           C3 = c("M","F"),
           C4 = c("M","M"),
           stringsAsFactors = FALSE)

> df
  C1 C2 C3 C4
1  F  F  M  M
2  M  M  F  M

Let's add an "id" field so we can keep track of each unique row. This is the same as the row number, but the row numbers will change once we convert the wide data to long data. Then use gather() to convert from wide data to long data.

df_long <- df %>%
  mutate(id = row_number()) %>%
  gather(key = "key", value = "value",C1:C4)

> df_long
  id key value
1  1  C1     F
2  2  C1     M
3  1  C2     F
4  2  C2     M
5  1  C3     M
6  2  C3     F
7  1  C4     M
8  2  C4     M
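Without tidyr, the same wide-to-long step can be sketched with base R's `reshape()`. The output column names `key` and `value` are chosen here to mirror the `gather()` call above; rows come out ordered by candidate, then by id, just like the `gather()` result:

```r
df <- data.frame(C1 = c("F", "M"),
                 C2 = c("F", "M"),
                 C3 = c("M", "F"),
                 C4 = c("M", "M"),
                 stringsAsFactors = FALSE)

# Row identifier, so each long row can be traced back to its wide row
df$id <- seq_len(nrow(df))

# Stack C1..C4 into a single "value" column, labelled by "key"
df_long <- reshape(df,
                   direction = "long",
                   varying = c("C1", "C2", "C3", "C4"),
                   v.names = "value",
                   timevar = "key",
                   times = c("C1", "C2", "C3", "C4"),
                   idvar = "id")
print(df_long)
```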

Now it is possible to use group_by() to group based on variables, perform summaries, etc.

For what you've asked, group by the id column and then perform calculations on each group. In this case we take the sum of all values that are "F". Then we ungroup, and afterwards spread back to the wide, human-readable format.

df_long <- df_long %>%
  group_by(id) %>%
  mutate(response = sum(value=="F",na.rm=TRUE)) %>%
  ungroup()

> df_long
# A tibble: 8 x 4
     id key   value response
  <int> <chr> <chr>    <int>
1     1 C1    F            2
2     2 C1    M            1
3     1 C2    F            2
4     2 C2    M            1
5     1 C3    M            2
6     2 C3    F            1
7     1 C4    M            2
8     2 C4    M            1

To get the data back in wide format once you are done doing all calculations that you need in long format:

df <- df_long %>%
  spread(key,value)

> df
# A tibble: 2 x 6
     id response C1    C2    C3    C4   
  <int>    <int> <chr> <chr> <chr> <chr>
1     1        2 F     F     M     M    
2     2        1 M     M     F     M

To get the data back in the order you had it:

df <- df %>%
  select(-id) %>%
  select(C1:C4,everything())

> df
# A tibble: 2 x 5
  C1    C2    C3    C4    response
  <chr> <chr> <chr> <chr>    <int>
1 F     F     M     M            2
2 M     M     F     M            1

You can of course use the pipes to do this all in one step.

df <- df %>%
  mutate(id = row_number()) %>%
  gather(key = "key", value = "value",C1:C4) %>%
  group_by(id) %>%
  mutate(response = sum(value=="F",na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key,value) %>%
  select(-id) %>%
  select(C1:C4,everything())
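Note that the pipeline above stores the count of "F" values per row in `response`; it does not look up which candidate was chosen. To get the flag the question asks for in long format, one option is to keep, for each id, only the row whose key equals `paste0("C", response)`. A base R sketch of that idea, assuming the question's `response` column is present; the long table is built by hand with `rep()` and `unlist()` rather than with `gather()`, and `cand_cols` and `chosen` are hypothetical helper names:

```r
df <- data.frame(C1 = c("F", "M"),
                 C2 = c("F", "M"),
                 C3 = c("M", "F"),
                 C4 = c("M", "M"),
                 response = c(2L, 1L),
                 stringsAsFactors = FALSE)
df$id <- seq_len(nrow(df))

# Long format: one row per (id, candidate) pair, column by column
cand_cols <- c("C1", "C2", "C3", "C4")
df_long <- data.frame(
  id = rep(df$id, times = length(cand_cols)),
  key = rep(cand_cols, each = nrow(df)),
  value = unlist(df[cand_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Attach each id's response, then keep only the chosen candidate's row
df_long$response <- df$response[df_long$id]
chosen <- df_long[df_long$key == paste0("C", df_long$response), ]

# Map the chosen candidate's sex back to the wide rows by id, coded 1/0
df$ChooseFemale <- as.integer(chosen$value[match(df$id, chosen$id)] == "F")
df$id <- NULL
print(df)
```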

answered Aug 24, 2018 at 15:16

Adam Sampson


3 Comments

Sal-laS Over a year ago

How do you know about the id and group_by(id)? The OP never mentioned an id.

Adam Sampson Over a year ago

They said they wanted to add up a row. I'm creating an id column that is just an id for the row. This is necessary to be able to group by the row in the wide data after it has been changed to long format. Instead of calling it id I could have called it row.

Adam Sampson Over a year ago

If they had another id that was already unique for each row this step would not have been necessary.
