2

In an experiment, people had four candidates to choose from; each candidate is sometimes male and sometimes female. In the dataframe below, C1 means Candidate 1, C2 means Candidate 2, and so on. F denotes female while M denotes male. A response of 1 indicates the person chose C1, a response of 2 indicates the person chose C2, and so on.

C1    C2    C3    C4    response
F     F     M     M     2
M     M     F     M     1

I want a new column "ChooseFemale" that equals 1 if the person chose a female candidate, and 0 otherwise. So the first row should have ChooseFemale equal to 1, while the second row should have ChooseFemale equal to 0.

This would require me to look up a different column in each row, depending on the value of the "response" column.

How can I do this?

6 Answers

2

Another base R solution:

x <- df[["response"]]

df$ChooseFemale <- as.integer(df[cbind(seq_along(x), x)] == "F")
  C1 C2 C3 C4 response ChooseFemale
1  F  F  M  M        2            1
2  M  M  F  M        1            0

Data:

Lines <- "C1    C2    C3    C4    response
F     F     M     M     2
M     M     F     M     1"

df <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)
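
To make the matrix-indexing step easier to follow, here is a minimal sketch of the same idea, spelled out in stages. It assumes the candidate columns are named C1 to C4 as in the question; the helper names `cand`, `idx` and `chosen` are illustrative only. Restricting the lookup to the candidate columns keeps the intermediate matrix to plain "F"/"M" values.

```r
# Toy data mirroring the question.
df <- data.frame(C1 = c("F", "M"), C2 = c("F", "M"),
                 C3 = c("M", "F"), C4 = c("M", "M"),
                 response = c(2L, 1L), stringsAsFactors = FALSE)

# Character matrix holding only the candidate columns.
cand <- as.matrix(df[, c("C1", "C2", "C3", "C4")])

# Each row of idx is one (row, column) pair: here (1, 2) and (2, 1).
idx <- cbind(seq_len(nrow(df)), df$response)

# Indexing a matrix with a two-column matrix picks one cell per pair.
chosen <- cand[idx]
choose_female <- as.integer(chosen == "F")
print(choose_female)
```

Because the index is a two-column matrix, a single subsetting call returns exactly one value per row, with no explicit loop.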

0
# create dataframe
my.df <- data.frame(c1=c('f','m'),
                    c2=c('f','m'),
                    c3=c('m','f'),
                    c4=c('m','m'),
                    resp=c(2, 1))

# add column
my.df$ChooseFemale <- NA

# loop over rows
for (row in seq_len(nrow(my.df))){

  # extract the column to check from response column
  col <- paste0('c', my.df$resp[row])

  # fill in new column
  my.df$ChooseFemale[row] <- ifelse(my.df[row, col]=='f', 1, 0)
}
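
If the loop becomes a bottleneck, the same name-based lookup can probably be vectorized. The following is a sketch only, assuming the candidate columns are exactly those named "c" followed by digits; `cand_cols`, `col_pos` and `picked` are hypothetical names introduced for illustration.

```r
my.df <- data.frame(c1 = c("f", "m"), c2 = c("f", "m"),
                    c3 = c("m", "f"), c4 = c("m", "m"),
                    resp = c(2, 1), stringsAsFactors = FALSE)

# Assumption: candidate columns are the ones named c<digits>.
cand_cols <- grep("^c[0-9]+$", names(my.df), value = TRUE)

# Translate each response into a column position within cand_cols.
col_pos <- match(paste0("c", my.df$resp), cand_cols)

# Pick one cell per row using (row, column) pairs.
picked <- as.matrix(my.df[cand_cols])[cbind(seq_len(nrow(my.df)), col_pos)]
my.df$ChooseFemale <- as.integer(picked == "f")
print(my.df)
```

Like the loop, this does not depend on the number of candidates, and it needs no extra packages.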

2 Comments

Maybe, but the solution is indifferent to the number of candidates and does not rely on other packages.
It's more or less equivalent to @A. Suliman's answer. Both are slower than tyluRp's solution, but when speed is not a big concern, for loops are convenient if you like verbose comments.
0
apply(df,1,function(x) ifelse(x[as.numeric(x['response'])]=='F',1,0))
[1] 1 0

Here is the basic idea: within each row, select the column whose index is the value in response. Then use apply with MARGIN=1 to apply this function row by row.

df[1,'response']
[1] 2

df[1,df[1,'response']]
[1] F
Levels: F M
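
The single-cell lookup shown above can be repeated for every row index. The sketch below is one possible way to do that with vapply; it assumes the columns are stored as character (stringsAsFactors = FALSE), and the name `choose_female` is illustrative.

```r
df <- data.frame(C1 = c("F", "M"), C2 = c("F", "M"),
                 C3 = c("M", "F"), C4 = c("M", "M"),
                 response = c(2L, 1L), stringsAsFactors = FALSE)

# For row i, look up the cell in the column numbered response[i].
choose_female <- vapply(seq_len(nrow(df)), function(i) {
  as.integer(df[i, df$response[i]] == "F")
}, integer(1))

print(choose_female)
```

Working with row indices rather than with the rows themselves avoids the conversion to a character matrix that apply performs on a data frame.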

data

df <- read.table(text = "
  C1    C2    C3    C4    response
   F     F     M     M     2
   M     M     F     M     1
",header=T)

0

You can create a simple function to check whether the candidate picked by the response number is "F", and then apply it to each row at once.

A tidyverse approach:

library(tidyverse)

mydata <- data.frame(C1=sample(c("F","M"),10,replace = T),
                     C2=sample(c("F","M"),10,replace = T),
                     C3=sample(c("F","M"),10,replace = T),
                     C4=sample(c("F","M"),10,replace = T),
                     response=sample(c(1:4),10,replace = T),
                     stringsAsFactors = FALSE)

   C1 C2 C3 C4 response
1   M  M  M  M        1
2   F  F  F  M        4
3   M  F  M  M        2
4   F  M  M  F        2
5   M  M  M  F        1
6   M  F  M  F        4
7   M  M  M  F        3
8   M  M  M  M        2
9   M  F  M  M        3
10  F  F  M  F        4

Custom function to check if the response matches "F"

female_choice <- function(C1, C2, C3, C4, response) {

    c(C1, C2, C3, C4)[response] == "F"

}   

And then just use mutate() to modify your dataframe, and pmap() to use its rows, one by one, as the set of arguments for female_choice():

mydata %>% 
    mutate(ChooseFemale = pmap_lgl(., female_choice))

   C1 C2 C3 C4 response ChooseFemale
1   M  M  M  M        1        FALSE
2   F  F  F  M        4        FALSE
3   M  F  M  M        2         TRUE
4   F  M  M  F        2        FALSE
5   M  M  M  F        1        FALSE
6   M  F  M  F        4         TRUE
7   M  M  M  F        3        FALSE
8   M  M  M  M        2        FALSE
9   M  F  M  M        3        FALSE
10  F  F  M  F        4         TRUE
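
The question asks for 1/0 rather than TRUE/FALSE. As a rough base R sketch of the same row-wise idea, mapply can feed the columns to female_choice() in parallel, and as.integer() converts the logical result. The data below is typed in from the printed sample above rather than regenerated with sample(); `flags` is a hypothetical name.

```r
mydata <- data.frame(
  C1 = c("M", "F", "M", "F", "M", "M", "M", "M", "M", "F"),
  C2 = c("M", "F", "F", "M", "M", "F", "M", "M", "F", "F"),
  C3 = c("M", "F", "M", "M", "M", "M", "M", "M", "M", "M"),
  C4 = c("M", "M", "M", "F", "F", "F", "F", "M", "M", "F"),
  response = c(1L, 4L, 2L, 2L, 1L, 4L, 3L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Same helper as in the answer: is the chosen candidate female?
female_choice <- function(C1, C2, C3, C4, response) {
  c(C1, C2, C3, C4)[response] == "F"
}

# Call female_choice() once per row; USE.NAMES = FALSE drops the
# names mapply would otherwise take from the first character argument.
flags <- mapply(female_choice, mydata$C1, mydata$C2, mydata$C3, mydata$C4,
                mydata$response, USE.NAMES = FALSE)

mydata$ChooseFemale <- as.integer(flags)
print(mydata)
```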

0

Here is one way to do it using tidyverse packages. As specified in the question, this takes into account both which candidate was chosen (C1-C4) and sex of the candidate (F/M):

# loading needed libraries
library(tidyverse)

# data
df <- utils::read.table(text = "C1    C2    C3    C4    response
                 F     F     M     M     2
                 M     M     F     M     1", header = TRUE) %>%
  tibble::as_tibble(x = .) %>%
  tibble::rowid_to_column(.)

# manipulation
dplyr::full_join(
# creating dataframe with the new chooseFemale variable
  x = df %>%
    tidyr::gather(
      data = .,
      key = "candidate",
      value = "choice",
      C1:C4
    ) %>%
    dplyr::mutate(choice_new = paste("C", response, sep = "")) %>%
# creating the needed column by checking both the candidate chosen and 
# the sex of the candidate
    dplyr::mutate(chooseFemale = dplyr::case_when((choice_new == candidate) &
                                                    (choice == "F") ~ 1,
                                                  (choice_new == candidate) &
                                                    (choice == "M") ~ 0
    )) %>%
    dplyr::select(.data = ., -choice_new) %>%
    tidyr::spread(data = ., key = candidate, value = choice) %>%
    dplyr::filter(.data = ., !is.na(chooseFemale)) %>%
    dplyr::select(.data = ., -c(C1:C4)),
# original dataframe
  y = df,
  by = c("rowid", "response")
) %>% # removing the redundant row id
  dplyr::select(.data = ., -rowid) %>% # rearranging the columns 
  dplyr::select(.data = ., C1:C4, response, chooseFemale)

#> # A tibble: 2 x 6
#>   C1    C2    C3    C4    response chooseFemale
#>   <fct> <fct> <fct> <fct>    <int>        <dbl>
#> 1 F     F     M     M            2            1
#> 2 M     M     F     M            1            0

Created on 2018-08-24 by the reprex package (v0.2.0.9000).
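
For readers who want to see the reshaping logic without the tidyverse, here is a small base R sketch of the same long-format idea: stack the candidate columns, keep the row where the candidate matches the response, and test its sex. This is an illustration under the assumption that the columns are character vectors; `long` and `chosen` are hypothetical names, not part of the answer above.

```r
df <- data.frame(C1 = c("F", "M"), C2 = c("F", "M"),
                 C3 = c("M", "F"), C4 = c("M", "M"),
                 response = c(2L, 1L), stringsAsFactors = FALSE)
cand <- c("C1", "C2", "C3", "C4")

# Long format: one row per (original row, candidate) combination.
long <- data.frame(
  rowid     = rep(seq_len(nrow(df)), times = length(cand)),
  candidate = rep(cand, each = nrow(df)),
  sex       = unlist(df[cand], use.names = FALSE),
  response  = rep(df$response, times = length(cand)),
  stringsAsFactors = FALSE
)

# Keep only the candidate each person actually chose.
chosen <- long[long$candidate == paste0("C", long$response), ]
chosen <- chosen[order(chosen$rowid), ]

df$chooseFemale <- as.integer(chosen$sex == "F")
print(df)
```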

-1

I'll provide an answer in the tidyr format. Your data is in a "wide" format. This makes it very human readable, but not necessarily machine readable. The first step to making it more tidy is to convert the data to long format. In other words, let's transform the data so that we don't have to do calculations across multiple columns in a single row.

Tidy format allows you to use grouping variables, create summaries, etc.

library(dplyr)
library(tidyr)

df <- data.frame(C1 = c("F","M"),
           C2 = c("F","M"),
           C3 = c("M","F"),
           C4 = c("M","M"),
           stringsAsFactors = FALSE)
> df
  C1 C2 C3 C4
1  F  F  M  M
2  M  M  F  M

Let's add an "id" field so we can keep track of each unique row. This is the same as the row number...but we are going to be converting the wide data to long data with different row numbers. Then use gather to convert from wide data to long data.

df_long <- df %>%
  mutate(id = row_number()) %>%
  gather(key = "key", value = "value",C1:C4)
> df_long
  id key value
1  1  C1     F
2  2  C1     M
3  1  C2     F
4  2  C2     M
5  1  C3     M
6  2  C3     F
7  1  C4     M
8  2  C4     M

Now it is possible to use group_by() to group based on variables, perform summaries, etc.

For what you've asked, you group by the id column and then perform calculations on the group. In this case we will count the values that are "F" in each group. Then we ungroup and spread back to the wide / human readable format.

df_long <- df_long %>%
  group_by(id) %>%
  mutate(response = sum(value=="F",na.rm=TRUE)) %>%
  ungroup()
> df_long
# A tibble: 8 x 4
     id key   value response
  <int> <chr> <chr>    <int>
1     1 C1    F            2
2     2 C1    M            1
3     1 C2    F            2
4     2 C2    M            1
5     1 C3    M            2
6     2 C3    F            1
7     1 C4    M            2
8     2 C4    M            1

To get the data back in wide format once you are done doing all calculations that you need in long format:

df <- df_long %>%
  spread(key,value) 
> df
# A tibble: 2 x 6
     id response C1    C2    C3    C4   
  <int>    <int> <chr> <chr> <chr> <chr>
1     1        2 F     F     M     M    
2     2        1 M     M     F     M

To get the data back in the order you had it:

df <- df %>%
  select(-id) %>%
  select(C1:C4,everything())
> df
# A tibble: 2 x 5
  C1    C2    C3    C4    response
  <chr> <chr> <chr> <chr>    <int>
1 F     F     M     M            2
2 M     M     F     M            1

You can of course use the pipes to do this all in one step.

df <- df %>%
  mutate(id = row_number()) %>%
  gather(key = "key", value = "value",C1:C4) %>%
  group_by(id) %>%
  mutate(response = sum(value=="F",na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key,value) %>%
  select(-id) %>%
  select(C1:C4,everything())
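
As a quick cross-check of the pipeline above, the per-row count of "F" values can also be computed in base R without reshaping. This is only a sketch; `n_female` is a hypothetical name. Note that it counts the female candidates in each row, which is a different quantity from whether the chosen candidate is female.

```r
df <- data.frame(C1 = c("F", "M"), C2 = c("F", "M"),
                 C3 = c("M", "F"), C4 = c("M", "M"),
                 stringsAsFactors = FALSE)
cand <- c("C1", "C2", "C3", "C4")

# Comparing a data frame with "F" yields a logical matrix;
# rowSums() then counts the TRUE values in each row.
n_female <- rowSums(df[cand] == "F")
print(n_female)
```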

3 Comments

How do you know about the id and group_by(id)? The OP never mentioned an id.
They said they wanted to add up a row. I'm creating an id column that is just an id for the row. This is necessary to be able to group by the row in the wide data after it has been changed to long format. Instead of calling it id I could have called it row.
If they had another id that was already unique for each row this step would not have been necessary.
