I need to create a new column in my data that equals the 'tender' value when the 'id' appears only once, and the 'lot' value when it does not. I cannot do it through anything involving NA, since the data is incomplete and there are a lot of NAs in it. My idea was that if 'id' is unique, then select

df <- data.frame('id'=c(1,1,2,3,3,4), 
                 'lot'=c(10,20,NA,40,50,NA), 'tender'=c(30,30,30,90,90,40))

I am expecting the output to be:

data.frame('id'=c(1,1,2,3,3,4), 'lot'=c(10,20,NA,40,50,NA), 
           'tender'=c(30,30,30,90,90,40),'price'=c(10,20,30,40,50,40))
  • You may need df %>% mutate(price = coalesce(lot, tender)) Commented Jul 15, 2019 at 17:24
  • Or if we go by the condition df %>% group_by(id) %>% mutate(price = case_when(n() ==1 ~ tender, TRUE ~ lot)) Commented Jul 15, 2019 at 17:27
  • Wouldn't that just replace the NAs with the 'tender' value? I cannot do that; as I mentioned, there are a lot of NAs that are simply missing at random, not because the value of 'tender' belongs there. Commented Jul 15, 2019 at 17:29
  • I showed two methods: 1) based on the data, 2) based on the logic you mentioned Commented Jul 15, 2019 at 17:29
  • Yes sorry, the second one wasn't there when I started writing :D Commented Jul 15, 2019 at 17:31
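
To make the difference between the two suggestions in these comments concrete, here is a minimal sketch. It assumes a hypothetical data frame `df2` (not from the question) in which an 'id' that appears twice has a 'lot' missing at random. `coalesce` fills that NA with 'tender', while the group-size condition leaves it as NA.

```r
library(dplyr)

# Hypothetical data: id 1 appears twice and one of its lots is missing at random
df2 <- data.frame(id = c(1, 1, 2),
                  lot = c(10, NA, NA),
                  tender = c(30, 30, 50))

res <- df2 %>%
  group_by(id) %>%
  mutate(by_coalesce = coalesce(lot, tender),                        # fills every NA in lot
         by_group_size = case_when(n() == 1 ~ tender, TRUE ~ lot)) %>%  # tender only for single-row ids
  ungroup()

res
```

The two columns differ only in the second row, which is the situation the asker is worried about.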

3 Answers


Based on the condition, we can group by 'id' and use case_when

library(dplyr)
df %>% 
  group_by(id) %>%
  mutate(price = case_when(n() ==1 & is.na(lot) ~ tender, TRUE ~ lot))

With the OP's current example, coalesce would also work

df %>%
   mutate(price = coalesce(lot, tender))

2 Comments

Is there an advantage to using case_when here instead of ifelse, or is it just a case of preference?
@RussThomas According to ?case_when, "This function allows you to vectorise multiple if_else() statements." But that was not the reason I used it here. It was the first one my hands typed, so I went with it.
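
For a single two-way condition like this one, the same result can be sketched with `if_else()`, shown here on the question's data. This is only an illustration of the equivalence discussed in the comments, not a recommendation of one function over the other. It works because `is.na(lot)` gives the condition one element per row of the group.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 2, 3, 3, 4),
                 lot = c(10, 20, NA, 40, 50, NA),
                 tender = c(30, 30, 30, 90, 90, 40))

res <- df %>%
  group_by(id) %>%
  # Same condition as the case_when version above, with one TRUE/FALSE per row
  mutate(price = if_else(n() == 1 & is.na(lot), tender, lot)) %>%
  ungroup()

res
```

With more than two branches, case_when avoids nesting several if_else() calls.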

We can do this:

df$price <- apply(df, 1, function(x) min(x["lot"], x["tender"], na.rm = TRUE))

Or a dplyr solution would be:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(price = min(lot, tender, na.rm = TRUE))
# # A tibble: 6 x 4
# # Groups:   id [4]
#      id   lot tender price
#   <dbl> <dbl>  <dbl> <dbl>
# 1     1    10     30    10
# 2     1    20     30    20
# 3     2    NA     30    30
# 4     3    40     90    40
# 5     3    50     90    50
# 6     4    NA     40    40

2 Comments

What if there is an id with 1 row, and tender > lot (as it is for all rows in the example data)?
@IceCreamToucan I think tender is the cumulative (sort of) lot so I don't see the possibility of the case you presented. OP can verify/nullify this.
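
The case raised in the first comment can be sketched as follows. It assumes a hypothetical data frame `df3` (not from the question) where an 'id' appears once and has a non-missing 'lot' below 'tender'. The sketch uses base R's `pmin` as a vectorised stand-in for the row-wise `min` in the answer, and `ave` to count rows per 'id'.

```r
# Hypothetical data: id 2 appears once and has a non-missing lot below tender
df3 <- data.frame(id = c(1, 1, 2),
                  lot = c(10, 20, 5),
                  tender = c(30, 30, 50))

# Row-wise minimum of lot and tender, ignoring NAs
df3$by_min <- pmin(df3$lot, df3$tender, na.rm = TRUE)

# Rule from the question: tender when the id appears once, lot otherwise
group_size <- ave(df3$id, df3$id, FUN = length)
df3$by_rule <- ifelse(group_size == 1, df3$tender, df3$lot)

df3
```

The two columns disagree on the last row, so the minimum approach matches the stated rule only if such rows cannot occur in the real data.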

Based on this description, you can use an if statement on the group size with data.table:

"I would need to create a new column in my data, which would be equal to 'tender' value in case the 'id' appears only once, and to the 'lot' value in case it does not."

library(data.table)
setDT(df)

df[, price := if(.N == 1) tender else lot, by = id]
#    id lot tender price
# 1:  1  10     30    10
# 2:  1  20     30    20
# 3:  2  NA     30    30
# 4:  3  40     90    40
# 5:  3  50     90    50
# 6:  4  NA     40    40
