
I have the following dataframe:

FOOD ID   DATE        PRICE   DES
1         1/1/2020    100     Tuna
1         1/1/2020    NA      Tuna
1         1/1/2020    100     Tuna
1         1/1/2020    NA      Tuna
3         1/25/2020   4       Tomato
3         1/25/2020   NA      Tomato
3         1/1/2019    NA      Tomato
3         1/1/2019    5       Tomato

I need to replace the NA values wherever a price for the same FOOD ID and the same DATE is available. Expected output:

FOOD ID   DATE        PRICE   DES
1         1/1/2020    100     Tuna
1         1/1/2020    100     Tuna
1         1/1/2020    100     Tuna
1         1/1/2020    100     Tuna
3         1/25/2020   4       Tomato
3         1/25/2020   4       Tomato
3         1/1/2019    5       Tomato
3         1/1/2019    5       Tomato

Is there an easy way to do this without a for loop? One option would be to use dplyr: group the data by FOOD ID and DATE, compute an "average" PRICE, delete the PRICE column from the original dataframe, and finally merge the grouped data back into the original dataframe. That seems an odd way to do it, though. Thanks for the help.

  • Are you asking for an elegant, R-based way to fill in averages? Or are you asking about how to handle missing data in general, where a package like mice might be your best bet? Commented Nov 10, 2020 at 20:31

2 Answers

library(dplyr)
library(tidyr)

df %>%
  group_by(FOOD_ID, DATE) %>%
  fill(PRICE, .direction = "updown")

# A tibble: 8 x 4
# Groups:   FOOD_ID, DATE [3]
  FOOD_ID DATE      PRICE DES   
    <int> <chr>     <int> <chr> 
1       1 1/1/2020    100 Tuna  
2       1 1/1/2020    100 Tuna  
3       1 1/1/2020    100 Tuna  
4       1 1/1/2020    100 Tuna  
5       3 1/25/2020     4 Tomato
6       3 1/25/2020     4 Tomato
7       3 1/1/2019      5 Tomato
8       3 1/1/2019      5 Tomato
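
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes the column names FOOD_ID, DATE, PRICE and DES used above. The extra row with FOOD_ID 7 ("Salt") is hypothetical and is there only to show that a group with no known price stays NA; the trailing ungroup() is an optional addition that drops the grouping afterwards.

```r
library(dplyr)
library(tidyr)

# The question's rows plus one hypothetical group (FOOD_ID 7)
# that has no known price at all.
df <- data.frame(
  FOOD_ID = c(1, 1, 1, 1, 3, 3, 3, 3, 7),
  DATE    = c(rep("1/1/2020", 4), rep("1/25/2020", 2),
              rep("1/1/2019", 2), "2/2/2020"),
  PRICE   = c(100, NA, 100, NA, 4, NA, NA, 5, NA),
  DES     = c(rep("Tuna", 4), rep("Tomato", 4), "Salt"),
  stringsAsFactors = FALSE
)

filled <- df %>%
  group_by(FOOD_ID, DATE) %>%
  # "updown" fills upwards first, then downwards, within each group
  fill(PRICE, .direction = "updown") %>%
  ungroup()

print(filled)
```

Note that if a group ever contained more than one distinct price, the filled values would depend on row order, so it may be worth checking for that before relying on the result.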

1 Comment

Wow, this is exactly what I was looking for. Thank you!

We can use the data itself to feed prices back in.

Data:

df <- read.table(header = TRUE, text= "FOOD_ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     NA      Tomato
3         1/1/2019     NA       Tomato
3         1/1/2019     5       Tomato")

Find distinct prices for each product on each date.

library(dplyr)

# Keep one non-missing price per FOOD_ID, DATE and DES combination
prices <- df %>%
  filter(!is.na(PRICE)) %>%
  distinct(FOOD_ID, DATE, DES, .keep_all = TRUE)

Join these prices back into the original dataframe, which assigns a price to every row for each product and date. (I removed the original PRICE column first because it is restored from the prices dataframe.)

new_df <- df %>%
  select(-PRICE) %>%
  left_join(prices, by = c('FOOD_ID', 'DATE', 'DES'))

Output of new_df:

  FOOD_ID      DATE    DES PRICE
1       1  1/1/2020   Tuna   100
2       1  1/1/2020   Tuna   100
3       1  1/1/2020   Tuna   100
4       1  1/1/2020   Tuna   100
5       3 1/25/2020 Tomato     4
6       3 1/25/2020 Tomato     4
7       3  1/1/2019 Tomato     5
8       3  1/1/2019 Tomato     5
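
The join above relies on each FOOD_ID, DATE and DES combination having a single price; if a group had two, only the first would be kept in the lookup table. A rough way to check that assumption beforehand might look like the sketch below. The example data (with a second Tuna price of 120) and the names `conflicts` and `n_prices` are hypothetical, made up purely for illustration.

```r
library(dplyr)

# Hypothetical data in which one group carries two different prices.
df <- data.frame(
  FOOD_ID = c(1, 1, 1, 3, 3),
  DATE    = c("1/1/2020", "1/1/2020", "1/1/2020", "1/25/2020", "1/25/2020"),
  PRICE   = c(100, NA, 120, 4, NA),
  DES     = c("Tuna", "Tuna", "Tuna", "Tomato", "Tomato"),
  stringsAsFactors = FALSE
)

# Count distinct non-missing prices per group; any group with more
# than one would silently lose prices when the lookup table is built.
conflicts <- df %>%
  filter(!is.na(PRICE)) %>%
  group_by(FOOD_ID, DATE, DES) %>%
  summarise(n_prices = n_distinct(PRICE), .groups = "drop") %>%
  filter(n_prices > 1)

print(conflicts)
```

If `conflicts` comes back with zero rows, every group has at most one price and the join should be safe to use as written.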

1 Comment

Thank you! This is definitely helpful, even though Onyambu's solution is precisely what I was looking for.
