R replace NA values in a dataframe column with existing values in other rows and same column

Question

I have the following dataframe:

FOOD ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     NA      Tomato
3         1/1/2019     NA       Tomato
3         1/1/2019     5       Tomato

I need to replace the NA values in PRICE wherever a price is available for the same FOOD ID and the same DATE. Expected output:

FOOD ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     100      Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     4       Tomato
3         1/1/2019     5       Tomato
3         1/1/2019     5       Tomato

Is there an easy way to do this without a for loop? I guess one way would be to use dplyr: group the data by FOOD ID and DATE, compute an "average" PRICE, delete the PRICE column from the original dataframe, and finally merge the grouped data back into the original dataframe. But this seems an odd way to do it. Thanks for the help.

Are you asking for an elegant, R-based way to fill in averages? Or are you asking about how to handle missing data in general, where a package like mice might be your best bet?
– Dubukay, Commented Nov 10, 2020 at 20:31

Onyambu · Accepted Answer · 2020-11-10 23:34:35Z

1

library(dplyr)
library(tidyr)  # provides fill()

df %>%
  group_by(FOOD_ID, DATE) %>%
  fill(PRICE, .direction = 'updown')

# A tibble: 8 x 4
# Groups:   FOOD_ID, DATE [3]
  FOOD_ID DATE      PRICE DES   
    <int> <chr>     <int> <chr> 
1       1 1/1/2020    100 Tuna  
2       1 1/1/2020    100 Tuna  
3       1 1/1/2020    100 Tuna  
4       1 1/1/2020    100 Tuna  
5       3 1/25/2020     4 Tomato
6       3 1/25/2020     4 Tomato
7       3 1/1/2019      5 Tomato
8       3 1/1/2019      5 Tomato
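
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the column is named FOOD_ID (with an underscore, as in the output above) and rebuilds the sample data with read.table. The ungroup() call and the name `filled` are additions for illustration, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with the column renamed to FOOD_ID
df <- read.table(header = TRUE, text = "FOOD_ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     NA      Tomato
3         1/1/2019     NA       Tomato
3         1/1/2019     5        Tomato")

# Fill NA prices within each FOOD_ID/DATE group: first downward, then upward
filled <- df %>%
  group_by(FOOD_ID, DATE) %>%
  fill(PRICE, .direction = "updown") %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident
```

With 'updown', a group whose first row is NA (like the 1/1/2019 Tomato rows) still gets filled from the row below it. A group with no known price at all would simply stay NA.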

1 Comment

Angelo Over a year ago

Wow, this is exactly what I was looking for. Thank you!

TTS · Answer · 2020-11-10 20:30:30Z

1

We can use the data itself to feed prices back in.

Data:

df <- read.table(header = TRUE, text= "FOOD_ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     NA      Tomato
3         1/1/2019     NA       Tomato
3         1/1/2019     5       Tomato")

Find distinct prices for each product on each date.

library(dplyr)

# Keep one row with a known price per FOOD_ID/DATE/DES combination
prices <- df %>%
  filter(!is.na(PRICE)) %>%
  group_by(FOOD_ID, DATE, DES) %>%
  distinct(FOOD_ID, .keep_all = TRUE)
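
Note that this step keeps only the first known price in each group, so it quietly assumes a product never has two different prices on the same date. A rough sketch of how one might check that assumption before joining is below. The names `price_counts` and `conflicts` are made up for illustration.

```r
library(dplyr)

# Sample data from the question
df <- read.table(header = TRUE, text = "FOOD_ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     NA      Tomato
3         1/1/2019     NA       Tomato
3         1/1/2019     5        Tomato")

# Count the distinct non-missing prices for each FOOD_ID/DATE pair
price_counts <- df %>%
  group_by(FOOD_ID, DATE) %>%
  summarise(n_prices = n_distinct(PRICE, na.rm = TRUE), .groups = "drop")

# Any row left here is a pair with more than one known price
conflicts <- filter(price_counts, n_prices > 1)
```

For the sample data `conflicts` has no rows, so picking the first known price per group is safe here.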

Join these prices back into the original dataframe, which assigns a price to every row for each day. (I have removed the original PRICE column because it comes back in from the prices df.)

new_df <- df %>%
  select(-PRICE) %>%
  left_join(prices, by = c('FOOD_ID', 'DATE', 'DES'))

Output of new_df:

  FOOD_ID      DATE    DES PRICE
1       1  1/1/2020   Tuna   100
2       1  1/1/2020   Tuna   100
3       1  1/1/2020   Tuna   100
4       1  1/1/2020   Tuna   100
5       3 1/25/2020 Tomato     4
6       3 1/25/2020 Tomato     4
7       3  1/1/2019 Tomato     5
8       3  1/1/2019 Tomato     5
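
For comparison, the same per-group fill can also be sketched in base R, without dplyr. This is only one possible variant under the same assumption that each FOOD_ID/DATE pair has a single price. The helper name `fill_from_group` is hypothetical.

```r
# Sample data from the question
df <- read.table(header = TRUE, text = "FOOD_ID   DATE        PRICE     DES
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
1         1/1/2020     100      Tuna
1         1/1/2020     NA       Tuna
3         1/25/2020     4       Tomato
3         1/25/2020     NA      Tomato
3         1/1/2019     NA       Tomato
3         1/1/2019     5        Tomato")

# Hypothetical helper: replace NAs in a vector with its first known value.
# Vectors with no known value (or no elements) are returned unchanged.
fill_from_group <- function(p) {
  known <- p[!is.na(p)]
  if (length(known) > 0) p[is.na(p)] <- known[1]
  p
}

# ave() applies the helper within each FOOD_ID/DATE combination
# and returns the results in the original row order
df$PRICE <- ave(df$PRICE, df$FOOD_ID, df$DATE, FUN = fill_from_group)
```

The column order stays as in the original dataframe, since PRICE is overwritten in place rather than dropped and joined back.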

1 Comment

Angelo Over a year ago

Thank you! This is definitely helpful, even though Onyambu's solution is precisely what I was looking for.
