I have a nested dataframe where one of the columns (Reviews) is a list containing lists (text, rating, date) as shown below in text format.

structure(list(Name = c("Afsondering Clinic", "The Local Choice Pharmacy Bergview"
), Reviews = list(structure(list(review_text = c("No Review Text", 
"No Review Text", "Given the poor standard of living in Eastern Cape - S.A not to mention the inefficiency in public sectors. This clinic truly thrives for excellence but must say: there is forever no medicine nor pills. Wonderful staff indeed, very helpful regardless of the state of affairs in the Makhoba village. Because of NO water NOR electricity, toilets don't flash etc 🙈. No play area for kids. Vaccines are done here."
), review_rating = c(5L, 5L, 4L), review_date = c("2020-07-03 07:12:13 +00:00", 
"2019-07-03 07:12:13 +00:00", "2019-07-03 07:12:13 +00:00")), class = "data.frame", row.names = c(NA, 
3L)), structure(list(review_text = c("Excellent service", "Went to Bergview Pharmacy looking for Liquid chlorophyll, I asked the lady who’s at the till on your way out, she’s light in complexion,had braids,her makeup done. What a rude and uncouth human being...", 
"A little on the expensive side but in general the staff that work's there are experienced and quick to answer and helpful. I will definitely recommend this pharmacy to all people.", 
"No Review Text", "Quick attendence friendly good carring"), review_rating = c(5L, 
1L, 5L, 5L, 5L), review_date = c("2024-05-03 07:12:15 +00:00", 
"2024-01-03 07:12:15 +00:00", "2023-11-03 07:12:15 +00:00", "2022-07-03 07:12:15 +00:00", 
"2021-07-03 07:12:15 +00:00")), class = "data.frame", row.names = c(NA, 
5L)))), row.names = 1:2, class = "data.frame")

I want to check if, for any place, there exists at least one review that is not "No Review Text" and then filter the dataframe to contain only those places. I am struggling to access these 'review_text' elements without using sapply. How could I get direct access to these so I can do this check? Here is the code I want to use it in:

facilities_with_reviews <- review_data %>% 
  filter(!is.na(Information$rating)) %>%
  filter(path to review_text != "No Review Text")

P.S. I have tried the plyr $ syntax and ordinary [] indexing, but I can't get either to work.

  • You make it require extra effort from people to help you when you don't provide some small representative data to make your problem reproducible; code is text, and it's better to share formatted code than screenshots of code. Commented Jul 5, 2024 at 14:23
  • @NirGraham Sorry for the images. I am aware that giving code is easier, but this is a file I have read into R and I am not sure how to recreate it, since I do not know how the nesting works. That is why I am asking for help. Commented Jul 5, 2024 at 14:36
  • At its core you have a data.frame. Use head() to limit yourself to however many rows are needed to show the issue (maybe 3 would be enough), and use select() to reduce how many columns are involved (again, maybe 3 or so would be enough). Finally, use dput() to get a textual representation of your tiny data.frame. Commented Jul 5, 2024 at 14:40
  • From your screenshot you may not have a data.frame but a list of length 92? head() would still work on that; you can give the first one or two entries. Commented Jul 5, 2024 at 14:41
  • @NirGraham Thank you for the help on that. I have replaced the images with the text of the first 2 places, with only the name and review columns. Commented Jul 5, 2024 at 15:09
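
The head() / select() / dput() advice in the comments above could be sketched roughly as follows. The object full_data and its columns are hypothetical stand-ins for whatever was read from the file, not the asker's real data.

```r
library(dplyr)

# Hypothetical stand-in for the full dataset read from a file:
# one row per place, with a list-column of review data frames.
full_data <- tibble(
  Name = c("Place A", "Place B", "Place C"),
  Address = c("1 Main St", "2 High St", "3 Low St"),
  Reviews = list(
    data.frame(review_text = "Helpful staff", review_rating = 4L),
    data.frame(review_text = "No Review Text", review_rating = 5L),
    data.frame(review_text = "Quick service", review_rating = 5L)
  )
)

# Keep only the first two rows and the columns relevant to the problem.
small_sample <- full_data |>
  head(2) |>
  select(Name, Reviews)

# Print a textual representation that can be pasted into a question.
dput(small_sample)
```

Pasting the dput() output into the question lets others rebuild the same object, including the nesting, without access to the original file.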

2 Answers

With the provided data sample, I'd just go for unnesting:

library(dplyr)
library(tidyr)

review_data |>
  unnest_wider(Reviews) |> 
  unnest_longer(starts_with("review_")) |> 
  filter(review_text != "No Review Text")
#> # A tibble: 5 × 4
#>   Name                               review_text       review_rating review_date
#>   <chr>                              <chr>                     <int> <chr>      
#> 1 Afsondering Clinic                 "Given the poor …             4 2019-07-03…
#> 2 The Local Choice Pharmacy Bergview "Excellent servi…             5 2024-05-03…
#> 3 The Local Choice Pharmacy Bergview "Went to Bergvie…             1 2024-01-03…
#> 4 The Local Choice Pharmacy Bergview "A little on the…             5 2023-11-03…
#> 5 The Local Choice Pharmacy Bergview "Quick attendenc…             5 2021-07-03…

Though note that the structure of the sample is not the same as in the screenshots you had in the previous revision. If this dataset was originally JSON or NDJSON, there's a good chance that (pre)processing the JSON (e.g. with packages like jqr or rjsoncons, or the stream mode of jsonlite) would be better suited here.
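
Unnesting returns one row per review. If one row per place is wanted instead, as the question describes, one possible sketch is to test each nested data frame and filter on the result. This uses purrr::map_lgl(), which is an addition not used in the answer above, and a small made-up review_data in place of the real one.

```r
library(dplyr)
library(purrr)

# Made-up stand-in for review_data: one row per place, with a
# list-column holding a data frame of reviews for that place.
review_data <- tibble(
  Name = c("Place A", "Place B"),
  Reviews = list(
    data.frame(review_text = c("No Review Text", "Helpful staff"),
               review_rating = c(5L, 4L)),
    data.frame(review_text = c("No Review Text", "No Review Text"),
               review_rating = c(5L, 5L))
  )
)

# Keep a place only if at least one of its reviews has real text.
facilities_with_reviews <- review_data |>
  filter(map_lgl(Reviews, \(r) any(r$review_text != "No Review Text")))

facilities_with_reviews$Name
```

The Reviews column stays nested here, so it can still be unnested afterwards if individual reviews are needed.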

I think the relevant tidyverse trick is to use rowwise grouping with mutate, as the data.frames are nested at the row level of the given data. You might go a different route and unnest, but if you keep the nesting you can still extract relevant info and do all kinds of operations. For example:

mutate(rowwise(some_data),
  reviews_df_unique_review_text_count = length(unique(Reviews$review_text)),
  reviews_count_after_exclude_no_review_text = length(setdiff(unique(Reviews$review_text), "No Review Text")),
  list_of_review_texts = list(setdiff(unique(Reviews$review_text), "No Review Text")),
  review_texts_pasted_together = paste(setdiff(unique(Reviews$review_text), "No Review Text"), collapse = "; ")
)
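
Applied to the filtering asked about in the question, the same rowwise idea might look like the sketch below. The data and the has_text column name are made up for illustration; only the review_text column name comes from the question.

```r
library(dplyr)

# Made-up stand-in for the nested data: one row per place.
some_data <- tibble(
  Name = c("Place A", "Place B"),
  Reviews = list(
    data.frame(review_text = c("No Review Text", "Helpful staff")),
    data.frame(review_text = c("No Review Text", "No Review Text"))
  )
)

places_with_text <- some_data |>
  rowwise() |>
  # Within a rowwise mutate, Reviews is the single nested data frame.
  mutate(has_text = any(Reviews$review_text != "No Review Text")) |>
  ungroup() |>
  filter(has_text)

places_with_text$Name
```

Calling ungroup() before filter() drops the rowwise grouping, so later steps behave like ordinary column-wise dplyr code.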
