Check for text in nested lists in R [closed]

Question

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed last year.

I have a nested data frame where one of the columns (Reviews) is a list column whose elements are data frames (with review_text, review_rating and review_date columns), as shown below in dput() text format.

review_data <- structure(list(Name = c("Afsondering Clinic", "The Local Choice Pharmacy Bergview"
), Reviews = list(structure(list(review_text = c("No Review Text", 
"No Review Text", "Given the poor standard of living in Eastern Cape - S.A not to mention the inefficiency in public sectors. This clinic truly thrives for excellence but must say: there is forever no medicine nor pills. Wonderful staff indeed, very helpful regardless of the state of affairs in the Makhoba village. Because of NO water NOR electricity, toilets don't flash etc 🙈. No play area for kids. Vaccines are done here."
), review_rating = c(5L, 5L, 4L), review_date = c("2020-07-03 07:12:13 +00:00", 
"2019-07-03 07:12:13 +00:00", "2019-07-03 07:12:13 +00:00")), class = "data.frame", row.names = c(NA, 
3L)), structure(list(review_text = c("Excellent service", "Went to Bergview Pharmacy looking for Liquid chlorophyll, I asked the lady who’s at the till on your way out, she’s light in complexion,had braids,her makeup done. What a rude and uncouth human being...", 
"A little on the expensive side but in general the staff that work's there are experienced and quick to answer and helpful. I will definitely recommend this pharmacy to all people.", 
"No Review Text", "Quick attendence friendly good carring"), review_rating = c(5L, 
1L, 5L, 5L, 5L), review_date = c("2024-05-03 07:12:15 +00:00", 
"2024-01-03 07:12:15 +00:00", "2023-11-03 07:12:15 +00:00", "2022-07-03 07:12:15 +00:00", 
"2021-07-03 07:12:15 +00:00")), class = "data.frame", row.names = c(NA, 
5L)))), row.names = 1:2, class = "data.frame")

I want to check whether, for each place, there exists at least one review that is not "No Review Text", and then filter the data frame to contain only those places. I am struggling to access these review_text elements without using sapply. How could I get direct access to them so I can do this check? Here is the code I want to use it in:

facilities_with_reviews <- review_data %>% 
  filter(!is.na(Information$rating)) %>%
  filter(path to review_text != "No Review Text")

PS: I have attempted using plyr $ syntax and normal [] indexing, but I can't get it to work.
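The following minimal base R sketch shows why the `$` and `[]` attempts in the PS fail. It runs on a hand-built, shortened copy of the sample: the review texts are abbreviated, and `texts` and `has_text` are illustrative names that do not appear in the question. Reviews is an unnamed list holding one data frame per place, so `$` only works after `[[` has selected a single element.

```r
# Hand-built, shortened copy of the question's sample (not the full dput() output)
review_data <- data.frame(Name = c("Afsondering Clinic",
                                   "The Local Choice Pharmacy Bergview"))
review_data$Reviews <- list(
  data.frame(review_text = c("No Review Text", "No Review Text",
                             "Given the poor standard ..."),
             review_rating = c(5L, 5L, 4L)),
  data.frame(review_text = c("Excellent service", "No Review Text"),
             review_rating = c(5L, 5L))
)

# Reviews is an unnamed list of data frames, so `$` on it finds nothing
is.null(review_data$Reviews$review_text)   # TRUE

# Select one place's data frame with [[ ]] first, then use $ inside it
review_data$Reviews[[2]]$review_text
# returns c("Excellent service", "No Review Text")

# Every place's review texts at once: one character vector per place
texts <- lapply(review_data$Reviews, `[[`, "review_text")

# Does each place have at least one real review? (vapply still loops,
# but over places rather than over individual reviews)
has_text <- vapply(texts, function(t) any(t != "No Review Text"), logical(1))
review_data[has_text, "Name"]
```

Some iteration over the list column is hard to avoid entirely. Here it happens once per place, and the comparison inside each place is vectorised.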

You make it require extra effort from people to help you when you don't provide some small representative data to make your problem reproducible; code is text, and it's better to share formatted code than screenshots of code. – Nir Graham, Jul 5, 2024 at 14:23
@NirGraham sorry for the images; I am aware giving code is easier, but this is a file I have read into R and I am not sure how to recreate it since I do not know how the nesting works. That is why I am asking for help. – shongololo, Jul 5, 2024 at 14:36
At core you have a data.frame: use head() to limit yourself to however many rows are needed to show the issue (maybe 3 would be enough), use select() to reduce how many columns are involved (again, maybe 3 or so), and finally use dput() to get a textual representation of your tiny data.frame. – Nir Graham, Jul 5, 2024 at 14:40
From your screenshot you may not have a data.frame but a list of length 92? head() would still work on that; you can give the first two entries. – Nir Graham, Jul 5, 2024 at 14:41
@NirGraham thank you for the help on that; I have replaced the images with the text of the first 2 places, keeping only the Name and Reviews columns. – shongololo, Jul 5, 2024 at 15:09

margusl · Accepted Answer · 2024-07-05 16:16:56Z

With the provided data sample, I'd just go for unnesting:

library(dplyr)
library(tidyr)

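review_data |>
  unnest_wider(Reviews) |>                  # each nested data frame's columns become list-columns
  unnest_longer(starts_with("review_")) |>  # one row per review
  filter(review_text != "No Review Text")   # drop placeholder reviews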
#> # A tibble: 5 × 4
#>   Name                               review_text       review_rating review_date
#>   <chr>                              <chr>                     <int> <chr>      
#> 1 Afsondering Clinic                 "Given the poor …             4 2019-07-03…
#> 2 The Local Choice Pharmacy Bergview "Excellent servi…             5 2024-05-03…
#> 3 The Local Choice Pharmacy Bergview "Went to Bergvie…             1 2024-01-03…
#> 4 The Local Choice Pharmacy Bergview "A little on the…             5 2023-11-03…
#> 5 The Local Choice Pharmacy Bergview "Quick attendenc…             5 2021-07-03…

Note, though, that the structure of the sample is not the same as in the screenshots you had in the previous revision. If this dataset was originally JSON or NDJSON, there's a good chance that (pre)processing the JSON (e.g. with packages like jqr or rjsoncons, or the streaming mode of jsonlite, stream_in()) would be better suited here.
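The unnesting pipeline above returns one row per review, but the question asks for whole places. One way to get from reviews back to places is to reduce the filtered reviews to distinct names and then `semi_join()` those names onto the nested data. The sketch below does this on a shortened sample that includes an invented "Hypothetical Pharmacy" row. It also swaps the `unnest_wider()`/`unnest_longer()` pair for a single `tidyr::unnest()`, which gives the same long shape when every element of Reviews is a data frame.

```r
library(dplyr)
library(tidyr)

# Shortened, hand-built sample. "Hypothetical Pharmacy" is invented for this
# sketch: all of its reviews are "No Review Text", so the filter should drop it.
review_data <- tibble(
  Name = c("Afsondering Clinic", "Hypothetical Pharmacy"),
  Reviews = list(
    data.frame(review_text = c("No Review Text", "Given the poor standard ..."),
               review_rating = c(5L, 4L)),
    data.frame(review_text = c("No Review Text", "No Review Text"),
               review_rating = c(3L, 2L))
  )
)

# One row per review (unnest() handles a list-column of data frames directly),
# keep only real reviews, then reduce to the places that still appear
places_with_text <- review_data |>
  unnest(Reviews) |>
  filter(review_text != "No Review Text") |>
  distinct(Name)

# Keep the original, still-nested rows for those places
facilities_with_reviews <- semi_join(review_data, places_with_text, by = "Name")
facilities_with_reviews$Name
# returns "Afsondering Clinic"
```

This assumes Name uniquely identifies a place. If names can repeat, a row id column added before unnesting would be a safer join key.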

Nir Graham · Accepted Answer · 2024-07-05 15:42:17Z


I think the relevant tidyverse trick is to use rowwise() grouping with mutate(), since the data.frames are nested at the row level of the given data. You could go a different route and unnest, but if you keep the nesting you can still extract the relevant information and do all kinds of operations, for example:

library(dplyr)

# under rowwise(), Reviews is the current row's data frame
mutate(rowwise(review_data),
  reviews_df_unique_review_text_count = length(unique(Reviews$review_text)),
  reviews_count_after_exclude_no_review_text = length(setdiff(unique(Reviews$review_text), "No Review Text")),
  list_of_review_texts = list(setdiff(unique(Reviews$review_text), "No Review Text")),
  review_texts_pasted_together = paste(setdiff(unique(Reviews$review_text), "No Review Text"), collapse = "; ")
)
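The same rowwise() idea can also filter places directly, which is the question's actual goal, instead of only adding summary columns. The following minimal sketch uses a shortened sample with an invented "Hypothetical Pharmacy" row. It relies on rowwise() unwrapping list-columns, so that inside `filter()` each row's `Reviews` is a single data frame.

```r
library(dplyr)

# Shortened, hand-built sample; "Hypothetical Pharmacy" is invented and only
# has "No Review Text" entries
review_data <- tibble(
  Name = c("Afsondering Clinic", "Hypothetical Pharmacy"),
  Reviews = list(
    data.frame(review_text = c("No Review Text", "Given the poor standard ..."),
               review_rating = c(5L, 4L)),
    data.frame(review_text = c("No Review Text", "No Review Text"),
               review_rating = c(3L, 2L))
  )
)

# Under rowwise(), Reviews refers to the current row's data frame, so
# Reviews$review_text is that place's vector of review texts
facilities_with_reviews <- review_data |>
  rowwise() |>
  filter(any(Reviews$review_text != "No Review Text")) |>
  ungroup()  # drop the rowwise grouping again
```

The final `ungroup()` matters because later verbs would otherwise keep evaluating per row.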

