I have a dataset with 2 columns. Each cell of the "WEBDATA" column contains a list. This is the first time I have had to deal with a dataset that contains lists, and I'm stuck.
My dataset looks like this:
WORD | WEBDATA
Home | list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
Baby | list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
Dog | list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
Food | list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
When I check the content of a cell in the WEBDATA column, I get this:
> dataset$WEBDATA[[1]]
Domain
1 website1.com
2 mysuperwebsite.com
3 bestwebsite.uk
Url
1 https://www.website1.com/product2/
2 https://www.mysuperwebsite.com/productB/
3 https://www.bestwebsite.uk/product67/
To confirm that the column holds lists and to see what one of them looks like, I tried this:
class(dataset$WEBDATA)
[1] "list"
testdataset <- data.frame(dataset$WEBDATA[[2]])
Domain | Url
1 website1.com | https://www.website1.com/product2/
2 mysuperwebsite.com | https://www.mysuperwebsite.com/productB/
3 bestwebsite.uk | https://www.bestwebsite.uk/product67/
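The same check can be run over every cell at once rather than one cell at a time. This is only a minimal sketch: `dataset` here is a small hypothetical stand-in built for illustration, and the names `is_df` and `n_rows` are made up, not part of the real data.

```r
# Hypothetical stand-in for `dataset`: a list column whose cells are data frames.
dataset <- data.frame(WORD = c("Home", "Baby"), stringsAsFactors = FALSE)
dataset$WEBDATA <- list(
  data.frame(
    Domain = c("website1.com", "bestwebsite.uk"),
    Url = c("https://www.website1.com/product2/",
            "https://www.bestwebsite.uk/product67/"),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Domain = "websitezz.uk",
    Url = "https://www.websitezz.uk/page/",
    stringsAsFactors = FALSE
  )
)

# One logical per cell: does the cell hold a data frame?
is_df <- vapply(dataset$WEBDATA, is.data.frame, logical(1))

# One integer per cell: how many rows does that data frame have?
n_rows <- vapply(dataset$WEBDATA, nrow, integer(1))

# Compact view of the structure of the first cell
str(dataset$WEBDATA[[1]])
```

If `is_df` is TRUE for every row, each cell is a data frame with its own `Domain` and `Url` columns, which is what makes the cells stackable into rows.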
My goal is to split the WEBDATA lists into several rows.
The final dataset should look like this:
WORD | Number | Domain | Url
Home | 1 | website1.com | https://www.website1.com/product2/
Home | 2 | mysuperwebsite.com | https://www.mysuperwebsite.com/productB/
Home | 3 | bestwebsite.uk | https://www.bestwebsite.uk/product67/
Baby | 1 | websitezz.uk | https://www.websitezz.uk/page/
Baby | 2 | websiteabc.com | https://www.websiteabc.com/post/
Baby | 3 | thewebsite.com | https://www.thewebsite.com/post75/
I thought of the strsplit() function, but I don't really know how to apply it to lists. Can you please help?
Here is a sample dataset that you can paste into R:
theDataReconstituted <- structure(list(
WORD = structure(c(8L, 7L, 6L, 10L, 9L), .Label = c("dog dood", "dog foo", "dog food uk", "dog foof", "dogfood", "burns dog food", "canagan dog food", "dog food", "skinners dog food", "wainwrights dog food" ), class = "factor"),
WEBDATA = list(
structure(list(
Domain = structure(c(1L, 2L, 2L), .Label = c("pet-supermarket.co.uk", "petsathome.com" ), class = "factor"),
Url = structure(c(3L, 1L, 2L), .Label = c("petsathome.com/shop/en/pets/dog/dog-food-and-treats", "petsathome.com/shop/en/pets/dog/dog-food-and-treats/dry-dog-food", "pet-supermarket.co.uk/Dog/Dog-Food-Treats/Dog-Food/c/PSGB00070" ), class = "factor")),
.Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)),
structure(list(
Domain = structure(c(1L, 1L, 1L), .Label = "canagan.co.uk", class = "factor"),
Url = structure(c(1L, 3L, 2L), .Label = c("canagan.co.uk/", "canagan.co.uk/products-cat.html", "canagan.co.uk/products.html" ), class = "factor")),
.Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)),
structure(list(
Domain = structure(c(1L, 1L, 2L), .Label = c("burnspet.co.uk", "petsathome.com"), class = "factor"),
Url = structure(1:3, .Label = c("burnspet.co.uk/", "burnspet.co.uk/burns-dog-food-products/", "petsathome.com/shop/en/pets/merch-groups/burns" ), class = "factor")),
.Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)),
structure(list(
Domain = structure(c(1L, 1L, 1L), .Label = "petsathome.com", class = "factor"),
Url = structure(c(2L, 3L, 1L), .Label = c("petsathome.com/shop/en/pets/merch-groups/feature/wainwrights-dog-food", "petsathome.com/shop/en/pets/merch-groups/mg-004", "petsathome.com/shop/en/pets/merch-groups/wainwrights-dog-" ), class = "factor")),
.Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)),
structure(list(
Domain = structure(c(1L, 1L, 1L), .Label = "skinnerspetfoods.co.uk", class = "factor"),
Url = structure(c(1L, 3L, 2L), .Label = c("skinnerspetfoods.co.uk/", "skinnerspetfoods.co.uk/our-range/", "skinnerspetfoods.co.uk/product-category/field-trial-range/" ), class = "factor")),
.Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)))),
row.names = c(NA, -5L),
class = c("tbl_df", "tbl", "data.frame" ),
.Names = c("WORD", "WEBDATA"))
Can you use dput on a representative sample of your data? You've got nested list columns such that it's not really possible for anyone to reproduce the precise situation otherwise.
Why is Home shown with Website1.com etc.? It seems those sites belong to the 2nd item, which is associated with Baby.
Please post dput(dataset) (perhaps the subset in your post).
library(tidyverse); theDataReconstituted %>% unnest(WEBDATA) %>% group_by(WORD) %>% mutate(Number = row_number()) will work. You'll get some warnings about coercing factors to character, but it's not causing any problems.
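For anyone who prefers to avoid extra packages, the same reshaping can probably be done in base R. The following is only a sketch under assumptions: `toy` is a small hypothetical dataset with the same shape as the real one (a `WORD` column plus a list column of data frames), and `expand_one` is a made-up helper name. It is an alternative to the tidyverse one-liner, not the method from the comments.

```r
# Hypothetical toy data with the same shape as the question's dataset.
toy <- data.frame(WORD = c("Home", "Baby"), stringsAsFactors = FALSE)
toy$WEBDATA <- list(
  data.frame(
    Domain = c("website1.com", "bestwebsite.uk"),
    Url = c("https://www.website1.com/product2/",
            "https://www.bestwebsite.uk/product67/"),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Domain = "websitezz.uk",
    Url = "https://www.websitezz.uk/page/",
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: repeat the word once per nested row and number the rows.
expand_one <- function(word, web) {
  data.frame(
    WORD = rep(word, nrow(web)),
    Number = seq_len(nrow(web)),
    web,
    stringsAsFactors = FALSE
  )
}

# Walk over the words and the nested data frames in parallel, then stack the pieces.
pieces <- Map(expand_one, as.character(toy$WORD), toy$WEBDATA)
result <- do.call(rbind, pieces)
rownames(result) <- NULL

print(result)
```

`as.character()` is there because `WORD` is a factor in the sample data; on the real data the call would presumably be `Map(expand_one, as.character(theDataReconstituted$WORD), theDataReconstituted$WEBDATA)`.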