Split lists into several rows using R

Question

I have a dataset with 2 columns. The "WEBDATA" column contains a list in each cell. This is the first time I have had to deal with a dataset that contains lists, and I'm stuck.

My dataset looks like this:

WORD  |   WEBDATA
Home  |   list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
Baby  |   list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
Dog   |   list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9
Food  |   list(Domain = c(77, 25, 7, 97, 71, 1, 42, 35, 37, 58, 9

When I check the content of a cell in the WEBDATA column, it returns this:

> dataset$WEBDATA[[1]]

   Domain
1  website1.com
2  mysuperwebsite.com
3  bestwebsite.uk

   Url
1  https://www.website1.com/product2/
2  https://www.mysuperwebsite.com/productB/
3  https://www.bestwebsite.uk/product67/

To make sure these were lists and to see what they look like, I tried this:

class(dataset$WEBDATA)
[1] "list"

testdataset <- data.frame(dataset$WEBDATA[[2]])
    Domain              |  Url
1   website1.com        |  https://www.website1.com/product2/
2   mysuperwebsite.com  |  https://www.mysuperwebsite.com/productB/
3   bestwebsite.uk      |  https://www.bestwebsite.uk/product67/
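
The check above looks at one cell at a time. A minimal sketch for inspecting every cell of the list column at once is shown here; the miniature `dataset` is hypothetical, built only for illustration, and assumes that each cell holds a data frame as in the output above.

```r
# Hypothetical miniature of the data: a data frame with a list column
# whose elements are themselves data frames (values chosen for illustration).
dataset <- data.frame(WORD = c("Home", "Baby"), stringsAsFactors = FALSE)
dataset$WEBDATA <- list(
  data.frame(Domain = c("website1.com", "bestwebsite.uk"),
             Url = c("https://www.website1.com/product2/",
                     "https://www.bestwebsite.uk/product67/"),
             stringsAsFactors = FALSE),
  data.frame(Domain = "websitezz.uk",
             Url = "https://www.websitezz.uk/page/",
             stringsAsFactors = FALSE)
)

# First class of each element of the list column
element_classes <- vapply(dataset$WEBDATA, function(x) class(x)[1], character(1))

# Number of rows stored in each cell
rows_per_cell <- vapply(dataset$WEBDATA, nrow, integer(1))

# Compact overview of the first cell
str(dataset$WEBDATA[[1]])
```

If every element reports "data.frame", the column is a list of data frames rather than a list of plain vectors, which is why string functions do not apply to it directly.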

My goal is to split the WEBDATA lists into several rows.

The final dataset should look like this:

WORD  |  Number |  Domain             |  Url
Home  |   1     |  website1.com       |  https://www.website1.com/product2/
Home  |   2     |  mysuperwebsite.com |  https://www.mysuperwebsite.com/productB/
Home  |   3     |  bestwebsite.uk     |  https://www.bestwebsite.uk/product67/
Baby  |   1     |  websitezz.uk       |  https://www.websitezz.uk/page/
Baby  |   2     |  websiteabc.com     |  https://www.websiteabc.com/post/
Baby  |   3     |  thewebsite.com     |  https://www.thewebsite.com/post75/

I thought of the strsplit() function, but I don't really know how to use it with lists. Can you please help?

Here is a sample dataset that you can paste into R:

theDataReconstituted <- structure(list(
    WORD = structure(c(8L, 7L, 6L, 10L, 9L), .Label = c("dog dood", "dog foo", "dog food uk", "dog foof", "dogfood", "burns dog food", "canagan dog food", "dog food", "skinners dog food", "wainwrights dog food" ), class = "factor"), 
    WEBDATA = list(
        structure(list(
            Domain = structure(c(1L, 2L, 2L), .Label = c("pet-supermarket.co.uk", "petsathome.com" ), class = "factor"), 
            Url = structure(c(3L, 1L, 2L), .Label = c("petsathome.com/shop/en/pets/dog/dog-food-and-treats", "petsathome.com/shop/en/pets/dog/dog-food-and-treats/dry-dog-food", "pet-supermarket.co.uk/Dog/Dog-Food-Treats/Dog-Food/c/PSGB00070" ), class = "factor")), 
            .Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)), 
        structure(list(
            Domain = structure(c(1L, 1L, 1L), .Label = "canagan.co.uk", class = "factor"), 
            Url = structure(c(1L, 3L, 2L), .Label = c("canagan.co.uk/", "canagan.co.uk/products-cat.html", "canagan.co.uk/products.html" ), class = "factor")), 
            .Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)), 
        structure(list(
            Domain = structure(c(1L, 1L, 2L), .Label = c("burnspet.co.uk", "petsathome.com"), class = "factor"), 
            Url = structure(1:3, .Label = c("burnspet.co.uk/", "burnspet.co.uk/burns-dog-food-products/", "petsathome.com/shop/en/pets/merch-groups/burns" ), class = "factor")), 
            .Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)), 
        structure(list(
            Domain = structure(c(1L, 1L, 1L), .Label = "petsathome.com", class = "factor"), 
            Url = structure(c(2L, 3L, 1L), .Label = c("petsathome.com/shop/en/pets/merch-groups/feature/wainwrights-dog-food", "petsathome.com/shop/en/pets/merch-groups/mg-004", "petsathome.com/shop/en/pets/merch-groups/wainwrights-dog-" ), class = "factor")), 
            .Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)), 
        structure(list(
            Domain = structure(c(1L, 1L, 1L), .Label = "skinnerspetfoods.co.uk", class = "factor"), 
            Url = structure(c(1L, 3L, 2L), .Label = c("skinnerspetfoods.co.uk/", "skinnerspetfoods.co.uk/our-range/", "skinnerspetfoods.co.uk/product-category/field-trial-range/" ), class = "factor")), 
            .Names = c("Domain", "Url"), class = "data.frame", row.names = c(NA, -3L)))), 
    row.names = c(NA, -5L), 
    class = c("tbl_df", "tbl", "data.frame" ), 
    .Names = c("WORD", "WEBDATA"))

Can you edit with the results of calling dput on a representative sample of your data? You've got nested list columns such that it's not really possible for anyone to reproduce the precise situation otherwise. — alistaire
– alistaire, Commented Dec 16, 2017 at 17:22
How are you relating Home with Website1.com etc? It seems those sites belong to 2nd item that is associated with Baby. — MKR
– MKR, Commented Dec 16, 2017 at 17:23
Website1.com is contained in the list on the same row as Home. Thanks for noticing it, there was a mistake in the code above, I edited it. — Remi
– Remi, Commented Dec 16, 2017 at 17:26
@Remi As @alistaire requested, please put a subset of the output of dput(dataset) (perhaps the subset in your post). — steveb
– steveb, Commented Dec 16, 2017 at 17:35
Just library(tidyverse); theDataReconstituted %>% unnest() %>% group_by(WORD) %>% mutate(Number = row_number()) will work. You'll get some errors about coercing factors to character, but it's not causing any problems. — alistaire
– alistaire, Commented Dec 16, 2017 at 18:05

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2017-12-17 11:46:04Z

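As @alistaire says in the comments, the answer is to unnest the list column and then number the rows within each WORD:

library(tidyverse)

theDataReconstituted %>% 
  unnest(WEBDATA) %>%                # one output row per row of each nested data frame
  group_by(WORD) %>% 
  mutate(Number = row_number()) %>%  # running index within each WORD
  ungroup()

The column is named explicitly because recent tidyr versions warn when unnest() is called without one. Depending on your tidyr version, you may also get warnings (not errors) about coercing factors to character; they do not affect the result.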
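
For comparison, here is a minimal base R sketch of the same reshaping that does not rely on the tidyverse. It is not the method from the comment above: `flatten_list_column` and its parameters `key_col` and `nested_col` are hypothetical names, and the small `example` data frame only mimics the structure of `theDataReconstituted`. Factors are converted to character before binding, which sidesteps the coercion warnings.

```r
# Hypothetical helper: expand a list column of data frames into rows,
# repeating the key column and adding a per-key row number.
flatten_list_column <- function(df, key_col = "WORD", nested_col = "WEBDATA") {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    inner <- df[[nested_col]][[i]]
    # Convert factor columns to character so that pieces with
    # different factor levels bind together cleanly
    inner[] <- lapply(inner, function(col) {
      if (is.factor(col)) as.character(col) else col
    })
    piece <- data.frame(
      key = rep(as.character(df[[key_col]][i]), nrow(inner)),
      Number = seq_len(nrow(inner)),
      inner,
      stringsAsFactors = FALSE
    )
    names(piece)[1] <- key_col
    piece
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Small example with the same shape as the sample data (factor columns included)
example <- data.frame(WORD = factor(c("dog food", "canagan dog food")))
example$WEBDATA <- list(
  data.frame(
    Domain = factor(c("pet-supermarket.co.uk", "petsathome.com")),
    Url = factor(c("pet-supermarket.co.uk/Dog/Dog-Food-Treats/Dog-Food/c/PSGB00070",
                   "petsathome.com/shop/en/pets/dog/dog-food-and-treats"))
  ),
  data.frame(Domain = factor("canagan.co.uk"), Url = factor("canagan.co.uk/"))
)

result <- flatten_list_column(example)
print(result)
```

Under these assumptions, calling `flatten_list_column(theDataReconstituted)` should give the same WORD, Number, Domain and Url columns as the pipeline above, with character columns instead of factors.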

edited Dec 17, 2017 at 11:46

A5C1D2H2I1M1N2O1R2T1

answered Dec 17, 2017 at 9:42

Remi
