Following a web scrape with RCurl, I used XML's readHTMLTable and now have a list of 100 dataframes, each with 40 observations of two variables. I would like to convert this to a single dataframe of 100 rows and 40 columns. The first column of each dataframe contains what I would like to become the column names of the single dataframe. This is as close as I can get to an MWE (each of the dataframes in my actual list is named NULL):

description <- c("name", "location", "age")
value <- c("mike", "florida", "25")
df1 <- data.frame(description, value)
description <- c("name", "location", "tenure")
value <- c("jim", "new york", "5")
df2 <- data.frame(description, value)
list <- list(df1, df2)

# list output
[[1]]
  description   value
1        name    mike
2    location florida
3         age      25

[[2]]
  description    value
1        name      jim
2    location new york
3      tenure        5

Here is the general output I'm hoping to achieve:

library(reshape2)
listm <- melt(list)
dcast(listm, L1 ~ description)
# dcast output
  L1  age location name tenure
1  1   25  florida mike   <NA>
2  2 <NA> new york  jim      5

My issue, which I don't know how to reproduce in an MWE, is that every dataframe in my actual list is named NULL, so there is no unique identifier by which to cast the data.

How can I deal with this issue in reshape2 and/or plyr?

1 Answer

You can build the L1 column yourself by calling rep with the number of rows of each data.frame in your list. Then it's straightforward to cast:

# ll is your list of data.frames
ll.df <- cbind(L1 = rep(seq_along(ll), sapply(ll, nrow)), do.call(rbind, ll))

require(reshape2)
dcast(ll.df, L1 ~ description)
  L1  age location name tenure
1  1   25  florida mike   <NA>
2  2 <NA> new york  jim      5

2 Comments

I'm getting the following error: Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match. Any thoughts? Thanks.
sapply(ll, ncol) helped me identify a malformed dataframe that had 3 columns. I removed it, and your solution worked. Thank you.
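
The check described in the comments can be sketched as follows. This is a minimal illustration, not the commenter's actual code: the example list, the third (malformed) data frame, and the names n_cols, bad, and ll_clean are all made up for the demonstration, and it assumes a well-formed table has exactly two columns.

```r
# Hypothetical list: two well-formed data frames and one malformed one
ll <- list(
  data.frame(description = c("name", "location", "age"),
             value = c("mike", "florida", "25")),
  data.frame(description = c("name", "location", "tenure"),
             value = c("jim", "new york", "5")),
  data.frame(a = 1, b = 2, c = 3)  # malformed: three columns instead of two
)

# Column count of every list element; anything other than 2 is suspect
n_cols <- sapply(ll, ncol)
bad <- which(n_cols != 2)  # positions of the malformed data frames

# Keep only the two-column data frames
ll_clean <- ll[n_cols == 2]

# Same construction as in the answer, applied to the cleaned list
ll.df <- cbind(L1 = rep(seq_along(ll_clean), sapply(ll_clean, nrow)),
               do.call(rbind, ll_clean))
```

Note that L1 then numbers the data frames by their position in the cleaned list, not the original one. If the original positions matter, something like rep(which(n_cols == 2), sapply(ll_clean, nrow)) could be used for L1 instead.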