
I'm downloading data from the web but then don't know how to change it to a dataframe or anything useful. Does anyone have any suggestions? Here is the code:

library(RCurl)
myfile = getURL("http://www.stat.ufl.edu/~winner/data/lister_ul.dat",
                ssl.verifyhost = FALSE, ssl.verifypeer = FALSE)

If I use this:

A = read.csv(textConnection(myfile), header = F)

then R understands this:

c("1 1 1")

as the first row and not this:

c(1, 1, 1).

This doesn't work because I need to use

colnames(A) = c("col1", "col2", "col3")

and can't find a workaround that doesn't involve some tedious work using

unlist(strsplit(A$V1, "\\s+"))

Ughh!!

Any suggestions would be appreciated. Or maybe I'll write my own tedious function, if necessary.

gwynn

3 Answers


Does this help?

df <- read.table('http://www.stat.ufl.edu/~winner/data/lister_ul.dat')
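Since the question also wants named columns, note that read.table accepts a col.names argument, so the names can be set in the same call:

df <- read.table('http://www.stat.ufl.edu/~winner/data/lister_ul.dat',
                 col.names = c("col1", "col2", "col3"))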

2 Comments

Also this ^ haha
If all I wanted to do was read that one webpage I'm sure it would've. Thanks

You are close. Since I don't have RCurl installed but I do have httr (which uses curl), I'll start with that. It's a moot problem, though, since I get to the same table-looking content as you.

Also, @udden2903's answer is more straightforward; I'm assuming this is a simplified problem and that you may need to keep using an alternative fetching method that plain read.table(URL) does not allow. (To continue using httr and support other things such as authentication, read its documentation.)

library(httr)
myfile = GET("http://www.stat.ufl.edu/~winner/data/lister_ul.dat")
str(content(myfile))
# No encoding supplied: defaulting to UTF-8.
#  chr "1 1  1\n1 0 11\n0 1  6\n0 0  6\n"

So, content(myfile) is now what your myfile is. The first trick is that your data is not comma-delimited ("csv"), so using read.table is necessary. Second, you need to specify that the first line is not headers.

x <- read.table(textConnection(content(myfile, encoding = "UTF-8")), header = FALSE)
x
#   V1 V2 V3
# 1  1  1  1
# 2  1  0 11
# 3  0  1  6
# 4  0  0  6
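As an aside, read.table also has a text argument, so the same result can be had without the textConnection wrapper:

x <- read.table(text = content(myfile, encoding = "UTF-8"), header = FALSE)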

Now just assign your headers.

colnames(x) <- c("col1", "col2", "col3")
x
#   col1 col2 col3
# 1    1    1    1
# 2    1    0   11
# 3    0    1    6
# 4    0    0    6

6 Comments

The same methodology doesn't work for this URL: "stat.ufl.edu/~winner/data/warpeace.dat." I can download it with getURL but then I'm not sure what to do from there. I'm trying to download all of the datasets from stat.ufl.edu/~winner/datasets.html and turn it into a GUI. I parsed the html code but can't quite read the data. Any suggestions?
That link fails for two reasons: (1) your link includes an extra dot resulting in an HTTP 404, easily removed; and (2) it is fixed-width and not space-delimited. You may prefer to use the readr package, since it handles this type of thing more smoothly. Instead you would use readr::read_table(content(myfile, encoding="UTF-8"), col_names=FALSE). Otherwise, consider fixed-width table parsing with read.fwf, a bit more literal and fragile but it works fine (a sketch of both follows these comments).
The extra "." is the result of poor editing. This worked until "stat.ufl.edu/~winner/data/crop_circle.csv" probably because it's a csv file. I have dat, csv, lsx, and xls. I'll need to do them all separately. For the xls I'm using read_excel. What should I do with the csv and lsx? Thanks for your patience!
Of course it won't work with crop_circle.csv: read.csv reads comma-delimited files; read.table is for other table formats stored in a file. If by "lsx" you actually mean "xlsx" (since I found no files ending in .lsx), then your choice of readxl::read_excel should work equally for both. For each link you harvest, you will need to determine from the file extension which parsing function to use.
I'm a bit tired, so I might misunderstand, but is this the function I'm to use for a csv file: readr::read_csv(content(myfile, encoding="UTF-8"), col_names=FALSE)? Thanks for your patience and help.
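For reference, a minimal sketch of the two fixed-width options from the comments above. The readr call is the one already quoted; the read.fwf widths are purely hypothetical and would have to be checked against the actual layout of warpeace.dat:

library(httr)
library(readr)
myfile <- GET("http://www.stat.ufl.edu/~winner/data/warpeace.dat")

# readr::read_table splits on runs of whitespace, which copes with
# fixed-width-ish files like this one
df1 <- read_table(content(myfile, encoding = "UTF-8"), col_names = FALSE)

# base-R alternative: read.fwf with explicit column widths
# (the widths below are an assumption for illustration, not measured from the file)
df2 <- read.fwf("http://www.stat.ufl.edu/~winner/data/warpeace.dat",
                widths = c(20, 8, 8))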

Using only base package functions:

as.data.frame(
    do.call("rbind", strsplit(
        readLines("http://www.stat.ufl.edu/~winner/data/lister_ul.dat"),
        "\\s+"))
)

  V1 V2 V3
1  1  1  1
2  1  0 11
3  0  1  6
4  0  0  6

What we did was read the raw lines from the webpage, split each line on runs of whitespace, bind the resulting pieces into a matrix by calling rbind on each row... which we then translated into a data frame.
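One caveat: built this way, every column is character, because strsplit returns strings. If you assign the result to df, a quick fix (assuming the columns should be numeric) is:

df[] <- lapply(df, type.convert, as.is = TRUE)  # coerce character columns to numeric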

