I'm trying to parse this webpage into a data frame with the XML package, but I keep getting stuck because I'm told the content is not XML.

I would like to convert the text below into a table/data.frame. What is the easiest way to do this after I've fetched the URL text and run htmlParse on it?

doc = getURL("http://m.racingpost.com/card/blocks.sd?race_id=first&r_date=2015-03-28&tab=card&view=meetings&blocks=cards-list&_=1427439140572")
doc = htmlParse(doc, asText = TRUE)

1 Answer

The URL returns JSON, not HTML or XML, which is why the XML package rejects it. You can parse it with any of several R packages, such as RJSONIO, rjson, or jsonlite:

library(jsonlite)
appURL <- "http://m.racingpost.com/card/blocks.sd?race_id=first&r_date=2015-03-28&tab=card&view=meetings&blocks=cards-list&_=1427439140572"
appDATA <- fromJSON(appURL)
appITEMS <- appDATA[["cards-list"]][["items"]]
> appITEMS$c1083
$abandonedCount
[1] 0

$crsName
[1] "Chelmsford (AW)"

$crsAbbr
[1] "Cfd"

$isForeign
[1] ""

$races
      id                                                           title distance cls crsId time       date
1 620151        Buy Online At chelmsfordcityracecourse.com Maiden Stakes       1m   4  1083 2:20 2015-03-28
2 620152 Dubai World Cup toteplacepot Today Maiden Stakes (Plus 10 Race)       5f   4  1083 2:55 2015-03-28
3 620153                            &pound;1 Million totescoop6 Handicap       5f   2  1083 3:30 2015-03-28
4 620154                                toteexacta Pick The 1,2 Handicap       6f   4  1083 4:05 2015-03-28
5 620155               totetrifecta Pick The 1,2,3 Handicap (Bobis Race)       1m   3  1083 4:40 2015-03-28
6 620156                                               totepool Handicap     1m2f   2  1083 5:15 2015-03-28
7 620157                                  Madness Live 3rd June Handicap     1m2f   4  1083 5:50 2015-03-28
   timestamp raceGroup hCount abandoned videoId    going offers
1 1427552400                8             57049 Standard   NULL
2 1427554500                5             57050 Standard   NULL
3 1427556600  Handicap     12             57051 Standard   NULL
4 1427558700  Handicap      7             57052 Standard   NULL
5 1427560800  Handicap      8             57053 Standard   NULL
6 1427562900  Handicap      7             57054 Standard   NULL
7 1427565000  Handicap      6             57055 Standard   NULL

The data is not returned in a tabular format, but you can work with the individual "items" to fit your needs. Where the JSON allows it, jsonlite also converts arrays of records into data frames, as with $races above.
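If you want one data frame covering every course, one option is to stack the per-course races tables. The following is only a sketch under stated assumptions: the JSON text is a trimmed-down, hypothetical stand-in for the real response (the second course, "c9999", and its values are invented for illustration), and it assumes every course's races table has the same columns.

```r
library(jsonlite)

# Hypothetical, shortened JSON that mimics the structure shown above.
# The "c9999" course and its race are made up for this example.
json_text <- '{
  "cards-list": {
    "items": {
      "c1083": {
        "crsName": "Chelmsford (AW)",
        "races": [
          {"id": 620151, "distance": "1m", "time": "2:20"},
          {"id": 620152, "distance": "5f", "time": "2:55"}
        ]
      },
      "c9999": {
        "crsName": "Example Course",
        "races": [
          {"id": 700001, "distance": "6f", "time": "3:00"}
        ]
      }
    }
  }
}'

appDATA  <- fromJSON(json_text)
appITEMS <- appDATA[["cards-list"]][["items"]]

# Prefix each course's races with the course name (recycled to every row).
per_course <- lapply(appITEMS, function(item) {
  data.frame(crsName = item$crsName, item$races, stringsAsFactors = FALSE)
})

# Stack the per-course tables into a single data frame.
all_races <- do.call(rbind, per_course)
rownames(all_races) <- NULL

print(all_races)
```

With the real response you may need to drop or flatten columns that are not plain vectors (the offers column above prints as NULL) before rbind will accept them.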


4 Comments

That looks great, but when I run it I get this error:
Error in feed_push_parser(buf) : lexical error: invalid char in json text. <html><head><title>Request Reje (right here) ------^
When I installed the curl package, it told me it was built for an earlier version of R. Could that be the cause?
I managed to fix that by using getURL to bring in the JSON text first. That seems to solve it, thanks.
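
The workaround in the last comment (download the text yourself, then hand it to fromJSON) might look roughly like the sketch below. The helper name parse_json_text is hypothetical, not part of jsonlite or RCurl. It only adds a check so that an HTML page, like the one quoted in the error message, fails with a clearer message. The getURL call is left commented out because it needs network access.

```r
library(jsonlite)

# Hypothetical helper: parse JSON text that has already been downloaded,
# and stop early if the server sent markup instead of JSON.
parse_json_text <- function(txt) {
  if (grepl("^[[:space:]]*<", txt)) {
    stop("Response looks like HTML, not JSON: ", substr(txt, 1, 60))
  }
  fromJSON(txt)
}

# Intended use with RCurl (not run here, requires network access):
# library(RCurl)
# txt     <- getURL(appURL)
# appDATA <- parse_json_text(txt)

# Offline check with a small inline JSON string.
result <- parse_json_text('{"a": [1, 2, 3]}')
print(result$a)
```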