1

Through a POST request using the httr package, I get back XML in the following format:-

<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report>
      <DateTime>01/10/2014 15:30:45</DateTime> 
      <Lat>99.9999999</Lat> 
      <Lon>999.9999999</Lon> 
      <Cog>999</Cog> 
      <Sog>999</Sog> 
      <Voltage>99</Voltage> 
      <Status>Description of status</Status> 
    </Report>
    <Report>
      ..
    </Report>
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
   ..
  </Terminal>
</ReportDelivery>

I am able to get the "Report" part in a data frame using two functions available here:-

#Using functions from https://rud.is/rpubs/xml2power/
xtrct <- function(doc, target) { xml_find_all(doc, target) %>% xml_text() %>% trimws() }

xtrct_df <- function(doc, top) {
  xml_find_first(doc, sprintf(".//%s", top)) %>%
    xml_children() %>%
    xml_name() %>%
    map(~{
      xtrct(doc, sprintf(".//%s/%s", top, .x)) %>%
        list() %>%
        set_names(tolower(.x))
    }) %>%
    flatten_df() %>%
    readr::type_convert()
}

x <- xtrct_df(doc, "Report")

Within each Terminal node, there are multiple reports pertaining to a particular ship whose attributes are given in the Terminal node.

Currently, the columns in x are:

names(x)
[1] "datetime" "lat"      "lon"      "cog"      "sog"      "voltage"  "status" 

How can I add the name of the ship as a column to this dataframe? I can extract the name attribute using :-

xattrs <- xpathSApply(z, "//*/Terminal/@name")

BBut have no clue on how to include this as a variable in the dataframe. Would appreciate some help please.

1 Answer 1

2

Taking a somewhat different route from @hrbrmstr, we can map_df on each element, while also finding the parent and extracting the appropriate attr:

library(xml2)
library(purrr)

col_names <- read_xml(x) %>%
  xml_find_first('.//Report') %>% 
  xml_children() %>% 
  xml_name() 

read_xml(x) %>% 
  xml_find_all(".//Report") %>% 
  map_df(~{
    parent_name <- xml_parent(.x) %>% 
      xml_attr('name') %>% 
      as.character()

    xml_children(.x) %>% 
      as_list() %>% 
      data.frame(stringsAsFactors = FALSE) %>% 
      set_names(col_names) %>% 
      cbind(VesselName=parent_name)
  })

#>              DateTime        Lat         Lon Cog Sog Voltage
#> 1 01/10/2014 15:30:45 99.9999999 999.9999999 999 999      99
#> 2 01/10/2014 15:30:45 99.9999999 999.9999999 999 999      99
#> 3 01/10/2014 15:30:45 99.9999999 999.9999999 999 999      99
#>                  Status  VesselName
#> 1 Description of status MV Vessel A
#> 2 Description of status MV Vessel A
#> 3 Description of status MV Vessel B

Created on 2018-05-07 by the reprex package (v0.2.0).

Data:

x <- '<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report>
      <DateTime>01/10/2014 15:30:45</DateTime> 
      <Lat>99.9999999</Lat> 
      <Lon>999.9999999</Lon> 
      <Cog>999</Cog> 
      <Sog>999</Sog> 
      <Voltage>99</Voltage> 
      <Status>Description of status</Status> 
    </Report>
    <Report>
      <DateTime>01/10/2014 15:30:45</DateTime> 
      <Lat>99.9999999</Lat> 
      <Lon>999.9999999</Lon> 
      <Cog>999</Cog> 
      <Sog>999</Sog> 
      <Voltage>99</Voltage> 
      <Status>Description of status</Status> 
    </Report>
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
    <Report>
      <DateTime>01/10/2014 15:30:45</DateTime> 
      <Lat>99.9999999</Lat> 
      <Lon>999.9999999</Lon> 
      <Cog>999</Cog> 
      <Sog>999</Sog> 
      <Voltage>99</Voltage> 
      <Status>Description of status</Status> 
    </Report>
  </Terminal>
</ReportDelivery>'
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks @GGamba, this works perfectly. However, was wondering if there is any way in which the time taken to execute can be sped up? Right now, the response from my GET is 10 elements of size 29.7 MB. To execute your code it takes 8.23 minutes presently on Macbook Pro.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.