How to parse specific node from multiple xml nodes output

Question

I want to get the GPS coordinates from a specific site with rvest. When I run html_nodes() on the page I get an xml_nodeset of 35 nodes. I want to reach the specific node that holds the GPS value (number 24 in the list).

site: https://www.doz.pl/apteki/a127238-DOZ_Apteka_Dbam_o_Zdrowie

When I run:

Url %>% 
  html_node("span") %>%
  html_text()

the output is "Toggle navigation".

I can only get the first node (Toggle navigation). How do I get the 24th node?

Copy selector output

"body > div.page.has-menu-bottom.relative.sticky-nav > section > div > section >
div > div.col-xs-12.col-sm-12.col-lg-9.col-md-9 > div.panel.margin__top-20 >
ul:nth-child(8) > li:nth-child(1) > span:nth-child(2)"

Copy xpath output

"/html/body/div[1]/section/div/section/div/div[2]/div[1]/ul[3]/li[1]/span[2]"

Code

library(rvest)
Url <- read_html("https://www.doz.pl/apteki/a127238-DOZ_Apteka_Dbam_o_Zdrowie")

Url %>% 
  html_nodes("span") 


ListOfNodes <- Url %>% 
  html_nodes("span") 

ListOfNodes[1:35]

 [1] <span class="sr-only">Toggle navigation</span>
 [2] <span class="icon-bar"></span>
 [3] <span class="icon-bar"></span>
 [4] <span class="icon-bar"></span>
 [5] <span class="badge badge-info"></span>
 [6] <span class="basket__price">\r\n                                                            0 ...
 [7] <span class="icon"> </span>
 [8] <span class="allCategoriesLabel">Wszystkie kategorie</span>
 [9] <span class="list__definition">Adres apteki</span>
[10] <span>Wolności 40, 84-300 Lębork</span>
[11] <span class="list__definition">Dyżur pn-pt</span>
[12] <span> 07:30-21:30</span>
[13] <span class="list__definition">Dyżur sobota:</span>
[14] <span>08:00-21:00</span>
[15] <span class="list__definition">Dyżur niedziela</span>
[16] <span>08:00-20:00</span>
[17] <span class="list__definition">Telefon:</span>
[18] <span>059 8622766</span>
[19] <span class="list__definition">Email:</span>
[20] <span><a href="mailto:%61%70%74%31%32%37%32%33%38@%64%62%61%6d0%6c..."
[21] <span class="list__definition">Komunikator:</span>
[22] <span>-</span>
[23] <span class="list__definition">GPS:</span>
[24] <span>17:44:47.09|54:32:25.63</span>
[25] <span class="list__definition">Długość:</span>
[26] <span>17.7464132000</span>
[27] <span class="list__definition">Szerokość:</span>
[28] <span>54.5404538000</span>
[29] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/pa ...
[30] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/pr ...
[31] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/de ...
[32] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/ex ...
[33] <span class="cookie__message">\r\n                    Ważne: Użytkowanie Witryny oznacza zgod ...
[34] <span>Infolinia:</span>
[35] <span>Infolinia:</span>

GGamba · Accepted Answer · 2017-02-08 12:16:26Z

2

What you are doing in the Code section is right. html_node() returns only the first match, whereas html_nodes() returns all of them, so you just need to extract the 24th element from the list:

url <- "https://www.doz.pl/apteki/a127238-DOZ_Apteka_Dbam_o_Zdrowie"
read_html(url) %>% 
    html_nodes("span") %>% 
    '[['(24) %>% 
    html_text()

[1] "17:44:47.09|54:32:25.63"
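
The difference between html_node() and html_nodes() can be checked offline. The following is a minimal sketch, assuming a small hypothetical HTML fragment (`page`) that imitates the span layout of the real page; the index 4 applies only to this fragment, not to the live site.

```r
library(rvest)

# Hypothetical stand-in for the real page (assumption: label span followed by value span).
page <- read_html('<html><body><ul>
  <li><span class="list__definition">Telefon:</span><span>059 8622766</span></li>
  <li><span class="list__definition">GPS:</span><span>17:44:47.09|54:32:25.63</span></li>
</ul></body></html>')

# html_node() returns only the first matching node.
first_span <- page %>% html_node("span") %>% html_text()

# html_nodes() returns every matching node; `[[` then picks one by position.
all_spans <- page %>% html_nodes("span")
gps <- all_spans[[4]] %>% html_text()

print(first_span)
print(gps)
```

Indexing by position is simple but breaks as soon as the page gains or loses a span before the target, which is why matching on the label is usually more robust.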

To identify the correct node without counting, assuming it always comes right after the span whose text is 'GPS:', you can use Position() on the ListOfNodes object from the question:

pos <- Position(x = ListOfNodes, f = function(x){ html_text(x)=='GPS:'}) + 1

Piping it looks a bit ugly, but works:

read_html(url) %>% 
    html_nodes("span")%>% 
    '[['(Position(x = ., f = function(x){ html_text(x)=='GPS:'}) + 1) %>% 
    html_text()
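
As an alternative to Position(), the label can be matched directly in XPath. This is a sketch rather than part of the original answer: it assumes the value span is the next sibling of the 'GPS:' span (as the copied selector in the question suggests) and uses a hypothetical inline fragment, `page`, instead of the live site.

```r
library(rvest)

# Hypothetical stand-in for the real page (assumption: label and value are sibling spans).
page <- read_html('<html><body><ul>
  <li><span class="list__definition">Telefon:</span><span>059 8622766</span></li>
  <li><span class="list__definition">GPS:</span><span>17:44:47.09|54:32:25.63</span></li>
</ul></body></html>')

# Find the span whose text is "GPS:" and take the first span that follows it
# inside the same parent element.
gps_by_xpath <- page %>%
  html_node(xpath = "//span[normalize-space(.)='GPS:']/following-sibling::span[1]") %>%
  html_text()

print(gps_by_xpath)
```

normalize-space() trims surrounding whitespace before comparing, so the match should survive minor formatting changes in the markup.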

edited Feb 8, 2017 at 12:16

answered Feb 8, 2017 at 12:00


1 Comment

M. Siwik Over a year ago

Do you have any idea how to find out that it is the 24th node without checking manually?
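
One way to avoid looking up positions by hand is to pair every label with the value next to it and then select by name. The sketch below is an assumption-laden illustration, not the answerer's method: it assumes each `span.list__definition` label is immediately followed by its value span, and it uses a hypothetical inline fragment (`page`) and a hypothetical variable name (`fields`).

```r
library(rvest)

# Hypothetical stand-in for the real page (assumption: each label span precedes its value span).
page <- read_html('<html><body><ul>
  <li><span class="list__definition">Telefon:</span><span>059 8622766</span></li>
  <li><span class="list__definition">GPS:</span><span>17:44:47.09|54:32:25.63</span></li>
</ul></body></html>')

# All label spans, and for each label the first span that follows it.
labels <- page %>% html_nodes("span.list__definition")
values <- labels %>% html_node(xpath = "following-sibling::span[1]")

# Named character vector: names are the labels, elements are the values.
fields <- setNames(html_text(values), trimws(html_text(labels)))

print(fields[["GPS:"]])
```

With this layout any field can be read by its label, for example `fields[["Telefon:"]]`, and the node index never needs to be known.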
