
I want to get the GPS coordinates from a specific site with rvest. When I run html_nodes() on the page I get an xml_nodeset of 35 nodes. I want to reach the specific node that holds the GPS value (number 24 in the list).

Site: https://www.doz.pl/apteki/a127238-DOZ_Apteka_Dbam_o_Zdrowie

When I run:

Url %>% 
  html_node("span") %>%
  html_text()

the output is: Toggle navigation

I can only get the first node (Toggle navigation). How do I get the 24th node?

Output of "Copy selector":

"body > div.page.has-menu-bottom.relative.sticky-nav > section > div > section >
div > div.col-xs-12.col-sm-12.col-lg-9.col-md-9 > div.panel.margin__top-20 >
ul:nth-child(8) > li:nth-child(1) > span:nth-child(2)"

Output of "Copy XPath":

"/html/body/div[1]/section/div/section/div/div[2]/div[1]/ul[3]/li[1]/span[2]"

Code

library(rvest)
Url <- read_html("https://www.doz.pl/apteki/a127238-DOZ_Apteka_Dbam_o_Zdrowie")

Url %>% 
  html_nodes("span") 


ListOfNodes <- Url %>% 
  html_nodes("span") 

ListOfNodes[1:35]

 [1] <span class="sr-only">Toggle navigation</span>
 [2] <span class="icon-bar"></span>
 [3] <span class="icon-bar"></span>
 [4] <span class="icon-bar"></span>
 [5] <span class="badge badge-info"></span>
 [6] <span class="basket__price">\r\n                                                            0 ...
 [7] <span class="icon"> </span>
 [8] <span class="allCategoriesLabel">Wszystkie kategorie</span>
 [9] <span class="list__definition">Adres apteki</span>
[10] <span>Wolności 40, 84-300 Lębork</span>
[11] <span class="list__definition">Dyżur pn-pt</span>
[12] <span> 07:30-21:30</span>
[13] <span class="list__definition">Dyżur sobota:</span>
[14] <span>08:00-21:00</span>
[15] <span class="list__definition">Dyżur niedziela</span>
[16] <span>08:00-20:00</span>
[17] <span class="list__definition">Telefon:</span>
[18] <span>059 8622766</span>
[19] <span class="list__definition">Email:</span>
[20] <span><a href="mailto:%61%70%74%31%32%37%32%33%38@%64%62%61%6d0%6c..."
[21] <span class="list__definition">Komunikator:</span>
[22] <span>-</span>
[23] <span class="list__definition">GPS:</span>
[24] <span>17:44:47.09|54:32:25.63</span>
[25] <span class="list__definition">Długość:</span>
[26] <span>17.7464132000</span>
[27] <span class="list__definition">Szerokość:</span>
[28] <span>54.5404538000</span>
[29] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/pa ...
[30] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/pr ...
[31] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/de ...
[32] <span class="benefit__icon">\r\n                        <img src="/assets/doz/images/icons/ex ...
[33] <span class="cookie__message">\r\n                    Ważne: Użytkowanie Witryny oznacza zgod ...
[34] <span>Infolinia:</span>
[35] <span>Infolinia:</span>

1 Answer

What you are doing in the Code section is right. You just need to extract the 24th element from the node set:

library(rvest)

url <- "https://www.doz.pl/apteki/a127238-DOZ_Apteka_Dbam_o_Zdrowie"
read_html(url) %>%
    html_nodes("span") %>%   # all <span> nodes, in document order
    `[[`(24) %>%             # pick the 24th node
    html_text()

[1] "17:44:47.09|54:32:25.63"
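
The reason the attempt in the question only returned "Toggle navigation" is that html_node() (singular) returns just the first match, while html_nodes() returns every match, which can then be indexed. Here is a minimal sketch of the difference. It uses a made-up inline HTML fragment rather than the real page (an assumption, so that it runs without network access), and the variable names are only illustrative:

```r
library(rvest)

# Hypothetical fragment standing in for the real page
page <- read_html('
  <div>
    <span class="sr-only">Toggle navigation</span>
    <span class="list__definition">GPS:</span>
    <span>17:44:47.09|54:32:25.63</span>
  </div>')

# html_node() returns only the first matching node
first_span <- page %>% html_node("span") %>% html_text()

# html_nodes() returns all matches; [[ then selects one by position
spans <- page %>% html_nodes("span")
third_span <- spans[[3]] %>% html_text()

print(first_span)
print(third_span)
```

On the real page the same idea applies with index 24 instead of 3.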

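To identify the correct node without counting, assuming the value always comes right after the span with the text 'GPS:', you can use Position() on the node set (ListOfNodes from the question):

pos <- Position(x = ListOfNodes, f = function(x) { html_text(x) == 'GPS:' }) + 1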

Piping it looks a bit ugly, but it works:

read_html(url) %>%
    html_nodes("span") %>%
    `[[`(Position(x = ., f = function(x) { html_text(x) == 'GPS:' }) + 1) %>%
    html_text()
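
Another option that avoids positions altogether is to let XPath find the span that follows the 'GPS:' label. This is only a sketch: it assumes the label span and the value span are siblings inside the same parent (which the copied selector in the question suggests, but which I have not verified on the live page), and it runs on a made-up inline fragment:

```r
library(rvest)

# Hypothetical fragment: label and value are sibling <span>s inside one <li>
page <- read_html('
  <ul>
    <li><span class="list__definition">Telefon:</span><span>059 8622766</span></li>
    <li><span class="list__definition">GPS:</span><span>17:44:47.09|54:32:25.63</span></li>
  </ul>')

# Find the span whose text is "GPS:", then take its first following sibling span
gps <- page %>%
  html_node(xpath = "//span[normalize-space(.) = 'GPS:']/following-sibling::span[1]") %>%
  html_text()

print(gps)
```

The advantage is that the lookup does not break if spans are added or removed elsewhere on the page.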

1 Comment

Do you have any idea how to find out that it is the 24th node without checking manually?
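
The index can be computed instead of counted. A hedged sketch, again on a made-up inline fragment and assuming the value is always the span immediately after the one whose text is exactly "GPS:": take the text of every span, find where the label sits with which(), and add one. The check before indexing guards against a missing or repeated label.

```r
library(rvest)

# Hypothetical fragment standing in for the real page
page <- read_html('
  <div>
    <span class="sr-only">Toggle navigation</span>
    <span class="list__definition">Telefon:</span>
    <span>059 8622766</span>
    <span class="list__definition">GPS:</span>
    <span>17:44:47.09|54:32:25.63</span>
  </div>')

spans <- html_nodes(page, "span")
texts <- html_text(spans, trim = TRUE)

# Position of the label span; the value is assumed to be the next span
idx <- which(texts == "GPS:")
stopifnot(length(idx) == 1, idx < length(spans))

gps <- html_text(spans[[idx + 1]])
print(idx + 1)
print(gps)
```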
