lailaps
  • 11.3k
  • 1
  • 6
  • 25

What I usually do in these cases is:

  1. get each row <li>
  2. and then use XPaths to navigate from each <li> element down to the element of interest and map it to a column.

It's a bit tedious, but you save yourself the trouble of cleaning up the merged texts.


On the page, press F12 and open the Elements tab to inspect the table's structure using the button (1).

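You can right-click an element within the structure and choose "Copy XPath" to get the path for each html_element() call. The copied path runs all the way down from the top of the page (or from the nearest element with an id), so keep only the part below the <li>: from each <li> element we follow that relative XPath down to the column data.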


library(rvest)

# get rows 'li' of table to iterate over them
rows <- read_html("https://gainblers.com/mx/quinielas/progol-revancha/", encoding = "UTF-8") |>
  html_element(xpath= '//*[@id="content_seccionb"]/div[1]/ul') |>
  html_nodes("li") 

# helper function: get the text of a node's child found by a relative xpath
from_xpath <- \(x, path) x |> html_element(xpath = path) |> html_text(trim = TRUE)

foo <- rows |>
 purrr::map_df(~ {
   list(
     nr =        from_xpath(.x, "div[1]/span"),
     partidos1 = from_xpath(.x, "div[1]/p/span[1]"), 
     partidos2 = from_xpath(.x, "div[1]/p/span[3]"),
     L1 =        from_xpath(.x, "div[2]/span"),
     L2 =        from_xpath(.x, "div[2]/strong"),
     E1 =        from_xpath(.x, "div[3]/span"),
     E2 =        from_xpath(.x, "div[3]/strong"),
     V1 =        from_xpath(.x, "div[4]/span"),
     V2 =        from_xpath(.x, "div[4]/strong"),
     pron1 =     from_xpath(.x, "div[5]/div[1]"),
     pron2 =     from_xpath(.x, "div[5]/div[2]")
   )
 }) |> 
 data.frame() |>
 subset(!is.na(partidos1)) # filter out header row

giving

   nr             partidos1        partidos2   L1  L2   E1  E2   V1  V2 pron1 pron2
2   1                México            Japón 2,42 39% 3,50 27% 2,72 34%     L     V
3   2                   USA    Corea del Sur 2,22 43% 3,20 30% 3,40 28%     L  <NA>
4   3             Juárez FC          Pachuca 3,87 24% 3,55 26% 1,79 51%     V  <NA>
5   4           Tigres UANL        Monterrey 1,53 59% 3,88 23% 5,31 17%     L  <NA>
6   5    Dorados De Sinaloa         Irapuato 2,96 31% 3,40 27% 2,23 42%     L     V
7   6        Tampico Madero          Tapatio 1,60 58% 3,90 24% 5,13 18%     L  <NA>
8   7 Tepatitlan de Morelos    Leones Negros 2,03 46% 3,38 27% 3,40 27%     L  <NA>
9   8               Irlanda          Hungría 2,75 36% 3,26 30% 2,90 34%     L     V
10  9                Grecia        Dinamarca 2,82 35% 3,24 30% 2,84 35%     L     V
11 10         Estoril Praia      Santa Clara 2,77 34% 3,10 31% 2,75 35%     L     V
12 11        St. Louis City        Dallas FC 2,00 49% 4,15 24% 3,55 28%     L  <NA>
13 12  Sporting Kansas City        Austin FC 2,49 39% 3,75 26% 2,67 36%     L     V
14 13      Deportivo Coruña   Sporting Gijón 2,20 44% 3,25 30% 3,70 26%     L     E
15 14                Burgos       Las Palmas 2,43 40% 3,12 31% 3,25 30%     L     V
16 15                Burgos       Las Palmas 2,43 40% 3,12 31% 3,25 30%     L     V
17 16                 Pumas           Toluca        %        %        %  <NA>  <NA>
18 17           Tlaxcala FC           Oaxaca 1,73 54% 4,00 24% 4,25 22%     L  <NA>
19 18               Atlante     Correcaminos 1,31 71% 5,05 18% 9,10 10%     L  <NA>
20 19               Turquía           España 6,00 17% 4,61 22% 1,60 62%     V  <NA>
21 20                Israel           Italia 8,70 11% 4,91 20% 1,43 69%     V  <NA>
22 21              Zaragoza       Valladolid 2,65 37% 3,25 30% 2,90 33%     L     V
23 22               Almería Racing Santander 1,90 51% 3,84 25% 4,00 24%     L  <NA>
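
If you want to try the pattern without hitting the site, here is a minimal sketch on a made-up HTML fragment. The markup and the id "demo" are assumptions for illustration only and are not copied from the page; the point is just that html_element() returns a missing node when a relative XPath does not match, so html_text() gives NA for that cell. That is why the header row can be dropped with !is.na(partidos1).

```r
library(rvest)

# Hypothetical markup that only mimics the row layout (not the real page)
page <- minimal_html('
  <ul id="demo">
    <li><div><span>#</span></div><div><span>L</span></div></li>
    <li>
      <div><span>1</span><p><span>Home</span><span>vs</span><span>Away</span></p></div>
      <div><span>2,42</span><strong>39%</strong></div>
    </li>
  </ul>')

# one node per row
rows <- page |>
  html_element(xpath = '//*[@id="demo"]') |>
  html_elements("li")

# same helper as above: text of the child found by a relative xpath
from_xpath <- \(x, path) x |> html_element(xpath = path) |> html_text(trim = TRUE)

# html_element() is vectorised over the rows, so each call yields one value per <li>
all_rows <- data.frame(
  nr        = from_xpath(rows, "div[1]/span"),
  partidos1 = from_xpath(rows, "div[1]/p/span[1]"),
  partidos2 = from_xpath(rows, "div[1]/p/span[3]"),
  L1        = from_xpath(rows, "div[2]/span"),
  L2        = from_xpath(rows, "div[2]/strong")
)

# the header row has no <p>, so partidos1 is NA there and the row is dropped
demo <- subset(all_rows, !is.na(partidos1))
print(demo)
```

Because html_element() works on the whole set of rows at once, this variant builds the columns directly instead of mapping over the rows; both approaches should give the same table.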