
https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html

So far I have:

library(rvest)

scrape1 <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html')

My lecturer taught me to use:

scrape1_nodes<-scrape1 %>%
  html_nodes("p")
head(scrape1_nodes)

However, this method doesn't seem to be working. Is there an easy way to find a CSV, or to direct R to the data in the HTML page?
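One quick way to see why the `"p"` selector returns nothing useful is to list the class attributes the page actually uses and pick a selector from those. A minimal sketch below, using a toy document standing in for the real page (which stores its data in `psovde-*`-classed divs, per the answer below):

```r
library(rvest)

# Toy document mimicking the real page's structure:
# the data lives in classed <div>s, not in <p> tags
page <- minimal_html('
  <div class="psovde-day">Wed 16</div>
  <div class="psovde-ship">AIDAsol</div>
')

# Listing the class attributes shows what to target with a CSS selector
classes <- page %>% html_elements("div") %>% html_attr("class")
classes
```

Running the same `html_elements("div") %>% html_attr("class")` pattern on the real page shows the candidate selectors to feed into `html_elements()`.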

  • This is a complex web page; you will need to study the structure, extract the correct "div" nodes, and then parse the text correctly. Commented Apr 8 at 22:04

2 Answers


One option could be to retrieve the text from the correct HTML elements. Sadly, this website's layout is not very basic:

[screenshot of the page layout]

  • The days have to be treated differently, because the website structures the dates in such an atrocious way. E.g. there are cases like Tue 22 where two cruises depart on the same day, which has to be handled. Also, not every day has its month within the same div; the month sits in a separate div a few elements above, on the SAME level.
  • Another difficulty is that the cruise lines are not shown on the website as text but as images, so the names have to be retrieved from the images' alt attribute.
library(rvest)
library(tidyverse)

page <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html')

# get day and month
day <- page %>% 
  html_elements("div[class^='psovde-day'], div[class^='psovde-month']") %>%
  html_text2() %>% 
  gsub("[^[:alnum:]]", "", .)

day <- day[-1]        # drop the first matched element
day[day == ""] <- NA  # blank cells become NA so fill() can carry values down


df_dates <- as.data.frame(day) %>% 
  fill(day, .direction = "down") %>%
  mutate(month = ifelse(grepl("^[A-Za-z]+$", day), day, NA)) %>%
  fill(month, .direction = "down") %>%
  filter(day != month)

# cruise line names are only available via the logo image's alt attribute
cruise_line <- page %>% 
  html_elements("div.psovde-cruiseline img") %>%
  html_attr("alt") %>% 
  gsub(" logo", "", .)

# ship, times and passengers follow the same pattern;
# entries containing a carriage return are filtered out
ship <- page %>% 
  html_elements("div[class^='psovde-ship']") %>% 
  html_text2() %>%
  .[!grepl("\r", .)]

times <- page %>% 
  html_elements("div[class^='psovde-times']") %>% 
  html_text2() %>%
  .[!grepl("\r", .)]

passengers <- page %>% 
  html_elements("div[class^='psovde-passengers']") %>% 
  html_text2() %>%
  .[!grepl("\r", .)]

finalData <- data.frame(
  day = df_dates$day,
  month = df_dates$month,
  cruise_line = cruise_line,
  ship = ship,
  times = times,
  passengers = passengers
)

giving

     day month    cruise_line          ship         times passengers
1 Wed16 April           AIDA       AIDAsol a 1000 d 2000       2174
2 Thu17 April CFC Croisieres   Renaissance a 0700 d 1800       1358
3 Wed23 April           AIDA       AIDAsol a 1000 d 1900       2174
4 Tue29 April Phoenix Reisen         Amera a 0800 d 2000        834
5  Sat3   May           AIDA      AIDAluna a 0800 d 1800       2050
6  Tue6   May    TUI Cruises Mein Schiff 3 a 0730 d 1900       2506

You can then write this to CSV with write.csv(finalData, "cruises.csv")
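The month fill-down is the trickiest step of the answer above. The same idea can be sketched in base R on a toy vector (not the real page data), for anyone who would rather avoid the tidyr dependency:

```r
# Toy input mimicking the scraped day/month vector:
# month headers ("April", "May") interleaved with day cells
day <- c("April", "Wed16", "Thu17", "May", "Sat3")

# Months are purely alphabetic tokens; day cells contain digits
is_month <- grepl("^[A-Za-z]+$", day)

# Carry the last seen month forward (base-R stand-in for tidyr::fill())
month <- ifelse(is_month, day, NA)
filled <- Reduce(function(prev, cur) if (is.na(cur)) prev else cur,
                 month, accumulate = TRUE)

# Keep only the day rows, each now paired with its month
df_dates_base <- data.frame(day = day[!is_month], month = filled[!is_month])
df_dates_base
```

This yields the same day/month pairing that the fill() + filter() pipeline produces.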




Ah, I see what you're trying to do here; I've been there when things just don't come out as expected from html_nodes("p"). The thing is, that page doesn't store the cruise schedule inside <p> tags; it actually uses an HTML table. So instead of grabbing paragraph nodes, try selecting the table itself.

Here’s a quick example using rvest. That should pull the main schedule table and let you save it directly as a CSV.

library(rvest)

url <- "https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html"
page <- read_html(url)

# Try extracting the table
table_data <- page %>%
  html_element("table") %>%
  html_table()

# View it
head(table_data)

# Save to CSV
write.csv(table_data, "invergordon_schedule.csv", row.names = FALSE)
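Note: if the page really stores its data in divs rather than a table (as the other answer indicates), html_element("table") will come back empty and html_table() will error. You can check for that before converting, as in this sketch on toy HTML standing in for the real page:

```r
library(rvest)

# Toy div-based document with no <table> element
page <- minimal_html('<div class="psovde-day">Wed 16</div>')

# html_element() returns an xml_missing node when nothing matches
tbl <- html_element(page, "table")
no_table <- inherits(tbl, "xml_missing")
no_table  # TRUE here: fall back to parsing the divs instead
```

Only call html_table() when this check is FALSE; otherwise parse the individual div nodes as in the other answer.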

1 Comment

Did you use ChatGPT or some other LLM for this?
