
https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html

So far I have:

library(rvest)

scrape1 <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html')

My lecturer taught me to use:

scrape1_nodes<-scrape1 %>%
  html_nodes("p")
head(scrape1_nodes)

However, this method doesn't seem to be working. Is there an easy way to find a CSV, or to direct R to the data in the HTML page?
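One quick way to see why the `"p"` selector returns nothing useful is to list the class attributes the page actually uses and pick a selector from those. A minimal sketch below, using a toy document standing in for the real page (which stores its data in `psovde-*`-classed divs, per the answer below):

```r
library(rvest)

# Toy document mimicking the real page's structure:
# the data lives in classed <div>s, not in <p> tags
page <- minimal_html('
  <div class="psovde-day">Wed 16</div>
  <div class="psovde-ship">AIDAsol</div>
')

# Listing the class attributes shows what to target with a CSS selector
classes <- page %>% html_elements("div") %>% html_attr("class")
classes
```

Running the same `html_elements("div") %>% html_attr("class")` pattern on the real page shows the candidate selectors to feed into `html_elements()`.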

  • This is a complex web page; you will need to study the structure, extract the correct "div" nodes, and then parse the text correctly. Commented Apr 8 at 22:04

2 Answers


One option could be to retrieve the text from the correct HTML elements. Sadly, this website's layout is not very basic:

[screenshot of the page layout]

  • The days have to be treated differently, because the website structures the dates in such an atrocious way. E.g. there are cases like Tue 22 where two cruises depart on the same day, which has to be handled. Also, not every day has its month within the same div; the month sits in a separate div a few elements above, on the SAME level.
  • Another difficulty is that the cruise lines are not shown on the website as text but as images, so the names have to be retrieved from the images' alt attribute.
library(rvest)
library(tidyverse)

page <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html')

# get day and month
day <- page %>% 
  html_elements("div[class^='psovde-day'], div[class^='psovde-month']") %>%
  html_text2() %>% 
  gsub("[^[:alnum:]]", "", .)

day <- day[-1]        # drop the first matched element
day[day == ""] <- NA  # blank cells become NA so fill() can carry values down


df_dates <- as.data.frame(day) %>% 
  fill(day, .direction = "down") %>%
  mutate(month = ifelse(grepl("^[A-Za-z]+$", day), day, NA)) %>%
  fill(month, .direction = "down") %>%
  filter(day != month)

# cruise line names are only available via the logo image's alt attribute
cruise_line <- page %>% 
  html_elements("div.psovde-cruiseline img") %>%
  html_attr("alt") %>% 
  gsub(" logo", "", .)

# ship, times and passengers follow the same pattern;
# entries containing a carriage return are filtered out
ship <- page %>% 
  html_elements("div[class^='psovde-ship']") %>% 
  html_text2() %>%
  .[!grepl("\r", .)]

times <- page %>% 
  html_elements("div[class^='psovde-times']") %>% 
  html_text2() %>%
  .[!grepl("\r", .)]

passengers <- page %>% 
  html_elements("div[class^='psovde-passengers']") %>% 
  html_text2() %>%
  .[!grepl("\r", .)]

finalData <- data.frame(
  day = df_dates$day,
  month = df_dates$month,
  cruise_line = cruise_line,
  ship = ship,
  times = times,
  passengers = passengers
)

giving

     day month    cruise_line          ship         times passengers
1 Wed16 April           AIDA       AIDAsol a 1000 d 2000       2174
2 Thu17 April CFC Croisieres   Renaissance a 0700 d 1800       1358
3 Wed23 April           AIDA       AIDAsol a 1000 d 1900       2174
4 Tue29 April Phoenix Reisen         Amera a 0800 d 2000        834
5  Sat3   May           AIDA      AIDAluna a 0800 d 1800       2050
6  Tue6   May    TUI Cruises Mein Schiff 3 a 0730 d 1900       2506

You can then write this to CSV with write.csv(finalData, "cruises.csv")
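The month fill-down is the trickiest step of the answer above. The same idea can be sketched in base R on a toy vector (not the real page data), for anyone who would rather avoid the tidyr dependency:

```r
# Toy input mimicking the scraped day/month vector:
# month headers ("April", "May") interleaved with day cells
day <- c("April", "Wed16", "Thu17", "May", "Sat3")

# Months are purely alphabetic tokens; day cells contain digits
is_month <- grepl("^[A-Za-z]+$", day)

# Carry the last seen month forward (base-R stand-in for tidyr::fill())
month <- ifelse(is_month, day, NA)
filled <- Reduce(function(prev, cur) if (is.na(cur)) prev else cur,
                 month, accumulate = TRUE)

# Keep only the day rows, each now paired with its month
df_dates_base <- data.frame(day = day[!is_month], month = filled[!is_month])
df_dates_base
```

This yields the same day/month pairing that the fill() + filter() pipeline produces.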




Ah, I see what you're trying to do here; I've been there when things just don't come out as expected from html_nodes("p"). The thing is, that page doesn't store the cruise schedule inside <p> tags; it actually uses an HTML table. So instead of grabbing paragraph nodes, try selecting the table itself.

Here’s a quick example using rvest. That should pull the main schedule table and let you save it directly as a CSV.

library(rvest)

url <- "https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html"
page <- read_html(url)

# Try extracting the table
table_data <- page %>%
  html_element("table") %>%
  html_table()

# View it
head(table_data)

# Save to CSV
write.csv(table_data, "invergordon_schedule.csv", row.names = FALSE)
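Note: if the page really stores its data in divs rather than a table (as the other answer indicates), html_element("table") will come back empty and html_table() will error. You can check for that before converting, as in this sketch on toy HTML standing in for the real page:

```r
library(rvest)

# Toy div-based document with no <table> element
page <- minimal_html('<div class="psovde-day">Wed 16</div>')

# html_element() returns an xml_missing node when nothing matches
tbl <- html_element(page, "table")
no_table <- inherits(tbl, "xml_missing")
no_table  # TRUE here: fall back to parsing the divs instead
```

Only call html_table() when this check is FALSE; otherwise parse the individual div nodes as in the other answer.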

1 Comment

Did you use ChatGPT or some other LLM for this?
