0

I'm studying selenium and I want to extract the texts and links from Sympla's events, but when I click on the "more events" button, I can't extract the next events, it is always extracting the same initial events from the page.

Complete class for easy reproduction.

public static void main(String[] args) throws InterruptedException {

        WebDriverManager.firefoxdriver().setup();
        WebDriver driver = new FirefoxDriver();
        driver.manage().window().maximize();
        driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");

        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);

        // If have captcha, close the page and exit.
        boolean captcha = driver.getPageSource().contains("Não sou um robô");

        if (captcha == true) {
            System.out.println("O Captcha apareceu, acabou a brincadeira!");

            driver.close();
            driver.quit();
        }

        // load more button
        WebElement CarregarMais = driver.findElement(By
                .xpath("//button[@id='more-events']"));

        // Number of events counter
        List<WebElement> eventos = (List<WebElement>) driver.findElements(By
                .cssSelector("div.event-name.event-card"));
        System.out.println("Number of links: " + eventos.size());

        // Number of links counter
        List<WebElement> eventos_link = (List<WebElement>) driver
                .findElements(By.cssSelector("a.sympla-card.w-inline-block"));

        // iterating over the button more events
        for (int j = 0; j < eventos.size(); j++) {

            CarregarMais.click();

            @SuppressWarnings("deprecation")
            WebDriverWait wait = new WebDriverWait(driver, 10);
            WebElement element = wait.until(ExpectedConditions
                    .elementToBeClickable(By
                            .xpath("//button[@id='more-events']")));

            // Iterating over event links
            for (int i = 0; i < eventos_link.size(); i++) {

                System.out.println(i + " " + eventos.get(i).getText() + " - "
                        + eventos_link.get(i).getAttribute("href"));
                Thread.sleep(500);

            }

        }

    }
1
  • Hi, if my answer is the solution to your problem please mark it accordingly. If not, please specify what issue still exists. Thanks Commented Apr 30, 2020 at 9:18

1 Answer 1

1

It's because you don't read the links again. With every click on the button a new page is created, so you need to read them again.

Furthermore you would need to store the last fetched link.

So after waiting for the button to be clickable again you need to reread eventos and eventos_link. And maybe you use a global variable like lastFetchedLinkIndex.

This would be my approach (adjusted your code):

WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();
driver.manage().window().maximize();
driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");

driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);

// If have captcha, close the page and exit.
boolean captcha = driver.getPageSource().contains("Não sou um robô");

if (captcha == true) {
    System.out.println("O Captcha apareceu, acabou a brincadeira!");

    driver.close();
    driver.quit();
}

// load more button
WebElement CarregarMais = driver.findElement(By
        .xpath("//button[@id='more-events']"));

// Number of events counter
List<WebElement> eventos = (List<WebElement>) driver.findElements(By
        .cssSelector("div.event-name.event-card"));
System.out.println("Number of links: " + eventos.size());

// Number of links counter
List<WebElement> eventos_link = (List<WebElement>) driver
        .findElements(By.cssSelector("a.sympla-card.w-inline-block"));
int lastEventScraped = 0;
// iterating over the button more events
for (int j = 0; j < eventos.size(); j++) {

    CarregarMais.click();

    @SuppressWarnings("deprecation")
    WebDriverWait wait = new WebDriverWait(driver, 10);
    WebElement element = wait.until(ExpectedConditions
            .elementToBeClickable(By
                    .xpath("//button[@id='more-events']")));

    eventos = (List<WebElement>) driver.findElements(By
            .cssSelector("div.event-name.event-card"));
    eventos_link = (List<WebElement>) driver
            .findElements(By.cssSelector("a.sympla-card.w-inline-block"));
    // Iterating over event links
    for (int i = lastEventScraped; i < eventos_link.size(); i++, lastEventScraped++) {

        System.out.println(i + " " + eventos.get(i).getText() + " - "
                + eventos_link.get(i).getAttribute("href"));
        Thread.sleep(500);
    }

}
Sign up to request clarification or add additional context in comments.

1 Comment

Perfect, Thanks Very Much!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.