
Hi, I am trying to extract data from another site. I can fetch the data, but I cannot get it into the format I want. How can I achieve that?

Here is the code I have so far:

import com.gargoylesoftware.htmlunit.BrowserVersion;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.support.ui.Select;

public class Getdata2 {

    public static void main(String args[]) throws InterruptedException {

        WebDriver driver = new HtmlUnitDriver(BrowserVersion.getDefault());
        String sDate = "27/03/2014";

        String url="http://www.upmandiparishad.in/commodityWiseAll.aspx";
        driver.get(url);
        Thread.sleep(5000);

        new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");
        driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);

        Thread.sleep(3000);
        driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
        Thread.sleep(5000);


        WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
        String htmlTableText = findElement.getText();
        // This is the raw text of the whole table; strip the header text before printing.
        htmlTableText=htmlTableText.replace("S.No.DistrictMarketPrice","");
        System.out.println(htmlTableText);


        // quit() closes every window and ends the session, so a separate close() is not needed.
        driver.quit();

    }
}

I want to extract my data like this:

1 Agra Achhnera NIL
2 Agra Agra NIL
3 Agra Fatehabad NIL
4 Agra FatehpurSikri NIL
5 Agra Jagner NIL
6 Agra Jarar NIL
7 Agra Khairagarh NIL
8 Agra Shamshabad NIL
9 Aligarh Atrauli NIL
10 Aligarh Chharra NIL
11 Aligarh Aligarh 1300.00
12 Aligarh Khair 1300.00
13 Allahabad Allahabad NIL
14 Allahabad Jasra NIL
15 Allahabad Leriyari NIL
16 Allahabad Sirsa NIL
17 AmbedkarNagar Akbarpur NIL
18 Ambedkar Nagar TandaAkbarpur NIL

How can I achieve my desired output?

Thanks in advance.

  • possible duplicate of How to do web scraping using htmlunitsriver? Commented Apr 4, 2014 at 7:14
  • How many accounts do you have? Why is that? Commented Apr 4, 2014 at 7:18
  • I don't know why my other account was blocked for 7 days, so I had to create a new one. Sorry. Commented Apr 4, 2014 at 7:19
  • @Nadun, can you understand my problem and help me solve it? Commented Apr 4, 2014 at 7:20
  • You can check this thread (maybe the second answer): Parsing HTML table data with xpath and selenium in java Commented Apr 4, 2014 at 7:57

1 Answer


Note: You do not need regex. Selenium itself provides good tools to extract data from tables.

Let's analyze this. Looking at the page source of that website, here is how the table is arranged:

<table id="ctl00_ContentPlaceHolder1_GridView1">
    <tbody>
        <tr>
            <td></td>
            <td></td>
            <td></td>
            <td></td>
        </tr>
        ... more <tr>s
    </tbody>
</table>
  • First you get the "table rows".
  • This is done by using findElement and findElements.

(The code below is an example; adapt it to your own code.)

List<WebElement> tableRows = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1")).findElements(By.xpath(".//tbody/tr"));
  • Now loop through each of the List<WebElement> elements, which you got above.

You do this using

for (WebElement tableRow : tableRows) {
...
}
  • Next, each table row has 4 entries (i.e. 4 table cells).
  • Again, use findElements as shown above, this time on the row itself.
  • Store the result in a List<WebElement> (again, as shown above).

Code:

List<WebElement> tableCells = tableRow.findElements(By.xpath(".//td"));
  • Now, loop through each <td> WebElement.
  • Get the text within each element by calling the .getText() method on each WebElement.
  • Format the text output according to your needs.
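
The last two steps (collecting the text of each cell and formatting it) can be sketched in plain Java. This is only an illustration under assumptions: the class name TableTextFormatter and the methods formatRow and formatTable are made up for this example and are not part of Selenium. The sample rows are copied from the desired output in the question; in real code each inner list would hold the getText() value of every <td> in one row. Rows that yield no cell text are skipped, on the assumption that a header row contains no <td> cells.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Selenium): turns cell texts into output lines.
class TableTextFormatter {

    // Joins the trimmed, non-empty cell texts of one row with single spaces.
    static String formatRow(List<String> cellTexts) {
        List<String> parts = new ArrayList<>();
        for (String text : cellTexts) {
            String trimmed = text.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return String.join(" ", parts);
    }

    // Formats every row and skips rows that produce no text at all.
    static List<String> formatTable(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            String line = formatRow(row);
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample data taken from the question. With Selenium, each inner list
        // would be filled by calling getText() on the <td> elements of one row.
        List<List<String>> rows = Arrays.asList(
            Collections.<String>emptyList(), // a row without <td> cells
            Arrays.asList(" 1 ", "Agra", "Achhnera", "NIL"),
            Arrays.asList("11", "Aligarh", "Aligarh", "1300.00")
        );
        for (String line : formatTable(rows)) {
            System.out.println(line);
        }
    }
}
```

Keeping the formatting separate from the Selenium calls means the output format can be changed or checked without loading the page.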

3 Comments

If you can do this for me, it would be very helpful.
I believe I have already answered your question. On Stack Overflow, you are expected to do some of the work on your own as well.
I am doing the same, but the data is not being extracted.
