At this address I am trying to scrape one tag (the large price, shown in bold red).

I use LibXML 2.2.

When I try to extract the tag with this XPath

//*[@class='priceLarge']

it works.

But to make writing queries easier, I would like to use Firebug in Firefox.

Firebug gives me this XPath:

/html/body/div[2]/form/table[3]/tbody/tr/td/div/table/tbody/tr[2]/td[2]/span/b

This XPath does not work; it seems to be incomplete. How can I modify it to scrape the item?

1 Answer

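Firefox and other browsers insert tbody elements into the DOM they build, even when the HTML source contains none, and Firebug reports its XPath against that DOM.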

The tbody is probably not in the HTML your application receives, so you can remove it from your XPath: /html/body/div[2]/form/table[3]/tr/td/div/table/tr[2]/td[2]/span/b. You can check this by saving the HTML from your application and viewing it in a text editor.
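
As a rough illustration of the tbody effect, here is a minimal sketch. It uses Python's standard-library ElementTree rather than libxml2, and the markup is a hypothetical, well-formed stand-in for the page as served, not the real page. ElementTree does not accept absolute paths on an element, so the paths start at "./" relative to the html root.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the HTML as served: note there is no <tbody>.
RAW_HTML = """
<html><body>
  <table>
    <tr><td>Price</td><td><span><b class="priceLarge">$19.99</b></span></td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(RAW_HTML)  # root is the <html> element

# Path in the style a browser tool reports, including the tbody step
# that exists only in the browser's DOM.
browser_path = "./body/table/tbody/tr/td[2]/span/b"
# The same path with the tbody step removed, matching the served markup.
served_path = "./body/table/tr/td[2]/span/b"

with_tbody = root.findall(browser_path)
without_tbody = root.findall(served_path)

print(len(with_tbody))        # 0
print(without_tbody[0].text)  # $19.99
```

The same comparison can be made in any XPath engine: run both forms against the HTML your program actually downloads and see which one returns a node.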

However, since the intent seems to be to pull information from a web page, your application will probably be more resistant to changes in the page if you use an XPath that depends less on the tree structure (e.g. //b[@class='priceLarge']).
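
To show why the attribute-based expression is more robust, here is a small sketch, again with Python's standard-library ElementTree standing in for libxml2. LAYOUT_A, LAYOUT_B and find_price are hypothetical names and markup made up for the example; the point is only that one expression survives a change in nesting.

```python
import xml.etree.ElementTree as ET

# Two hypothetical layouts that wrap the same price element differently.
LAYOUT_A = (
    "<html><body><form><table><tr><td>"
    "<b class='priceLarge'>$19.99</b>"
    "</td></tr></table></form></body></html>"
)
LAYOUT_B = (
    "<html><body><div><form><div><span>"
    "<b class='priceLarge'>$19.99</b>"
    "</span></div></form></div></body></html>"
)

def find_price(markup):
    """Return the text of the first <b class='priceLarge'>, or None."""
    root = ET.fromstring(markup)
    # ".//" searches all descendants, so the nesting above <b> is irrelevant.
    node = root.find(".//b[@class='priceLarge']")
    return node.text if node is not None else None

print(find_price(LAYOUT_A))  # $19.99
print(find_price(LAYOUT_B))  # $19.99
```

Note that @class='priceLarge' is an exact string comparison, so it would not match an element whose class attribute lists several class names.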

EDIT: In addition to the tbody problem, it seems that Firefox renders the div element (ID: divsinglecolumnminwidth) as containing the form element (ID: handleBuy).

Looking at the HTML in an XML editor shows that the form element is a sibling of that div element, so the expression should start with /html/body/form/table[3].

One tool, among many others, to test your XPath expressions is HAP Testbed.

3 Comments

I tried it without tbody, but it still does not work. Any idea?
I tried /html/body/div[2]/form/table[3]/tbody/tr/td/div/table/tbody/tr[2]/td[2]/span[1]/b in an evaluator, and there it gave me the right value; I only changed the last span to span[1]. But when I use it with libxml2 it still does not work, and I don't know why.
Don't copy the HTML into the evaluator from Firefox's view source unless that is also what your program processes. If your program downloads the page, give the URL to the evaluator (if possible), or save the HTML your program downloaded and copy that. We need the HTML as downloaded by the application, not as processed by a browser. Beyond that, build the XPath up step by step in libxml2 until it stops returning a value; that will narrow down where the differences are.
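
The step-by-step approach can be sketched roughly as follows. This uses Python's standard-library ElementTree instead of libxml2; DOC and first_failing_step are hypothetical names invented for the example, and the naive split on "/" assumes no predicate contains a slash.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which the form is a direct child of body.
DOC = (
    "<html><body><div></div><form><table><tr><td>"
    "<b>x</b>"
    "</td></tr></table></form></body></html>"
)

def first_failing_step(root, path):
    """Grow the path one step at a time.

    Returns (longest_prefix_that_matches, first_step_that_fails),
    with None as the second item when the whole path matches.
    """
    steps = [s for s in path.split("/") if s not in ("", ".")]
    matched = []
    for step in steps:
        candidate = "./" + "/".join(matched + [step])
        if not root.findall(candidate):
            prefix = "./" + "/".join(matched) if matched else "."
            return prefix, step
        matched.append(step)
    return "./" + "/".join(matched), None

root = ET.fromstring(DOC)
result = first_failing_step(root, "./body/div[2]/form/table/tr/td/b")
print(result)  # ('./body', 'div[2]')
```

Here the path stops matching at div[2], which points at the step where the browser's view and the downloaded HTML disagree.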
