I am using the following code to scrape some data from Amazon:

$nodearr = array(); // initialize before appending
$nodelist = $xpath_cat->query('//li[@id="SalesRank"]/text()');
foreach ($nodelist as $node) {
    $nodearr[] = trim($node->textContent);
}
var_dump($nodearr);

Dumping the result gives this output:

array
  0 => string '' (length=0)
  1 => string '#14,000 Paid in Kindle Store (' (length=30)
  2 => string ')' (length=1)
  3 => string '' (length=0)
  4 => string '#21,322 Paid in Kindle Store (' (length=30)
  5 => string ')' (length=1)
  6 => string '' (length=0)
  7 => string '#20,957 Paid in Kindle Store (' (length=30)
  8 => string ')' (length=1)

What I want is only the # part, which is the second element of each group in the array, like:

"#20,957 Paid in Kindle Store"

How can I modify the code to get this output? I was thinking of using unset(), but I am confused about how to implement it. There is also a trailing "(" that needs to be removed from the string.

Please guide me: how can I modify my code?

3 Answers

To select only the wanted subset of the currently selected text nodes, use:

//li[@id="SalesRank"]/text()[starts-with(., '#')]

You can then select any individual node of this kind by its 1-based index.

For example:

(//li[@id="SalesRank"]/text()[starts-with(., '#')])[3]

selects this text node:

#20,957 Paid in Kindle Store (

To get the text without the trailing "(" character, use the translate() (or substring()) function:

   translate((//li[@id="SalesRank"]/text()[starts-with(., '#')])[3], 
             '(', 
             '')

which, when evaluated, produces the following (note the remaining trailing space, which normalize-space() would remove):

#20,957 Paid in Kindle Store 
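As a rough sketch of how such an expression could be run from PHP, the snippet below uses DOMXPath::evaluate(), which returns a string for string-valued expressions. The markup in $html is hypothetical and only assumed for illustration (the real Amazon page will differ), as are the variable names. normalize-space() is added as a precaution in case the text node carries leading or trailing whitespace.

```php
<?php
// Hypothetical markup, assumed for illustration only.
$html = '<ul><li id="SalesRank"><b>Rank:</b> #14,000 Paid in Kindle Store ('
      . '<a href="#">See Top 100</a>)</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pick the first text node that starts with '#' (ignoring surrounding
// whitespace), drop every "(" with translate(), then trim the result.
$expr = 'normalize-space(translate('
      . '(//li[@id="SalesRank"]/text()[starts-with(normalize-space(.), "#")])[1],'
      . ' "(", ""))';

$rank = $xpath->evaluate($expr);
echo $rank, PHP_EOL;
```

Note that translate() removes every "(" in the string, not only a trailing one, which is fine for text shaped like the sample above.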

This seems to be answered pretty thoroughly here.

It looks like the accepted answer uses:

substring-before(normalize-space(/html/body//ul/li[@id="SalesRank"]/b[1]/following-sibling::text()[1])," ")

And also shows some other nice options.
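For reference, here is a small sketch of what that expression evaluates to in PHP. The markup is hypothetical (it assumes the rank text directly follows a b element inside the li, which the expression itself implies), and the variable names are illustrative. With this input, substring-before(..., " ") keeps only the text up to the first space, so it yields just the rank number rather than the whole phrase.

```php
<?php
// Hypothetical markup, assumed for illustration only.
$html = '<ul><li id="SalesRank"><b>Rank:</b> #14,000 Paid in Kindle Store ('
      . '<a href="#">See Top 100</a>)</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html); // the parser wraps the fragment in html/body
$xpath = new DOMXPath($doc);

// Take the first text node after the first <b>, collapse whitespace,
// and keep everything before the first space.
$expr = 'substring-before(normalize-space('
      . '/html/body//ul/li[@id="SalesRank"]/b[1]/following-sibling::text()[1]'
      . '), " ")';

$rankNumber = $xpath->evaluate($expr);
echo $rankNumber, PHP_EOL;
```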

2 Comments

Sorry for the mistake. I had opened that question a bit late and got the updated answer after posting this question. Well, what do you think, should I use the updated XPath?
I have no idea, but I'm as curious as you :)

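You could probably just tweak your XPath query a little, but you could also use array_filter() to filter the array. For example (strpos() is used here instead of $e[0] so that empty strings do not trigger an "uninitialized string offset" notice):

array_filter($data, function($e) {return strpos($e, '#') === 0;});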

With an input of, for example

$data = array('#14,000 Paid in Kindle Store (', '', '(');

the above array_filter gives

array(1) {
    [0]=>
    string(30) "#14,000 Paid in Kindle Store ("
}

You could then transform the remaining values, for example by passing the filtered array (assumed here to be stored in $filtered) to array_map:

array_map(function($e) {return rtrim($e, ' (');}, $filtered);

which would leave you with:

array(1) {
    [0]=>
    string(28) "#14,000 Paid in Kindle Store"
}
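Putting the two steps together, a minimal sketch applied to the values from the question's var_dump might look like this. The variable names are illustrative, and array_values() is added because array_filter() preserves the original keys.

```php
<?php
// Sample values copied from the var_dump output in the question.
$nodearr = array(
    '', '#14,000 Paid in Kindle Store (', ')',
    '', '#21,322 Paid in Kindle Store (', ')',
    '', '#20,957 Paid in Kindle Store (', ')',
);

// Keep only entries that begin with '#'; strpos() is safe on empty strings.
$ranks = array_filter($nodearr, function ($e) {
    return strpos($e, '#') === 0;
});

// Strip trailing spaces and "(" characters, then reindex from 0.
$ranks = array_values(array_map(function ($e) {
    return rtrim($e, ' (');
}, $ranks));

print_r($ranks);
```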
