I'm scraping the Google Play Store. I have the following HTML (a user's comment):
<div class="quoted-review">
<div class="review-text"> <span class="review-title">Awesome :)</span> Trying to learn some basic Lithuanian and pictures are very helpful. I'd love to learn more from who created this app.. &lt;3
<div class="paragraph-end details-light"></div>
</div>
</div>
I want to extract the complete text inside the element with class quoted-review using XPath, i.e. Awesome :) Trying to learn some basic Lithuanian and pictures are very helpful. I'd love to learn more from who created this app.. <3
These are the XPath expressions I have tried:
1) //div[@class='quoted-review review-text']/span[@class='review-title']/text()|//div[@class='quoted-review review-text']/text()
yields a list
[
'Awesome :)' ,
'Trying to learn some basic Lithuanian and pictures are very helpful. I'd love to learn more from who created this app..'
]
I want both of them as one item. PS: Please do not advise me to concatenate index 0 and 1 using a for loop. I want to extract them as a single string directly with XPath.
2) //div[@class='review-text']/text()
yields only
[
'Trying to learn some basic Lithuanian and pictures are very helpful. I'd love to learn more from who created this app..'
]
Awesome :) is missing.
With BeautifulSoup I can get the whole text as one string using soup.select('.quoted-review')[1].getText(), but I can't get the same result with XPath.
What am I doing wrong?
lxml?
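For reference, a minimal sketch of one possible approach, assuming lxml is the parser. A text() step returns each text node separately, which is why the expressions above produce a list or skip the title. An XPath string function such as normalize-space() instead takes the string value of the first matched element, which is the concatenation of all its descendant text. The variable names snippet, tree and review are illustrative only, and the markup is the sample from above rather than a live page.

```python
from lxml import html

# Sample markup copied from the question (not fetched from a live page).
snippet = """
<div class="quoted-review">
<div class="review-text"> <span class="review-title">Awesome :)</span> Trying to learn some basic Lithuanian and pictures are very helpful. I'd love to learn more from who created this app.. &lt;3
<div class="paragraph-end details-light"></div>
</div>
</div>
"""

tree = html.fromstring(snippet)

# normalize-space() converts the first matching node to its string value
# (all descendant text joined together) and collapses runs of whitespace,
# so the result is a single string rather than a list of text nodes.
review = tree.xpath("normalize-space(//div[@class='quoted-review'])")
print(review)
```

If the original whitespace needs to be preserved, string(//div[@class='quoted-review']) should behave the same way without the whitespace collapsing. Both functions only consider the first matching element, so several reviews on one page would still need one call per element.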