I am not good at writing regex patterns to extract data. I have a long document, and below is the specific string that I need to extract values from.
<p><span id="minPrice">XXXX<a href="YYYYY" target="_blank"><span>¥ZZZZZ</span></a></span>
I want to extract the XXXX, YYYYY, and ZZZZZ values.
My first step is to get XXXX<a href="YYYYY" target="_blank"><span>¥ZZZZZ
$pattern = '/<p><span id="minPrice">^</span></a></span>/';
preg_match($pattern, $data, $matches);
echo ($matches[1]);
But it does not work.
So how can I extract XXXX, YYYYY, and ZZZZZ?
The document I have is full of badly encoded characters, so I cannot use loadHTML. It just returns errors.
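One possible regex approach, offered as a minimal sketch rather than a definitive pattern: capture the three pieces with named groups and stop each capture at the next tag or quote. The group names `label`, `url`, and `price` and the sample `$data` are assumptions made for illustration, and the `/u` modifier is left off on purpose, because with `/u` preg_match fails on input that is not valid UTF-8.

```php
<?php
// Sample fragment shaped like the one in the question; in practice $data is the whole page.
$data = <<<'HTML'
<p><span id="minPrice">
XXXX
<a href="http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku" target="_blank">
<span>¥131,649</span>
</a>
</span>
HTML;

// label: text before the link; url: the href value; price: digits and commas only.
// [^<]* and [^"]* stop at the next tag or quote, \s* tolerates line breaks between tags,
// and [^<\d]* skips the currency sign whatever bytes it is encoded as.
$pattern = '~<span id="minPrice">\s*(?<label>[^<]*?)\s*<a href="(?<url>[^"]*)"[^>]*>\s*<span>[^<\d]*(?<price>[\d,]+)~';

$result = null;
if (preg_match($pattern, $data, $m)) {
    $result = ['label' => $m['label'], 'url' => $m['url'], 'price' => $m['price']];
}
print_r($result);
```

This only works while the markup keeps exactly this shape (same attribute order, double quotes, no extra attributes on the outer span), so it is brittle compared with a real HTML parser.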
UPDATE 1: I am now able to do this:
var_dump(libxml_use_internal_errors(true));
$DOM = new DOMDocument;
$DOM->loadHTML($data);
$items = $DOM->getElementById('minPrice');
And $items is
DOMElement Object
(
[tagName] => span
[schemaTypeInfo] =>
[nodeName] => span
[nodeValue] => 最安価格(税込):¥131,649
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => span
[baseURI] =>
[textContent] => 最安価格(税込):¥131,649
)
The HTML is
<span id="minPrice">
�ň����i(�ō�)�F
<a href="http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku" target="_blank">
<span>¥131,649</span>
</a>
</span>
How can I extract http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku and 131,649?
Did you call libxml_use_internal_errors(true); when reading the HTML in?
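For the last question, about pulling the URL and the price out of the `minPrice` span once loadHTML succeeds, here is a minimal sketch. It assumes the first `<a>` inside the span is the one wanted and that the price consists only of digits and commas. The sample `$data` and the variable names `$url` and `$price` are illustrative, not taken from the real page.

```php
<?php
// Sample markup shaped like the fragment in the question (the label is a placeholder).
$data = <<<'HTML'
<html><body><p><span id="minPrice">
XXXX
<a href="http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku" target="_blank">
<span>¥131,649</span>
</a>
</span></p></body></html>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($data);
libxml_clear_errors();              // discard the collected warnings

$url = null;
$price = null;
$item = $dom->getElementById('minPrice');
if ($item !== null) {
    // First <a> element nested inside the span, or null if there is none.
    $link = $item->getElementsByTagName('a')->item(0);
    if ($link !== null) {
        $url = $link->getAttribute('href');
        // Keep only digits and commas so the currency sign's encoding does not matter.
        $price = preg_replace('/[^\d,]/', '', $link->textContent);
    }
}
var_dump($url, $price);
```

Stripping everything except digits and commas avoids depending on how the yen sign was decoded, which seems useful here given the encoding problems in the document.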