I need to read XML files about 1 GB in size. My XML:

<products>
<product>
<categoryName>Kable i konwertery AV</categoryName>
<brandName>Belkin</brandName>
<productCode>AV10176bt1M-BLK</productCode>
<productId>5616488</productId>
<productFullName>Kabel Belkin Kabel HDMI Ultra HD High Speed 1m-AV10176bt1M-BLK</productFullName>
<productEan>0745883767465</productEan>
<productEuroPriceNetto>59.71</productEuroPriceNetto>
<productFrontendPriceNetto>258.54</productFrontendPriceNetto>
<productFastestSupplierQuantity>23</productFastestSupplierQuantity>
<deliveryEstimatedDays>2</deliveryEstimatedDays>
</product>
<product>
<categoryName>Telewizory</categoryName>
<brandName>Sony</brandName>
<productCode>KDL32WD757SAEP</productCode>
<productId>1005662</productId>
<productFullName>Telewizor Sony KDL-32WD757 SAEP</productFullName>
<productEan></productEan>
<productEuroPriceNetto>412.33</productEuroPriceNetto>
<productFrontendPriceNetto>1785.38</productFrontendPriceNetto>
<productFastestSupplierQuantity>11</productFastestSupplierQuantity>
<deliveryEstimatedDays>6</deliveryEstimatedDays>
</product>
<product>
<categoryName>Kuchnie i akcesoria</categoryName>
<brandName>Brimarex</brandName>
<productCode>1566287</productCode>
<productId>885156</productId>
<productFullName>Brimarex Drewniane owoce, Kiwi - 1566287</productFullName>
<productEan></productEan>
<productEuroPriceNetto>0.7</productEuroPriceNetto>
<productFrontendPriceNetto>3.05</productFrontendPriceNetto>
<productFastestSupplierQuantity>7</productFastestSupplierQuantity>
<deliveryEstimatedDays>3</deliveryEstimatedDays>
</product>
</products>

I am using XMLReader:

$reader = new XMLReader();
$reader->open($url);
$count = 0;

while($reader->read()) {
    if($reader->nodeType == XMLReader::ELEMENT)
        $nodeName = $reader->name;

    if(($reader->nodeType == XMLReader::TEXT || $reader->nodeType == XMLReader::CDATA)) {

        if ($nodeName == 'categoryName') $categoryName = $reader->value;
        if ($nodeName == 'brandName') $brandName = $reader->value;
        if ($nodeName == 'productCode') $productCode = $reader->value;
        if ($nodeName == 'productId') $productId = $reader->value;
        if ($nodeName == 'productFullName') $productFullName = $reader->value;
        if ($nodeName == 'productEan') $productEan = $reader->value;
        if ($nodeName == 'productEuroPriceNetto') $productEuroPriceNetto = $reader->value;
        if ($nodeName == 'productFastestSupplierQuantity') $productFastestSupplierQuantity = $reader->value;
        if ($nodeName == 'deliveryEstimatedDays') $deliveryEstimatedDays = $reader->value;
    }

    if($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'product') {
        $count++;
    }
}
$reader->close();

Everything works except for one problem: when a value is missing, for example <productEan></productEan>, the output contains the value from the previous non-empty tag, and that value keeps repeating until the next tag that is not empty.

For example, if one node is <productEan>0745883767465</productEan> and the next two are empty (<productEan></productEan>), the output array contains the same value, 0745883767465, for all three.

What is the right way to solve this problem? Or maybe someone has a working solution?

  • It may also be worth having a look at stackoverflow.com/questions/1835177/how-to-use-xmlreader-in-php, which shows how to read in an entire product item that you can then process as a SimpleXML record (so $node->productEan). Commented Feb 21, 2019 at 21:52
  • The code suggested by @Nick works fine with a small XML file, but with the large XML I get an out-of-memory error, so there is still an issue. Commented Feb 21, 2019 at 22:55
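
The first comment's idea (read one whole product with XMLReader, then treat it as a SimpleXML record) could look roughly like the sketch below. The helper name readProducts() and the inline sample string are assumptions made for illustration; with a real file you would call $reader->open($url) instead. Only one product should be held in memory at a time, and an empty <productEan></productEan> simply casts to an empty string.

```php
<?php
// Sketch only: readProducts() is a hypothetical helper, not code from the thread.
function readProducts(XMLReader $reader): Generator
{
    // Advance to the first <product> element.
    while ($reader->read() && $reader->name !== 'product') {
        // skip everything before the first product
    }
    while ($reader->name === 'product') {
        // Materialise just this one product as a SimpleXML record.
        yield new SimpleXMLElement($reader->readOuterXml());
        // Jump to the next sibling <product>, skipping the subtree just read.
        $reader->next('product');
    }
}

// Small inline sample standing in for the 1 GB file.
$xml = '<products>'
     . '<product><productId>1</productId><productEan>0745883767465</productEan></product>'
     . '<product><productId>2</productId><productEan></productEan></product>'
     . '</products>';

$reader = new XMLReader();
$reader->XML($xml);

$eans = [];
foreach (readProducts($reader) as $product) {
    // An empty element casts to '' instead of inheriting the previous value.
    $eans[] = (string) $product->productEan;
}
$reader->close();

print_r($eans);
```

Because each SimpleXMLElement is built fresh from its own product, nothing carries over from the previous one.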

3 Answers

1

Here's some code that will do what you want. It saves the value of each element when it encounters a TEXT or CDATA node, then stores it when it reaches the END_ELEMENT node. At that point the saved value is reset to '', so that an element with no value gets an empty string (this could be changed to null if you prefer). It also handles self-closing tags, for example <brandName />, with an isEmptyElement check when an ELEMENT node is found. It uses PHP's variable variables to avoid the long sequence of if ($nodeName == ...) tests in your code, but it also stores the values for each product in an array, which I think is the better long-term solution for your problem.

$reader = new XMLReader();
$reader->xml($xml);
$count = 0;
$this_value = '';
$products = array();
while($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            // deal with self-closing tags e.g. <productEan />
            if ($reader->isEmptyElement) {
                ${$reader->name} = '';
                $products[$count][$reader->name] = '';
            }
            break;
        case XMLReader::TEXT:
        case XMLReader::CDATA:
            // save the value for storage when we get to the end of the element
            $this_value = $reader->value;
            break;
        case XMLReader::END_ELEMENT:
            if ($reader->name == 'product') {
                $count++;
                print_r(array($categoryName, $brandName, $productCode, $productId, $productFullName, $productEan, $productEuroPriceNetto, $productFrontendPriceNetto, $productFastestSupplierQuantity, $deliveryEstimatedDays));
            }
            elseif ($reader->name != 'products') {
                ${$reader->name} = $this_value;
                $products[$count][$reader->name] = $this_value;
                // set this_value to a blank string to allow for empty tags
                $this_value = '';
            }
            break;
        case XMLReader::WHITESPACE:
        case XMLReader::SIGNIFICANT_WHITESPACE:
        default:
            // nothing to do
            break;
    }
}
$reader->close();
print_r($products);

I've omitted the output as it's quite long but you can see the code in operation in this demo on 3v4l.org.

10 Comments

It worked fine for a while, then suddenly I got an error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 20480 bytes) in ... line ${$reader->name} = $this_value;. I increased the memory limit in php.ini to 2048M and also tried setting it in the script itself with ini_set('memory_limit','2048M');, but nothing helps. Where is the problem?
@K.B. It sounds like your input data is too large, so you will need to process the data inside the loop instead of storing it. In the if ($reader->name == 'product') block, do all the processing of the data and then (if you are using an array) throw it away by setting $products = array();
Yes, this XML is up to 1 GB. On my first try I used foreach($products as $product) outside of your script. Then I tried moving everything into the if ($reader->name == 'product') block as you suggested, but it does not help, or maybe I am missing something. The script works on my local server but not on the remote server. I can give you a link to this XML; maybe you can suggest a solution for this issue...
@K.B. so that is still within the reading loop. When you changed to process the data in the loop, did you also get rid of the $products array?
@K.B. glad to hear it. Processing that volume of data can definitely be tricky.
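
The advice in these comments (handle each product inside the loop, then throw it away) might be sketched as follows. It keeps the same node-type logic as the answer but passes each finished product to a callback instead of appending it to $products. The names streamProducts() and $handle are hypothetical, and the inline XML is a small stand-in for the real file.

```php
<?php
// Sketch only: streamProducts() and $handle are illustrative names.
function streamProducts(XMLReader $reader, callable $handle): int
{
    $count = 0;
    $product = [];   // fields of the product currently being read
    $value = '';     // text of the element currently being read
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Self-closing tags such as <brandName /> never reach END_ELEMENT.
                if ($reader->isEmptyElement) {
                    $product[$reader->name] = '';
                }
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                $value = $reader->value;
                break;
            case XMLReader::END_ELEMENT:
                if ($reader->name === 'product') {
                    $handle($product);  // process the product now...
                    $product = [];      // ...and discard it immediately
                    $count++;
                } elseif ($reader->name !== 'products') {
                    $product[$reader->name] = $value;
                    $value = '';        // so an empty tag yields ''
                }
                break;
        }
    }
    return $count;
}

$xml = '<products>'
     . '<product><brandName>Belkin</brandName><productEan>0745883767465</productEan></product>'
     . '<product><brandName/><productEan></productEan></product>'
     . '</products>';

$reader = new XMLReader();
$reader->XML($xml);
$seen = [];  // collected here only so the sketch has something to show
$total = streamProducts($reader, function (array $p) use (&$seen) {
    $seen[] = $p;
});
$reader->close();
print_r($seen);
```

With a 1 GB file the callback would write to a database or output stream rather than collect rows, so that memory use stays independent of the number of products.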
1

If you store the values in an array instead of in individual variables, you can clear the array once you have processed each product, so nothing carries over to the next one:

$reader = new XMLReader();
$reader->open($url);
$count = 0;

$data = [];
while($reader->read()) {
    // Remember the name of the element we are currently inside
    if($reader->nodeType == XMLReader::ELEMENT) {
        $nodeName = $reader->name;
    }

    // Only non-empty elements produce a TEXT or CDATA node
    if($reader->nodeType == XMLReader::TEXT || $reader->nodeType == XMLReader::CDATA) {
        $data[$nodeName] = $reader->value;
    }

    if($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'product') {
        // Process data
        echo ($data['productEan']??"Empty").PHP_EOL;
        // Reset
        $data = [];
        $count++;
    }
}
$reader->close();

which with your test data gives...

0745883767465
Empty
Empty
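
If you would rather have every product carry the same set of keys than test each one with ??, one possible variation is to reset the array to a template of defaults instead of to []. The sketch below assumes a field list copied from the sample XML (FIELDS) and a hypothetical helper parseWithDefaults(); it returns the rows only so that the result can be inspected on a small sample.

```php
<?php
// Assumed field list, copied from the sample XML in the question.
const FIELDS = [
    'categoryName', 'brandName', 'productCode', 'productId', 'productFullName',
    'productEan', 'productEuroPriceNetto', 'productFrontendPriceNetto',
    'productFastestSupplierQuantity', 'deliveryEstimatedDays',
];

// Hypothetical helper; collecting rows is for demonstration on small input only.
function parseWithDefaults(string $xml): array
{
    $defaults = array_fill_keys(FIELDS, '');  // every field starts as ''
    $reader = new XMLReader();
    $reader->XML($xml);

    $rows = [];
    $data = $defaults;
    $nodeName = '';
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $nodeName = $reader->name;
        } elseif ($reader->nodeType === XMLReader::TEXT
            || $reader->nodeType === XMLReader::CDATA) {
            $data[$nodeName] = $reader->value;
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->name === 'product') {
            $rows[] = $data;
            $data = $defaults;  // reset to the template, not to []
        }
    }
    $reader->close();
    return $rows;
}

$rows = parseWithDefaults(
    '<products>'
    . '<product><brandName>Belkin</brandName><productEan>0745883767465</productEan></product>'
    . '<product><brandName>Sony</brandName><productEan></productEan></product>'
    . '</products>'
);
print_r($rows);
```

For the 1 GB file, the line that appends to $rows would be replaced by whatever processing each product needs.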

0

Reset all the variables on each loop iteration. If a variable is not assigned a new value, it keeps the previously assigned one.

<?php 
while($reader->read()) {
    $categoryName = 
    $brandName = 
    $productCode = 
    $productId = 
    $productFullName = 
    $productEan = 
    $productEuroPriceNetto = 
    $productFastestSupplierQuantity = 
    $deliveryEstimatedDays = '';
//... code
}
?>
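
One caveat with this idea: the while loop runs once per node, not once per product, so a reset placed directly at the top of the loop would also clear values collected a moment earlier for the same product. A sketch that resets only when an opening <product> tag is seen is shown below. It tracks a single field to stay short, and readEans() is a hypothetical helper name.

```php
<?php
// Sketch only: readEans() is an illustrative helper tracking one field.
function readEans(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $eans = [];
    $nodeName = '';
    $productEan = '';
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $nodeName = $reader->name;
            if ($nodeName === 'product') {
                // Reset once per product, not once per node.
                $productEan = '';
            }
        }
        if ($reader->nodeType === XMLReader::TEXT
            || $reader->nodeType === XMLReader::CDATA) {
            if ($nodeName === 'productEan') {
                $productEan = $reader->value;
            }
        }
        if ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->name === 'product') {
            $eans[] = $productEan;
        }
    }
    $reader->close();
    return $eans;
}

$eans = readEans(
    '<products>'
    . '<product><productEan>0745883767465</productEan></product>'
    . '<product><productEan></productEan></product>'
    . '<product><productEan></productEan></product>'
    . '</products>'
);
print_r($eans);
```

The same reset would apply to the other variables from the question; they are left out here only to keep the sketch short.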

3 Comments

The idea of resetting the variables on each loop is good, but I'm stuck... How do I do it?
It is in my answer: assign an empty value to each variable right after the while loop opens...
Hmm, it does not work, no output at all... Maybe I'm tired and need some rest :), but I'm completely stuck here.
