0

I am new to PHP, and I have an XML file of close to 10GB to import into a MySQL database. The XML file is heavily nested. I only want to extract some of the information, not import the whole file. When I ran my PHP code, the output was blank. My PHP code is this:

<?php
error_reporting(-1);
ini_set('display_errors', true);

function get_reader($file){
$reader = new XMLReader;
$reader->open($file);
return $reader;
}

function handle_Entity(SimpleXMLElement $Entity){
/*
This gets called everytime an album node
has been iterated.
*/
printf(
"(%d) %s - %s\n",
$album->N2,
$album->N5,
$album->N9
);
}

$xml = get_reader('companies_xml_extract_20170703_1.xml');

while($xml->read()){
$isNewAlbum = 'NameElement' === $xml->name && $xml->nodeType === XMLReader::ELEMENT;
if($isNewAlbum){
$doc = new DOMDocument('1.0', 'UTF-8');
handle_Entity(
simplexml_import_dom($doc->importNode($xml->expand(), true))
);
}
}

From the dummy file, the paths to this info are:
OrganisationName = N8:EntityList/N8:Entity/N2:OrganisationName/N2:NameElement
CompanyID = N8:EntityList/N8:Entity/N5:Identifiers/N5:Identifier/N5:IdentifierElement
UltimateHoldingCompanyName = N8:EntityList/N8:Entity/N9:UltimateHoldingCompany/N2:OrganisationName/N2:NameElement

Find attached the dummy xml file: my xml file

At the end, my expectation is to print "UltimateHoldingCompanyName","OrganisationName","NameElement"

Thanks

1
  • I don't know much about XML, but your sample file seems to lack the necessary declarations (like <?xml version="1.0"?> ...) to make it valid XML. Also, your namespaces (xmlns:...) aren't defined. Commented Sep 24, 2017 at 11:27

2 Answers

1

If the file is that large, then SimpleXML isn't much use on its own because it needs to load the entire file into memory. Instead, you should use a pull parser like XMLReader, which reads the document one node at a time.
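
A minimal sketch of the pull-parser approach, assuming the records are elements named Entity as in the question's paths. The function name streamElements, the sample data, and the urn:example:... namespace URIs are all made up for illustration. It matches on localName so the prefix does not matter, copies one record at a time into a small DOM document, and calls next() to skip the subtree it has just handled.

```php
<?php
// Hypothetical helper (not from the original post): stream a large XML
// file and yield one small DOM subtree per matching element, so memory
// use is bounded by the size of one record rather than the whole file.
function streamElements(string $file, string $localName): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        $more = $reader->read();
        while ($more) {
            $isMatch = $reader->nodeType === XMLReader::ELEMENT
                && $reader->localName === $localName;
            if ($isMatch) {
                // Copy only this record into its own small document.
                $doc = new DOMDocument('1.0', 'UTF-8');
                $doc->appendChild($doc->importNode($reader->expand(), true));
                yield $doc->documentElement;
                // Skip the subtree that was just handled.
                $more = $reader->next();
            } else {
                $more = $reader->read();
            }
        }
    } finally {
        $reader->close();
    }
}

// Made-up sample data; the namespace URIs are placeholders.
$sample = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<N8:EntityList xmlns:N8="urn:example:n8" xmlns:N2="urn:example:n2">
  <N8:Entity><N2:OrganisationName><N2:NameElement>Alpha Ltd</N2:NameElement></N2:OrganisationName></N8:Entity>
  <N8:Entity><N2:OrganisationName><N2:NameElement>Beta Ltd</N2:NameElement></N2:OrganisationName></N8:Entity>
</N8:EntityList>
XML;
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($tmp, $sample);

$names = [];
foreach (streamElements($tmp, 'Entity') as $entity) {
    $names[] = trim($entity->textContent);
}
unlink($tmp);
print_r($names);
```

Using a generator keeps the reading loop separate from whatever is done with each record (printing, writing CSV, inserting into MySQL), which is a design choice of this sketch rather than a requirement of XMLReader.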


0

As you haven't given us enough of the XML to fetch all of the data you're after, I've only managed to construct something which gets one of the pieces of data.

One thing to note is that $reader->name returns the qualified name, including the namespace prefix, so, as in this code, you have to use the full name exactly as it appears in the document (N8:Entity rather than Entity).

<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);

$reader = new XMLReader();
$reader->open("companies_xml_extract_20170703_1.xml");
$fo = fopen("companies.csv", "w");
// fputcsv() quotes fields where needed, e.g. company names containing commas.
fputcsv($fo, ["name", "id", "ultimateHoldingCompany"], ",", '"', "\\");
while ($reader->read()) {
    // $reader->name is the qualified name, so the N8: prefix is required.
    if ($reader->name == 'N8:Entity' &&
            $reader->nodeType === XMLReader::ELEMENT) {
        $name = null;
        $ultimateHoldingCompany = null;
        $id = null;
        // Copy just this record into a DOM node.
        $newNode = $reader->expand();

        // At each level, check that the element exists before reading it.
        $nameNode = $newNode->getElementsByTagName('OrganisationName');
        if ($nameNode->length > 0) {
            $nameElement = $nameNode->item(0)->getElementsByTagName('NameElement');
            if ($nameElement->length > 0) {
                $name = $nameElement->item(0)->nodeValue;
            }
        }
        $nameNode = $newNode->getElementsByTagName('UltimateHoldingCompany');
        if ($nameNode->length > 0) {
            $nameElement = $nameNode->item(0)->getElementsByTagName('NameElement');
            if ($nameElement->length > 0) {
                $ultimateHoldingCompany = $nameElement->item(0)->nodeValue;
            }
        }
        $idNode = $newNode->getElementsByTagName('IdentifierElement');
        if ($idNode->length > 0) {
            $id = $idNode->item(0)->nodeValue;
        }

        fputcsv($fo, [$name, $id, $ultimateHoldingCompany], ",", '"', "\\");
    }
}
fclose($fo);
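
As an alternative sketch of the per-record extraction, the three paths listed in the question can be evaluated with DOMXPath against each record. This is not the code from the answer: the helpers firstValue and localPath are hypothetical, the sample records and urn:example:... namespace URIs are made up, and local-name() is used only because the real namespace URIs are not shown in the post. Because the paths use the child axis, a record without its own OrganisationName will not accidentally pick up the one nested inside UltimateHoldingCompany. In the streaming loop, each record would first need to be imported into a DOMDocument so that DOMXPath has a document to work on; here a small document is loaded directly to keep the example short.

```php
<?php
// Hypothetical helper: text of the first node matching $path relative to
// $context, or null when nothing matches.
function firstValue(DOMXPath $xpath, string $path, DOMNode $context): ?string
{
    $nodes = $xpath->query($path, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Hypothetical helper: builds a child-axis path that ignores prefixes,
// e.g. "A/B" becomes *[local-name()="A"]/*[local-name()="B"].
function localPath(string $path): string
{
    $steps = array_map(
        function (string $step): string {
            return '*[local-name()="' . $step . '"]';
        },
        explode('/', $path)
    );
    return implode('/', $steps);
}

// Made-up records; the namespace URIs are placeholders.
$xml = <<<XML
<N8:EntityList xmlns:N8="urn:example:n8" xmlns:N2="urn:example:n2"
               xmlns:N5="urn:example:n5" xmlns:N9="urn:example:n9">
  <N8:Entity>
    <N2:OrganisationName><N2:NameElement>Alpha, Limited</N2:NameElement></N2:OrganisationName>
    <N5:Identifiers><N5:Identifier><N5:IdentifierElement>123</N5:IdentifierElement></N5:Identifier></N5:Identifiers>
  </N8:Entity>
  <N8:Entity>
    <N2:OrganisationName><N2:NameElement>Beta Limited</N2:NameElement></N2:OrganisationName>
    <N5:Identifiers><N5:Identifier><N5:IdentifierElement>456</N5:IdentifierElement></N5:Identifier></N5:Identifiers>
    <N9:UltimateHoldingCompany><N2:OrganisationName><N2:NameElement>Beta Holdings</N2:NameElement></N2:OrganisationName></N9:UltimateHoldingCompany>
  </N8:Entity>
</N8:EntityList>
XML;
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//*[local-name()="Entity"]') as $entity) {
    $rows[] = [
        firstValue($xpath, localPath('OrganisationName/NameElement'), $entity),
        firstValue($xpath, localPath('Identifiers/Identifier/IdentifierElement'), $entity),
        firstValue($xpath, localPath('UltimateHoldingCompany/OrganisationName/NameElement'), $entity),
    ];
}

// fputcsv() quotes fields that contain the delimiter, so a name such as
// "Alpha, Limited" stays in one column; null becomes an empty field.
$out = fopen('php://memory', 'w+');
fputcsv($out, ['name', 'id', 'ultimateHoldingCompany'], ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```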

8 Comments

Hi Nigel, thank you. I have now uploaded all of the data that I am after. I then ran your code, but I get this result: Warning: XMLReader::read(): file:/C:/xampp/htdocs/katalyst/stackoverflow.xml:1: parser error : XML declaration allowed only at the start of the document in C:\xampp\htdocs\katalyst\katalystSample.php on line 7 Warning: XMLReader::read(): <?xml version="1.0" encoding="utf-8"?> in C:\xampp\htdocs\katalyst\katalystSample.php on line 7 Warning: XMLReader::read(): ^ in C:\xampp\htdocs\katalyst\katalystSample.php on line 7. I'd be glad of your help.
I've updated the code to extract the other parts you wanted. The error you're getting, though, is because there are a few spaces at the start of your XML sample file. Ensure that the <?xml is at the very start of the file.
Hi Nigel, many thanks. Your code works fine, but when applied to a larger XML file with more records, I got this error: "Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null", specifically on the line "$ultimateHoldingCompany = $nameNode->getElementsByTagName('NameElement')->item(0)->nodeValue;". It is probably because UltimateHoldingCompany is not present in some records. Here is an XML file in which some records lack it (xml file). I still need help.
I've added some extra checking, so at each stage it checks if the element exists before trying to fetch a value from it. You may need to break this down even further, but the logic is still the same - look for any elements by name, if there are some, then use the value.
Hi Nigel, thanks. We're almost there. However, there are three issues to be fixed: (1) An error, "Notice: Trying to get property of non-object in C:\xampp\htdocs\katalyst.php on line 20". (2) All the output seems to be on the same line, whereas each record is supposed to be on a separate line under the column headers. For example (header first): name, id, ultimateHoldingCompany (end of line) RELEASE WOF LIMITED, 6464273, Null (end of line) DOOLEE LIMITED, 601291123, Doolee Construction (end of line). (3) Exporting it to CSV. How can I go about these? I tried but got an error. Thanks for your help.
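
The "XML declaration allowed only at the start of the document" error discussed in the comments above can be checked for before parsing. A minimal sketch, assuming the file is UTF-8 or ASCII; the function name bytesBeforeXmlDeclaration and the sample files are made up for illustration. It reads only the first bytes of the file, so it stays cheap even for a 10GB input.

```php
<?php
// Hypothetical pre-flight check (not from the original thread): count the
// bytes that precede the XML declaration. Returns 0 when the declaration
// is the first thing in the file, a positive count of stray leading bytes
// otherwise, or null when no declaration is found in the first $probe bytes.
function bytesBeforeXmlDeclaration(string $file, int $probe = 1024): ?int
{
    $head = file_get_contents($file, false, null, 0, $probe);
    if ($head === false) {
        throw new RuntimeException("Cannot read $file");
    }
    // A UTF-8 byte order mark is permitted before the declaration.
    if (strncmp($head, "\xEF\xBB\xBF", 3) === 0) {
        $head = substr($head, 3);
    }
    $pos = strpos($head, '<?xml');
    return $pos === false ? null : $pos;
}

// Made-up sample files for illustration.
$good = tempnam(sys_get_temp_dir(), 'xml');
$bad  = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($good, '<?xml version="1.0" encoding="utf-8"?><root/>');
file_put_contents($bad, '   <?xml version="1.0" encoding="utf-8"?><root/>');

$goodOffset = bytesBeforeXmlDeclaration($good);
$badOffset  = bytesBeforeXmlDeclaration($bad);
unlink($good);
unlink($bad);

var_dump($goodOffset, $badOffset);
```

A non-zero result means the leading bytes need to be removed from the file before XMLReader will accept it.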