
I have a problem using a regular expression in the following PHP function:

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>1</S184-06>
    </arg1>";

$xml2 = "<arg1>
        <P055>1</P055>
        <P096>1</P096>
    </arg1>";

function xml2array($xml) {
     $xmlArray = array();
     $regexp = "/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";
     preg_match_all($regexp, $xml, $elements);

     foreach ($elements[1] as $ie => $element) {
         if (preg_match($regexp, $elements[3][$ie]))
             $xmlArray[$element] = xml2array($elements[3][$ie]);
         else {
             $xmlArray[$element] = trim($elements[3][$ie]);
         }
     }
     return $xmlArray;
}

$array = xml2array($xml1);
echo print_r($array, true);

$xml2 gives me the correct result:

Array
(
    [arg1] => Array
        (
            [P055] => 1
            [P096] => 1
        )

)

while $xml1 gives me the wrong result (the child elements are left as an unparsed string):

Array
(
    [arg1] => <S113-03>1</S113-03>
            <S184-06>1</S184-06>
)

I believe the problem is in the regexp, but I cannot make sense of its content.

  • Don't use regex. There are answers giving you better options... which are plentiful. Commented Apr 22, 2015 at 10:11
  • Thanks to all. In fact, the function is part of a stock import interface in Magento, and I was not sure about replacing the original regexp with SimpleXMLElement because I do not understand why a regexp was used there. Anyway, thanks for both alternatives. Commented Apr 22, 2015 at 10:47

3 Answers


You know Chuck Norris?

Chuck Norris can parse HTML with RegExp.

Anyway, here we go without RegExp:

Demo

<?php

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>1</S184-06>
    </arg1>";

$xml2 = "<arg1>
        <P055>1</P055>
        <P096>1</P096>
    </arg1>";

function xml2array($xmlString)
{
    $xml   = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);
    return json_decode(json_encode((array)$xml), TRUE);
}

var_dump(xml2array($xml1));
var_dump(xml2array($xml2));

Output:

array(2) {
  ["S113-03"]=>
  string(1) "1"
  ["S184-06"]=>
  string(1) "1"
}
array(2) {
  ["P055"]=>
  string(1) "1"
  ["P096"]=>
  string(1) "1"
}
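Note that the output above drops the root element, whereas the question's expected result nests everything under [arg1]. A minimal sketch of one way to keep it, assuming the root name is wanted as the top-level key; xml2arrayWithRoot is a hypothetical helper name, not part of the answer above:

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// same conversion as xml2array(), but nested under the root element's name.
function xml2arrayWithRoot(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($xml === false) {
        // simplexml_load_string() returns false when the input cannot be parsed.
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    // getName() returns the tag name of the element, here the root.
    return [$xml->getName() => json_decode(json_encode((array) $xml), true)];
}

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>1</S184-06>
    </arg1>";

$result = xml2arrayWithRoot($xml1);
print_r($result);
```

The JSON round trip is kept from the answer above; it is a convenient shortcut for simple documents like these, though it may not preserve attributes or mixed content the way a hand-written walk would.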

3 Comments

Agreed. Chuck Norris can parse HTML/XML with regex; it's child's play.
The point being, only Chuck Norris can pull it off. Everyone else should stay well away from regexp for XML/HTML parsing.
@tucuxi: I doubt Chuck Norris understands the word "tag". I can use regexes, I can (?:read|write) them. If I need to parse out a simple .?(?=ML) string from a known source, I will go for it!

The tag names in $xml1 contain a hyphen, which \w does not match. Use this fix, in which (\w+) becomes ([\w-]+):

$regexp = "/<([\w-]+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";

The result is

Array
(
    [arg1] => Array
        (
            [S113-03] => 1
            [S184-06] => 1
        )

)

Here is the sample code.
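To see why the one-character change matters, the two patterns can be compared on a single hyphenated element. This is only an illustrative sketch; the variable names $tag, $old, $new and $m are made up for the example:

```php
<?php
$tag = '<S113-03>1</S113-03>';

// Original pattern: \w+ cannot include the hyphen, so the captured name
// never equals the name in the closing tag and the match fails.
$old = "/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";

// Fixed pattern: [\w-]+ also accepts hyphens inside the tag name.
$new = "/<([\w-]+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s";

$oldMatched = preg_match($old, $tag);      // 0: no match
$newMatched = preg_match($new, $tag, $m);  // 1: match

var_dump($oldMatched, $newMatched, $m[1], $m[3]);
```

Here $m[1] holds the tag name S113-03 and $m[3] the element content. XML names may also contain other characters, such as dots and colons, which this character class still does not cover.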

2 Comments

@Downvoter: What is wrong with the regex fix? When you downvote a working solution, you'll end up with no rep at all sooner or later.
I understand people do not like regex approach when they see rich texts, but I just suggest a fix to the current code. I am not imposing a regex solution to anyone.

It would be easier and quicker (and more memory-efficient) to use PHP's SimpleXML functionality.

$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>2</S184-06>
    </arg1>";

$xml2 = "<arg1>
        <P055>3</P055>
        <P096>4</P096>
    </arg1>";

var_dump(new \SimpleXMLElement($xml1));
var_dump(new \SimpleXMLElement($xml2));

dumps:

php test.php
class SimpleXMLElement#1 (2) {
  public $S113-03 =>
  string(1) "1"
  public $S184-06 =>
  string(1) "2"
}
class SimpleXMLElement#1 (2) {
  public $P055 =>
  string(1) "3"
  public $P096 =>
  string(1) "4"
}
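Because the element names contain hyphens, they cannot be read with plain property syntax such as $root->S113-03. A small sketch of two ways to get at the values, assuming the flat structure shown above; $root, $first and $values are names chosen for this example:

```php
<?php
$xml1 = "<arg1>
        <S113-03>1</S113-03>
        <S184-06>2</S184-06>
    </arg1>";

$root = new SimpleXMLElement($xml1);

// Curly-brace syntax allows property names that are not valid identifiers.
$first = (string) $root->{'S113-03'};

// Alternatively, walk the direct children and build a plain array.
$values = [];
foreach ($root->children() as $child) {
    $values[$child->getName()] = (string) $child;
}

print_r($values);
```

The explicit (string) casts matter: without them the array would hold SimpleXMLElement objects rather than the text content.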
