I have researched extensively how to convert an XML file to a 2D array the same way Excel does when you open an XML file in it, and I am trying to reproduce that algorithm.

<items>
    <item>
        <sku>abc 1</sku>
        <title>a book 1</title>
        <price>42 1</price>
        <attributes>
            <attribute>
                <name>Number of pages 1</name>
                <value>123 1</value>
            </attribute>
            <attribute>
                <name>Author 1</name>
                <value>Rob dude 1</value>
            </attribute>
        </attributes>
        <contributors>
            <contributor>John 1</contributor>
            <contributor>Ryan 1</contributor>
        </contributors>
        <isbn>12345</isbn>
    </item>
    <item>
        <sku>abc 2</sku>
        <title>a book 2</title>
        <price>42 2</price>
        <attributes>
            <attribute>
                <name>Number of pages 2</name>
                <value>123 2</value>
            </attribute>
            <attribute>
                <name>Author 2</name>
                <value>Rob dude 2</value>
            </attribute>
        </attributes>
        <contributors>
            <contributor>John 2</contributor>
            <contributor>Ryan 2</contributor>
        </contributors>
        <isbn>6789</isbn>
    </item>
</items>

I want to convert it to a 2-dimensional array laid out the way Excel shows the same file when you open it there (see the expected array under Goal below).

So far I can extract the column labels like Excel does:

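function getColNames($array) {
    $cols = array();
    foreach ($array as $val) {
        // Only "complete" entries are leaf elements that carry a value;
        // each tag name is recorded once, at its first occurrence.
        if (is_array($val) && $val['type'] == 'complete' && !in_array($val['tag'], $cols)) {
            $cols[] = $val['tag'];
        }
    }
    return $cols;
}

// $simple holds the XML string shown above.
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);

$cols = getColNames($vals);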
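
One detail worth knowing about the snippet above: the XML parser has case folding enabled by default, so the collected tag names come back upper-cased (SKU, TITLE, ...). A minimal sketch of the same label extraction with case folding switched off, using a trimmed, hypothetical sample string in place of the full file:

```php
<?php
// Hypothetical sample input: a trimmed version of the XML in the question.
$simple = '<items><item><sku>abc 1</sku><title>a book 1</title>'
        . '<attributes><attribute><name>Author 1</name><value>Rob dude 1</value></attribute></attributes>'
        . '</item></items>';

$p = xml_parser_create();
// Case folding is on by default and would upper-case every tag name.
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($p, $simple, $vals, $index);

$cols = array();
foreach ($vals as $val) {
    // "complete" entries are elements without child elements.
    if ($val['type'] === 'complete' && !in_array($val['tag'], $cols, true)) {
        $cols[] = $val['tag'];
    }
}

print_r($cols); // sku, title, name, value
```

The container elements (items, item, attributes, attribute) are reported as open/close pairs rather than complete, which is why they do not show up as columns.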

Goal

I want it to generate an array like this:

array (
    0 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'name'=>'Number of pages 1',
        'value'=>'123 1',
        'isbn'=>12345
    ),
    1 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'name'=>'Author 1',
        'value'=>'Rob dude 1',
        'isbn'=>12345
    ),
    2 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'contributor'=>'John 1',
        'isbn'=>12345
    ),
    3 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'contributor'=>'Ryan 1',
        'isbn'=>12345
    ),
)

Sample 2 XML..

 <items>
    <item>
       <sku>abc 1</sku>
       <title>a book 1</title>
       <price>42 1</price>
       <attributes>
          <attribute>
              <name>Number of pages 1</name>
              <value>123 1</value>
          </attribute>
          <attribute>
              <name>Author 1</name>
              <value>Rob dude 1</value>
          </attribute>
       </attributes>
       <contributors>
          <contributor>John 1</contributor>
          <contributor>Ryan 1</contributor>
       </contributors>
       <isbns>
            <isbn>12345a</isbn>
            <isbn>12345b</isbn>
       </isbns>
    </item>
    <item>
       <sku>abc 2</sku>
       <title>a book 2</title>
       <price>42 2</price>
       <attributes>
          <attribute>
              <name>Number of pages 2</name>
              <value>123 2</value>
          </attribute>
          <attribute>
              <name>Author 2</name>
              <value>Rob dude 2</value>
          </attribute>
       </attributes>
       <contributors>
          <contributor>John 2</contributor>
          <contributor>Ryan 2</contributor>
       </contributors>
       <isbns>
            <isbn>6789a</isbn>
            <isbn>6789b</isbn>
       </isbns>
    </item>
 </items>

Sample 3 XML..

<items>
<item>
   <sku>abc 1</sku>
   <title>a book 1</title>
   <price>42 1</price>
   <attributes>
      <attribute>
          <name>Number of pages 1</name>
          <value>123 1</value>
      </attribute>
      <attribute>
          <name>Author 1</name>
          <value>Rob dude 1</value>
      </attribute>
   </attributes>
   <contributors>
      <contributor>John 1</contributor>
      <contributor>Ryan 1</contributor>
   </contributors>
   <isbns>
        <isbn>
            <name>isbn 1</name>
            <value>12345a</value>
        </isbn>
        <isbn>
            <name>isbn 2</name>
            <value>12345b</value>
        </isbn>
   </isbns>
</item>
<item>
   <sku>abc 2</sku>
   <title>a book 2</title>
   <price>42 2</price>
   <attributes>
      <attribute>
          <name>Number of pages 2</name>
          <value>123 2</value>
      </attribute>
      <attribute>
          <name>Author 2</name>
          <value>Rob dude 2</value>
      </attribute>
   </attributes>
   <contributors>
      <contributor>John 2</contributor>
      <contributor>Ryan 2</contributor>
   </contributors>
   <isbns>
        <isbn>
            <name>isbn 3</name>
            <value>6789a</value>
        </isbn>
        <isbn>
            <name>isbn 4</name>
            <value>6789b</value>
        </isbn>
   </isbns>
</item>
</items>

  • I can't understand your question (problem). Commented Sep 23, 2014 at 10:14
  • Well, I want to convert the above XML to a 2D array the way Excel does. Commented Sep 23, 2014 at 10:16
  • You want an array like this? array('sku' => array('abc1', 'abc2') ...) Commented Sep 23, 2014 at 10:27
  • Please check, I have edited my post and added the expected result. Commented Sep 23, 2014 at 11:51
  • Use PHPExcel for reading from Excel files: github.com/PHPOffice/PHPExcel/wiki/… Commented Sep 25, 2014 at 13:25

3 Answers

Your question is vague, so here is what "Excel" does, in my own words: it takes each /items/item element as the basis for a row. Going through that element in document order, the column name is the tag name of each leaf element node; if a name occurs more than once, the column keeps the position of its first occurrence.

It then creates one row per item, but only if all of the item's child elements are leaf elements. Otherwise the item serves as the base row for several rows, and each child element that itself contains elements is interpolated. E.g. if such a child holds two entries with two leaf elements each, those are interpolated into two rows. Their leaf values are then placed into the columns with the matching names, following the logic described in the first paragraph.

How deep this logic goes is not clear from your question, so I keep it to that one level. Otherwise the interpolation would need to recurse deeper into the tree, and the algorithm as outlined might no longer fit.
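
The column rule from the first paragraph can be sketched in isolation. This is only an illustration on a trimmed, hypothetical input string; the path /items/item[1] assumes the element names from the question's sample:

```php
<?php
// Hypothetical trimmed input based on the question's first sample.
$xml = simplexml_load_string(
    '<items><item><sku>abc 1</sku>'
    . '<attributes>'
    . '<attribute><name>Pages</name><value>123</value></attribute>'
    . '<attribute><name>Author</name><value>Rob</value></attribute>'
    . '</attributes>'
    . '<isbn>12345</isbn></item></items>'
);

$columns = [];
// [not(*)] keeps only elements without child elements, i.e. the leaves.
foreach ($xml->xpath('/items/item[1]//*[not(*)]') as $leaf) {
    // Using the tag name as array key keeps the first position on duplicates.
    $columns[$leaf->getName()] = true;
}
$columns = array_keys($columns);

print_r($columns); // sku, name, value, isbn
```

The second name/value pair adds no new columns because both keys already exist at their first position.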

To build that in PHP, you can benefit particularly from XPath, and the interpolation works wonders as a Generator.

function tree_to_rows(SimpleXMLElement $xml)
{
    $columns = [];

    foreach ($xml->xpath('/*/*[1]//*[not(*)]') as $leaf) {
        $columns[$leaf->getName()] = null;
    }

    yield array_keys($columns);

    $name = $xml->xpath('/*/*[1]')[0]->getName();

    foreach ($xml->$name as $source) {
        $rowModel       = array_combine(array_keys($columns), array_fill(0, count($columns), null));
        $interpolations = [];

        foreach ($source as $child) {
            if ($child->count()) {
                $interpolations[] = $child;
            } else {
                $rowModel[$child->getName()] = $child;
            }
        }

        if (!$interpolations) {
            yield array_values($rowModel);
            continue;
        }

        foreach ($interpolations as $interpolation) {
            foreach ($interpolation as $interpolationStep) {
                $row = $rowModel;
                foreach ($interpolationStep->xpath('(.|.//*)[not(*)]') as $leaf) {
                    $row[$leaf->getName()] = $leaf;
                }
                yield array_values($row);
            }
        }
    }
}

Using it can then be as straightforward as:

$xml  = simplexml_load_file('items.xml');
$rows = tree_to_rows($xml);
echo new TextTable($rows);

This gives the following example output:

+-----+--------+-----+-----------------+----------+-----------+-----+
|sku  |title   |price|name             |value     |contributor|isbn |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |Number of pages 1|123 1     |           |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |Author 1         |Rob dude 1|           |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |                 |          |John 1     |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |                 |          |Ryan 1     |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |Number of pages 2|123 2     |           |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |Author 2         |Rob dude 2|           |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |                 |          |John 2     |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |                 |          |Ryan 2     |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+

TextTable is a slightly modified version of https://gist.github.com/hakre/5734770 that can operate on Generators, in case you're looking for that code.
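
If a plain PHP array is wanted instead of a text table, the rows can be combined with the header row. A minimal sketch, assuming the "header row first" shape that tree_to_rows() yields; rows_to_assoc() is a hypothetical helper name, and the literal $rows array merely stands in for the generator:

```php
<?php
/**
 * Hypothetical helper: turns "header row first" rows into associative arrays.
 * Accepts any array or Traversable, so a Generator works as well.
 */
function rows_to_assoc($rows)
{
    $header = null;
    $result = [];
    foreach ($rows as $row) {
        if ($header === null) {
            $header = $row; // the first row holds the column names
            continue;
        }
        // Cast each cell so SimpleXMLElement values and nulls become strings.
        $cells = array_map(function ($cell) {
            return (string) $cell;
        }, $row);
        $result[] = array_combine($header, $cells);
    }
    return $result;
}

// Stand-in data with the same shape as the generator's output.
$rows = [
    ['sku', 'name', 'contributor'],
    ['abc 1', 'Author 1', null],
    ['abc 1', null, 'John 1'],
];

$table = rows_to_assoc($rows);
print_r($table);
```

Empty cells end up as empty strings here; dropping them instead would get closer to the array in the question, where unused keys are absent.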

To get the array you want from the XML file you posted, you would have to do it this way. It was not much fun to write, so I hope it is indeed what you wanted.

Given the exact XML you posted above, it produces the output you listed as your final result.

This was written in PHP 5.6. If you run into issues in an older environment, I believe you will have to move the function calls onto their own lines and replace [] with array().

$items = simplexml_load_file("items.xml");

$items_array = [];

foreach($items as $item) {

    foreach($item->attributes->attribute as $attribute) {
        array_push($items_array, itemsFactory($item, (array) $attribute));
    }

    foreach((array) $item->contributors->contributor as $contributor) {
        array_push($items_array, itemsFactory($item, $contributor));
    }

}

// $vars is an array (name/value pair) for attribute rows
// and a plain string for contributor rows.
function itemsFactory($item, $vars) {

    $item = (array) $item;

    return [
        "sku" => $item['sku'],
        "title" => $item['title'],
        "price" => $item['price'],
        "name" => (is_array($vars) ? $vars['name'] : ""),
        "value" => (is_array($vars) ? $vars['value'] : ""),
        "contributor" => (is_string($vars) ? $vars : ""),
        "isbn" => $item['isbn']
    ];

}

var_dump($items_array);

Here is the result when run on your XML file...

array(8) {
  [0]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(17) "Number of pages 1"
    ["value"]=>
    string(5) "123 1"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(5) "12345"
  }
  [1]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(8) "Author 1"
    ["value"]=>
    string(10) "Rob dude 1"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(5) "12345"
  }
  [2]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "John 1"
    ["isbn"]=>
    string(5) "12345"
  }
  [3]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "Ryan 1"
    ["isbn"]=>
    string(5) "12345"
  }
  [4]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(17) "Number of pages 2"
    ["value"]=>
    string(5) "123 2"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(4) "6789"
  }
  [5]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(8) "Author 2"
    ["value"]=>
    string(10) "Rob dude 2"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(4) "6789"
  }
  [6]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "John 2"
    ["isbn"]=>
    string(4) "6789"
  }
  [7]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "Ryan 2"
    ["isbn"]=>
    string(4) "6789"
  }
}

If you actually have access to the Excel file and not just the XML, this could be much easier. In that case we could use PHPExcel to read the exact same table, and it would work for any dataset, not just the one specified. If that is not the case, I can't think of any other way to transform that XML file into what you want.

EDIT:

This may also shed some more light on the subject; it is from the developer of PHPExcel himself: "PHPExcel factory error when reading XML from URL". As you can see, I don't think you can write something that parses any XML file you throw at it without getting hold of some of Excel's source code or spending a very long time on it, time that is far beyond the scope of this question. However, if you were to write something that parses any XML file, I have a feeling it would look like the above, but with a TON of conditionals.

5 Comments

It should be generic and should read any XML, just like Excel does when reading XML.
lol. You can write that yourself. I have provided a very clear example of what you need to do, and an explanation of why this is extremely hard to do. You are asking in this question for someone to write an entire library. You will not get the answer you want, and you did not say that in your question, which is extremely poorly worded. If you want to pay me I can write it for you, but something of that scope I am not doing for free.
To further showcase this, imagine writing something that would parse this: enetpulse.com/wp-content/uploads/…
@mschuett: For more generic XML handling (without attributes(!)) I went this route: stackoverflow.com/a/26087574/367456 - in case you're still interested.
Yes, leaving out attributes is the key, as with them it would get insane very fast. Nice work.

The PHP library PHPExcel solves your issue:

https://phpexcel.codeplex.com/

You can find some samples here too:

https://phpexcel.codeplex.com/wikipage?title=Examples&referringTitle=Home

https://github.com/PHPOffice/PHPExcel/wiki/User%20Documentation

It's the most reliable Excel library for PHP and it's constantly maintained and upgraded.

Keep in mind that you can read (from an Excel file etc.) and write (to an Excel file, PDF etc.).
