
I am trying to match multiple lines of XML/GML output from a WFS service with preg_match_all(). The data is available on a public server for everyone to use. I tried the s and m flags, but with little luck. The data I receive looks like this:

<zwr:resultaat>
  <zwr:objectBeginTijd>2012-09-18</zwr:objectBeginTijd>
  <zwr:resultaatHistorie>
    <zwr:datumInvoeren>2012-10-31</zwr:datumInvoeren>
    <zwr:invoerder>
      <zwr:voornaam>Joep</zwr:voornaam>
      <zwr:achternaam>Koning, de</zwr:achternaam>
      <zwr:email>[email protected]</zwr:email>
      <zwr:telefoon>015-2608166</zwr:telefoon>
      <zwr:organisatie>
        <zwr:bedrijfsnaam>Hoogheemraadschap van Delfland</zwr:bedrijfsnaam>
        <zwr:adres>
          <zwr:huisnummer>32</zwr:huisnummer>
          <zwr:postcode>2611AL</zwr:postcode>
          <zwr:straat>Phoenixstraat</zwr:straat>
          <zwr:woonplaats>DELFT</zwr:woonplaats>
        </zwr:adres>
        <zwr:email>[email protected]</zwr:email>
        <zwr:telefoon>(015) 260 81 08</zwr:telefoon>
        <zwr:website>http://www.hhdelfland.nl/</zwr:website>
      </zwr:organisatie>
    </zwr:invoerder>
  </zwr:resultaatHistorie>
  <zwr:risicoNiveau>false</zwr:risicoNiveau>
  <zwr:numeriekeWaarde>0.02</zwr:numeriekeWaarde>
  <zwr:eenheid>kubieke millimeter per liter</zwr:eenheid>
  <zwr:hoedanigheid>niet van toepassing</zwr:hoedanigheid>
  <zwr:kwaliteitsOordeel>Normale waarde</zwr:kwaliteitsOordeel>
  <zwr:parameterGrootheid>
    <zwr:grootheid>Biovolume per volume eenheid</zwr:grootheid>
    <zwr:object>Microcystis</zwr:object>
  </zwr:parameterGrootheid>
  <zwr:analyseProces>
    <zwr:analyserendeInstantie>AQUON</zwr:analyserendeInstantie>
  </zwr:analyseProces>
</zwr:resultaat>

An example of the data can also be found at: http://212.159.219.98/zwr-ogc/services?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetGmlObject&OUTPUTFORMAT=text%2Fxml%3B+subtype%3Dgml%2F3.1.1&TRAVERSEXLINKDEPTH=0&GMLOBJECTID=ZWR_MONSTERPUNT_304427

It is all in Dutch, but that should not matter for the context of the question. I would like to search across multiple lines of this output and get the values between the tags. I also tried reading each tag separately, which worked fine, but because the combination of tags varies (a given tag is sometimes present and sometimes not), the values get mixed up and the fetched data loses its structure.

I thought it would be a good idea to read a whole set of tags at once so that I can keep the data together. My current preg_match_all() code is:

preg_match_all("/<zwr:risicoNiveau>(.*)<\/zwr:risicoNiveau><zwr:numeriekeWaarde>(.*)<\/zwr:numeriekeWaarde><zwr:eenheid>(.*)<\/zwr:eenheid><zwr:hoedanigheid>(.*)<\/zwr:hoedanigheid>
    <zwr:kwaliteitsOordeel>(.*)<\/zwr:kwaliteitsOordeel><zwr:parameterGrootheid><zwr:object>(.*)<\/zwr:object><zwr:grootheid>(.*)<\/zwr:grootheid><\/zwr:parameterGrootheid>/m", $content, $stof);

As you can see, I would like to read multiple values with one preg_match_all() call, which should give me an array containing multiple arrays.

How do I read multiple consecutive tags that are on different lines? When I use var_dump() to show the result, I get a multidimensional array with no data in it. The s and m flags do not seem to work for me. Am I doing something wrong? Other methods in PHP are welcome!

  • Why not use an XML parser? Commented Jan 19, 2016 at 10:25
  • @Toto An XML parser didn't work for me. The namespaces are a pain in the a**. Thanks for your response! Commented Jan 19, 2016 at 13:24
  • The gml tag is for the Game Maker Language, see this meta post. Commented Jan 19, 2016 at 15:07

1 Answer

1.) You need to allow for whitespace between the tags by putting \s* between them.
<\/zwr:risicoNiveau> \s* <zwr:numeriekeWaarde>...

2.) Use .*? inside your capture groups so that they match non-greedily.
<zwr:risicoNiveau>(.*?)<\/zwr:risicoNiveau>

3.) Improve the readability of the regex with the x flag (free-spacing mode).
Regex demo at regex101
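
To illustrate points 1 and 2, here is a minimal sketch on two tags taken from the question. The variable names and the shortened input strings are assumptions made for the example, not part of the original data set.

```php
<?php
// Two adjacent tags separated by a newline and indentation, as in the WFS output.
$xml = "<zwr:risicoNiveau>false</zwr:risicoNiveau>\n  <zwr:numeriekeWaarde>0.02</zwr:numeriekeWaarde>";

// Without \s* the line break between the tags is not accounted for, so nothing matches.
$strict = '~<zwr:risicoNiveau>(.*?)</zwr:risicoNiveau><zwr:numeriekeWaarde>(.*?)</zwr:numeriekeWaarde>~s';
$strictCount = preg_match_all($strict, $xml, $m1);

// With \s* the line break and indentation are absorbed.
$tolerant = '~<zwr:risicoNiveau>(.*?)</zwr:risicoNiveau>\s*<zwr:numeriekeWaarde>(.*?)</zwr:numeriekeWaarde>~s';
$tolerantCount = preg_match_all($tolerant, $xml, $m2, PREG_SET_ORDER);

// Greedy versus lazy: with the s flag, (.*) runs to the LAST closing tag.
$two = "<zwr:eenheid>a</zwr:eenheid>\n<zwr:eenheid>b</zwr:eenheid>";
preg_match('~<zwr:eenheid>(.*)</zwr:eenheid>~s', $two, $greedy);
preg_match('~<zwr:eenheid>(.*?)</zwr:eenheid>~s', $two, $lazy);

var_dump($strictCount, $tolerantCount, $m2[0][1], $m2[0][2], $greedy[1], $lazy[1]);
```

The greedy capture swallows the first closing tag and the second opening tag, which is why the lazy form is the safer default here.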

Note: To enforce the format, use a negated character class ([^<]*?) rather than (.*?), so that a capture cannot run past the next tag. To still match records in which some tags are missing, put the ? quantifier on the optional tags, for example on <zwr:object>.

$pattern = '~
<zwr:risicoNiveau>(.*?)</zwr:risicoNiveau>\s*
<zwr:numeriekeWaarde>(.*?)</zwr:numeriekeWaarde>\s*
<zwr:eenheid>(.*?)</zwr:eenheid>\s*
<zwr:hoedanigheid>(.*?)</zwr:hoedanigheid>\s*
<zwr:kwaliteitsOordeel>(.*?)</zwr:kwaliteitsOordeel>\s*
<zwr:parameterGrootheid>\s*
  <zwr:grootheid>(.*?)</zwr:grootheid>\s*
  <zwr:object>(.*?)</zwr:object>\s*
</zwr:parameterGrootheid>
~sx';

PREG_SET_ORDER orders the results so that $matches[0] is an array of the first set of matches, $matches[1] is an array of the second set of matches, and so on. Read more in the PHP manual. In the call below, the matches array is named $out.

if(preg_match_all($pattern, $str, $out, PREG_SET_ORDER) > 0)
  print_r($out);

See php demo at eval.in
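
As a rough illustration of what PREG_SET_ORDER changes, the sketch below runs the same pattern with and without the flag. The input is a hypothetical fragment reduced to two tags per record, and the second record's values are invented for the example.

```php
<?php
// Hypothetical sample: two records, reduced to two tags each.
$str = <<<XML
<zwr:numeriekeWaarde>0.02</zwr:numeriekeWaarde>
<zwr:eenheid>kubieke millimeter per liter</zwr:eenheid>
<zwr:numeriekeWaarde>7.5</zwr:numeriekeWaarde>
<zwr:eenheid>dimensieloos</zwr:eenheid>
XML;

$pattern = '~
<zwr:numeriekeWaarde>(.*?)</zwr:numeriekeWaarde>\s*
<zwr:eenheid>(.*?)</zwr:eenheid>
~sx';

// Default (PREG_PATTERN_ORDER): one array per capture group.
preg_match_all($pattern, $str, $byGroup);

// PREG_SET_ORDER: one array per match, which keeps each record together.
preg_match_all($pattern, $str, $bySet, PREG_SET_ORDER);

// Turn each set into a named row; the keys are arbitrary names for this sketch.
$rows = [];
foreach ($bySet as $set) {
    $rows[] = ['waarde' => $set[1], 'eenheid' => $set[2]];
}

print_r($rows);
```

With the default ordering, the values of one record are spread over $byGroup[1] and $byGroup[2]. With PREG_SET_ORDER, each $bySet[n] holds one complete record, which is what the question asks for.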


3 Comments

Sir, you just helped me tremendously! Thank you so much!
The problem now is that when the first part of the pattern matches but the last part doesn't, it keeps consuming text until it finds the last part. This happens because the content of <zwr:parameterGrootheid> does not always use the same tags. Is there a way to match only this exact pattern, so that anything that doesn't fit is left out of the array?
@RoyanPonder Exclude <. Change (.*?) to ([^<]*?) like this demo. If you still want to match the items that are available, make the tags that are not always present optional, like this demo, where I made the <zwr:object> tags optional by putting them in a non-capturing group (?: ... ) with a ? quantifier for zero or one occurrence.
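
A minimal sketch of that last suggestion, assuming a hypothetical input in which the second block has no <zwr:object> tag (the second block's value is invented for the example):

```php
<?php
// Hypothetical input: the first block has <zwr:object>, the second does not.
$str = <<<XML
<zwr:parameterGrootheid>
  <zwr:grootheid>Biovolume per volume eenheid</zwr:grootheid>
  <zwr:object>Microcystis</zwr:object>
</zwr:parameterGrootheid>
<zwr:parameterGrootheid>
  <zwr:grootheid>Temperatuur</zwr:grootheid>
</zwr:parameterGrootheid>
XML;

// [^<]* cannot cross a tag boundary; (?: ... )? makes <zwr:object> optional.
$pattern = '~
<zwr:parameterGrootheid>\s*
  <zwr:grootheid>([^<]*)</zwr:grootheid>\s*
  (?: <zwr:object>([^<]*)</zwr:object>\s* )?
</zwr:parameterGrootheid>
~x';

preg_match_all($pattern, $str, $out, PREG_SET_ORDER);

$rows = [];
foreach ($out as $set) {
    // Group 2 is missing or empty when the optional tag was absent.
    $object = ($set[2] ?? '') !== '' ? $set[2] : null;
    $rows[] = ['grootheid' => $set[1], 'object' => $object];
}

print_r($rows);
```

Because [^<]* stops at the next <, a block with a missing tag can no longer be stretched into the following block. It either matches through the optional group or is skipped.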
