I have this regex, but it's not working, perhaps because of the newlines. My goal is to extract the passenger's name and phone number.

Here is a snippet of the data I have. The block below repeats about 100 times:

<div class="booking-section">
    <h4>Passenger Details</h4>
    <p>
        <b>Passenger Name:</b><br />
        Ms Wendy Walker-hunter
    </p>

    <p>
        <b>Mobile Number:</b><br />
        161525961468
    </p>

For now I'm just trying to get the passenger's name:

$re = '/(?<=Name)(.*)(?=Mobile)/s';
preg_match($re, $str, $matches);

// Print the entire match result
print_r($matches);

Any kind of help I can get on this is greatly appreciated!

    You should use a DOM parser to extract this data. You can target each .booking-section element, and list the passenger name as the first <p> tag, and the mobile number as the second. Then you can strip out the <b> and its contents, and the <br />. Don't use regex for this. Commented Feb 20, 2017 at 23:14

2 Answers

Never parse HTML with a regular expression. Here's how you should be doing this sort of thing:

$html = '<div class="booking-section">
    <h4>Passenger Details</h4>
    <p>
        <b>Passenger Name:</b><br />
        Ms Wendy Walker-hunter
    </p>

    <p>
        <b>Mobile Number:</b><br />
        161525961468
    </p>
</div>
<div class="booking-section">
    <h4>Passenger Details</h4>
    <p>
        <b>Passenger Name:</b><br />
        Mr John Walker
    </p>

    <p>
        <b>Mobile Number:</b><br />
        16153682486
    </p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//div[@class='booking-section']/p[1]/text()[normalize-space()]");
foreach ($results as $node) {
    echo trim($node->textContent) . "\n";
}

This uses an XPath query to get the nodes you're looking for:

//div[@class='booking-section']/p[1]/text()[normalize-space()]

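This selects the text nodes that are direct children of the first <p> element inside each <div> whose class attribute is exactly "booking-section". The [normalize-space()] predicate drops the whitespace-only text nodes, so only the name itself is left.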
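The same approach extends to the phone number by querying relative to each section. Below is a minimal sketch, assuming every booking section keeps the name in its first <p> and the number in its second; extractPassengers() is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: returns one ['name' => ..., 'phone' => ...] entry
// per <div class="booking-section"> found in the markup.
function extractPassengers(string $html): array
{
    libxml_use_internal_errors(true);   // buffer markup warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();              // discard whatever the parser buffered

    $xpath = new DOMXPath($dom);
    $passengers = [];
    foreach ($xpath->query("//div[@class='booking-section']") as $section) {
        // string(...) yields the first matching text node, or '' when there is none.
        // Assumption: p[1] holds the name and p[2] holds the mobile number.
        $name  = $xpath->evaluate("string(p[1]/text()[normalize-space()])", $section);
        $phone = $xpath->evaluate("string(p[2]/text()[normalize-space()])", $section);
        $passengers[] = ['name' => trim($name), 'phone' => trim($phone)];
    }
    return $passengers;
}

// Usage with a single section shaped like the one in the question.
$sample = '<div class="booking-section">
    <p><b>Passenger Name:</b><br />
        Ms Wendy Walker-hunter
    </p>
    <p><b>Mobile Number:</b><br />
        161525961468
    </p>
</div>';

foreach (extractPassengers($sample) as $passenger) {
    echo $passenger['name'], ': ', $passenger['phone'], "\n";
}
```

Evaluating the two expressions against each section node keeps a name paired with its own number, which two independent document-wide queries would not guarantee if a section were missing one of the fields.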


According to the documentation:

this function may generate E_WARNING errors when it encounters bad markup. libxml's error handling functions may be used to handle these errors.

I've enabled libxml's internal error handling for this example, to suppress any warnings about the HTML, though of course you should not be outputting warnings to users anyway.
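If you would rather see what the parser complained about than discard it, the buffered errors can be read back. This is only a sketch: loadHtmlQuietly() and the message format are my own assumptions, while libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are the standard libxml functions.

```php
<?php
// Hypothetical helper: parses HTML without emitting warnings and hands the
// parser's messages back through $messages so they can be logged.
function loadHtmlQuietly(string $html, array &$messages): DOMDocument
{
    $previous = libxml_use_internal_errors(true); // buffer errors, remember the old setting
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {     // each item is a LibXMLError object
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();                        // empty the buffer for the next parse
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return $dom;
}

// Usage: parse a fragment, then decide what to do with any messages.
$dom = loadHtmlQuietly('<div class="booking-section"><p>Example</p></div>', $messages);
foreach ($messages as $message) {
    error_log($message);                          // log instead of showing it to users
}
```

Restoring the previous setting at the end keeps the helper from silently changing error handling for the rest of the script.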

2 Comments

Thanks for this, but I'm getting nasty errors: Warning: DOMDocument::loadHTML(): Misplaced DOCTYPE declaration in Entity, line:
The code as provided works fine for me. Are you running it on the HTML above, or on the full HTML document?

This should work as long as the snippets are always formatted like the example, because it relies on the line breaks:

$t = '
<div class="booking-section">
  <h4>Passenger Details</h4>
  <p>
    <b>Passenger Name:</b><br />
    Ms Wendy Walker-hunter
  </p>
  <p>
    <b>Mobile Number:</b><br />
    161525961468
  </p>
</div>';

// [^\r\n]+ skips the rest of the label line; the capture group takes the next line
preg_match('/Passenger Name:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $name);

preg_match('/Mobile Number:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $phone);

echo trim($name[1]), ' / ', trim($phone[1]);

Output is: Ms Wendy Walker-hunter / 161525961468

Same with preg_match_all:

$t = '
<div class="booking-section">
  <h4>Passenger Details</h4>
  <p>
    <b>Passenger Name:</b><br />
    Ms Wendy Walker-hunter
  </p>
  <p>
    <b>Mobile Number:</b><br />
    161525961468
  </p>
</div>
<div class="booking-section">
  <h4>Passenger Details</h4>
  <p>
    <b>Passenger Name:</b><br />
    Ms Wendy Walker-hunter 2
  </p>
  <p>
    <b>Mobile Number:</b><br />
    161525961468 2
  </p>
</div>
<div class="booking-section">
  <h4>Passenger Details</h4>
  <p>
    <b>Passenger Name:</b><br />
    Ms Wendy Walker-hunter 3
  </p>
  <p>
    <b>Mobile Number:</b><br />
    161525961468 3
  </p>
</div>';

// Same patterns as above; preg_match_all collects every occurrence
preg_match_all('/Passenger Name:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $name);

preg_match_all('/Mobile Number:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $phone);

echo '<pre>';
print_r($name);
print_r($phone);
die;

The output looks something like this (the [0] element, which holds the full matches, is omitted here):

Array
(
    [1] => Array
        (
            [0] =>     Ms Wendy Walker-hunter
            [1] =>     Ms Wendy Walker-hunter 2
            [2] =>     Ms Wendy Walker-hunter 3
        )

)
Array
(
    [1] => Array
        (
            [0] =>     161525961468
            [1] =>     161525961468 2
            [2] =>     161525961468 3
        )

)
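If you stay with regex, the patterns can be made less dependent on exact line breaks by allowing any whitespace with \s*. The following is a rough sketch rather than a robust parser: matchPassengers() is a hypothetical helper, and it assumes each name is followed by a mobile number in the same section and that neither value contains a < character.

```php
<?php
// Hypothetical helper: pairs each passenger name with the mobile number that
// follows it, tolerating arbitrary whitespace around the tags.
function matchPassengers(string $html): array
{
    // ([^<]+?) captures the value lazily, so trailing whitespace before </p>
    // is left to \s*; the s flag lets .*? cross line breaks between the fields.
    $pattern = '~Passenger Name:</b>\s*<br\s*/?>\s*([^<]+?)\s*</p>'
             . '.*?Mobile Number:</b>\s*<br\s*/?>\s*([^<]+?)\s*</p>~s';

    // PREG_SET_ORDER groups the captures per match instead of per group.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $passengers = [];
    foreach ($matches as $m) {
        $passengers[] = ['name' => $m[1], 'phone' => $m[2]];
    }
    return $passengers;
}

// Usage: the same markup with the line breaks collapsed still matches.
$compact = '<p><b>Passenger Name:</b><br />Ms Wendy Walker-hunter</p>'
         . '<p><b>Mobile Number:</b><br />161525961468</p>';
print_r(matchPassengers($compact));
```

Note that if a section had a name but no number, the lazy .*? would run on into the next section and pair the name with the wrong number, which is the kind of fragility a DOM parser avoids.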

3 Comments

Right, but what if there is more than one listing?
@thevoipman Or what if the whitespace doesn't match perfectly? That's one more reason why you shouldn't parse HTML with regular expressions.
If it is not in a loop as you mentioned, you can use preg_match_all.
