
What is the PHP equivalent for this Perl code?

my $html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr><tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr>';

print "$1:$2\n" while $html =~ /class="aaa"><td class="bbb">(.*?)<\/td><td>(\d+)<\/td>/g;

I tried this code, but it results in an infinite loop.

while(preg_match('/td class=\"bbb\">(.*?)<\/td><td>(\d+)<\/td>/',$html,$out)) {
    echo "$out[1]:$out[2]\n";
}

Also, with if instead of while, it gives only one result.

Expected output (IP:PORT):

111.111.111.111:443
222.222.222.222:443

Environment: Windows 7 with PHP 5.5.12 (WAMP v2.5).

  • What have you tried so far to get this going? That might help us understand the problems you're facing :) Commented Jul 26, 2016 at 9:50
  • I tried with this code, but it gives infinite loop: while(preg_match('/td class=\"bbb\">(.*?)<\/td><td>(\d+)<\/td>/',$html,$out)) { echo "$out[1]:$out[2]\n"; } Commented Jul 26, 2016 at 9:55
  • Please edit your question and add the PHP code there. It's hard to read in the comment. Commented Jul 26, 2016 at 9:57
  • Also, with if instead of while it gives only one result. Commented Jul 26, 2016 at 9:57
  • You should look at this question and the answer given, I think that will resolve your issue. Commented Jul 26, 2016 at 10:01

2 Answers

2

This code will do as you ask. It uses preg_match_all, as simbabque described.

<?php

$html = '<tr class="aaa"><td class="bbb">221.86.2.163</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr><tr class="aaa"><td class="bbb">221.86.2.163</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr>';

preg_match_all('|td class="bbb">([\d.]+)</td><td>(\d+)</td>|', $html, $out, PREG_SET_ORDER);

foreach ( $out as $item ) {
    echo "$item[1]:$item[2]\n";
}

?>

Output:

221.86.2.163:443
221.86.2.163:443
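
For comparison, here is a minimal sketch of how preg_match_all arranges its results with and without PREG_SET_ORDER. The shortened $html string, the second port number, and the variable names $byGroup and $bySet are made up for illustration.

```php
<?php
// Illustrative input: two rows in the same shape as the question's HTML.
$html = '<td class="bbb">111.111.111.111</td><td>443</td>'
      . '<td class="bbb">222.222.222.222</td><td>8080</td>';
$re = '|td class="bbb">([\d.]+)</td><td>(\d+)</td>|';

// Default (PREG_PATTERN_ORDER): one array per capture group.
// $byGroup[1] holds every IP, $byGroup[2] holds every port.
preg_match_all($re, $html, $byGroup);
foreach ($byGroup[1] as $i => $ip) {
    echo $ip . ':' . $byGroup[2][$i] . "\n";
}

// PREG_SET_ORDER: one array per match.
// $bySet[0] is the first match, $bySet[1] the second.
preg_match_all($re, $html, $bySet, PREG_SET_ORDER);
foreach ($bySet as $m) {
    echo "$m[1]:$m[2]\n";
}
```

Both loops print the same two IP:PORT lines. The flag only changes how the result array is nested, so PREG_SET_ORDER is a convenience when each match should be handled as a unit.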

Comments

That's it, this solved my problem. PREG_SET_ORDER is the man, without it doesn't work! :D Thank you.
From the docs: "PREG_SET_ORDER - Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on."
At first, I tried PHP Simple HTML DOM Parser. I scraped the IPs, but I couldn't get the ports.
"without [PREG_SET_ORDER] it doesn't work" Yes, it does work without that flag. It's simpler to unpack the contents of $out if you use it, but all the information is in there either way.
@Henders Thank you, I'm new here, I didn't know about 'accepted answer'.
0

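The PHP function preg_match() returns 1 if the pattern matched, 0 if it did not, and false on error. It does not remember where the previous match ended: every call searches $html from the beginning, finds the same first match, and returns 1. The loop condition is therefore always true, which is why you get an infinite loop.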

Since preg_match's $matches only gives you all capture groups from matching once, you only get the first match when used with an if.

The Perl code has the /g modifier on the regular expression match, which makes the match global. In scalar context, the match operator with /g returns a true value for each successive match and resumes where the previous match ended. It's basically an iterator, so the while loop goes through all matches without repeating one, and there is no infinite loop. The match variables $1 and $2 are then used to display the results. You need to use preg_match_all to get a global match in PHP.
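
To make the iterator idea concrete, here is a rough sketch that imitates Perl's /g loop with plain preg_match by passing an explicit start offset. The shortened $html string and the names $offset and $pairs are illustrative assumptions; preg_match_all remains the simpler choice in practice.

```php
<?php
// Illustrative input: two rows in the same shape as the question's HTML.
$html = '<td class="bbb">111.111.111.111</td><td>443</td>'
      . '<td class="bbb">222.222.222.222</td><td>8080</td>';
$re = '|td class="bbb">(.*?)</td><td>(\d+)</td>|';

$offset = 0;   // where the next search starts, similar to Perl's pos()
$pairs  = [];  // collected "IP:PORT" strings

// PREG_OFFSET_CAPTURE turns every entry of $m into [text, byte offset].
while (preg_match($re, $html, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    $pairs[] = $m[1][0] . ':' . $m[2][0];
    // Resume right after the end of the current full match.
    $offset = $m[0][1] + strlen($m[0][0]);
}

echo implode("\n", $pairs), "\n";
```

This pattern can never match an empty string, so $offset always moves forward and the loop ends once no further match is found. A pattern that can match zero characters would need an extra guard to avoid looping forever.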

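You need to match first, then iterate over the array of matches. With the default flags, $out[0] holds the full matches, so you can ignore it; $out[1] holds the captured IPs and $out[2] the captured ports.

preg_match_all('/td class="bbb">(.*?)<\/td><td>(\d+)<\/td>/', $html, $out);
// $out[1] and $out[2] are themselves arrays; echoing them directly
// raises an "Array to string conversion" notice, so index into them.
for ($i = 0; $i < count($out[1]); $i++) {
    echo $out[1][$i] . ':' . $out[2][$i] . "\n";
}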

Comments

Nope. I changed echo "$out\n"; to echo "$match\n"; but preg_match gives only the first match, with a newline between IP and port. preg_match_all gives the error Notice: Array to string conversion bla bla bla.
@tr0in see my update. I need to read phpdoc for the syntax because I don't usually use PHP. The foreach approach was wrong; I didn't see at first that there are two capture groups in the regex.
That regex is better as, for instance, 'td class="bbb">(.*?)</td><td>(\d+)</td>'
Which regex @Borodin?
@simbabque I see your update, but it's the same. It gives the error Notice: Array to string conversion bla bla bla for the 3rd and 4th lines of your code - $out[$i] and $out[$i+1].
