
I want to extract whatever text lies between <p><b> and <div id="t" class="t">. Here is my sample, which is not working:

$st = '<p><b>Auburn</b> is a city in <a href="/my/id/ala" title="auburn">Lee County</a>, <a href="/my/Alabama" title="Alabama">Alabama</a>, <a href="/my/ph" title="PH">United States</a>. It is the largest city in eastern Alabama with a 2012 population of 56,908.<sup id="test" class="test"><a href="#tst"><span>[</span>2<span>]</span></a></sup> It is a principal city of the <a href="/my/tst" title="Auburn-Opelika Metropolitan Area" class="cs">Auburn-Opelika Metropolitan Area</a>. The <a href="/my/st" title="Auburn-Opelika, AL MSA" class="vf">Auburn-Opelika, AL MSA</a> with a population of 140,247, along with the <a href="/myu/sc" title="Columbus, GA-AL MSA" class="Xd">Columbus, GA-AL MSA</a> and <a href="/my/fd" title="Tuskegee, Alabama">Tuskegee, Alabama</a>, comprises the greater <a href="/my/cdA" title="Columbus-Auburn-Opelika, GA-AL CSA" class="se">Columbus-Auburn-Opelika, GA-AL CSA</a>, a region home to 456,564 residents.</p>
<p>Auburn is a <a href="/my/te" title="College town">college town</a> and is the home of <a href="/my/As" title="Auburn University">Auburn University</a>. Auburn has been marked in recent years by rapid growth, and is currently the fastest growing metropolitan area in Alabama and the nineteenth-fastest growing metro area in the United States since 1990.<sup class="fd" style="white-space:nowrap;">[<i><a href="/my/d" title="fda"><span title="fad (August 2011)">citation needed</span></a></i>]</sup> U.S. News ranked Auburn among its top ten list of best places to live in United States for the year 2009.<sup id="d3" class="f"><a href="3"><span>[</span>3<span>]</span></a></sup> The city`s unofficial nickname is “The Loveliest Village On The Plains,” taken from a line in the poem <i><a href="/my/da" title="The Deserted Village">The Deserted Village</a></i> by <a href="/my/fs" title="Oliver Goldsmith">Oliver Goldsmith</a>: “Sweet Auburn! loveliest village of the plain...”<sup id="ds" class="dsa"><a href="dd"><span>[</span>4<span>]</span></a></sup></p>
<div id="t" class="t">';

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/U', $st, $output);
$result = $output[0];
print_r($output);
echo $result;

2 Answers

No need for regex here as we're working with literal strings. Just use strpos with offsets:

<?php
    function str_between($string, $searchStart, $searchEnd, $offset = 0) {
        $startPosition = strpos($string, $searchStart, $offset);
        if ($startPosition !== false) {
            // Move past the start marker so the result begins right after it.
            $contentStart = $startPosition + strlen($searchStart);
            $endPosition = strpos($string, $searchEnd, $contentStart);
            if ($endPosition !== false) {
                // substr() takes a length, not an end position.
                return substr($string, $contentStart, $endPosition - $contentStart);
            }
            // No end marker: return everything after the start marker.
            return substr($string, $contentStart);
        }
        // No start marker: return the input unchanged.
        return $string;
    }

    var_dump(str_between($st, '<p><b>', '<div id="t" class="t">'));
?>
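
The length passed to substr() only matters visibly when the start marker is not at offset 0, which the sample $st never exercises. Here is a minimal sketch of the same strpos/substr idea on a string with leading text. The helper name between_markers, its null return for a missing marker, and the shortened input are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper for illustration; returns null when either marker is missing.
function between_markers(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);               // first byte after the start marker
    $to = strpos($haystack, $end, $from);  // look for the end marker after that point
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from); // third argument is a length
}

// Shortened stand-in for $st, with extra text before the start marker.
$html = 'intro <p><b>Auburn</b> is a city.</p>' . "\n" . '<div id="t" class="t">';
var_dump(between_markers($html, '<p><b>', '<div id="t" class="t">'));
```

Returning null instead of the original string makes a missing marker easy to tell apart from a real result; whether that suits your code is a design choice.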


If you would rather keep the regex than use the answer by h2ooooooo, a slight modification fixes it.

By default "." does not match line breaks, and your $st contains line breaks, so the match stopped at the first one. The s modifier makes "." match line breaks as well. Also note that U inverts greediness, so under U the lazy (.*?) becomes greedy; drop U to keep the match lazy.

Use the following:

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/s', $st, $output);
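
To see the effect of the s modifier and where the extracted text ends up, here is a minimal sketch. It assumes a shortened stand-in for $st (two paragraphs separated by a line break), and the variable names $withoutS, $withS, and $captured are made up for the example. Note that the question's code reads $output[0], which holds the full matches including both markers, and that echo on an array prints "Array"; the text inside the parentheses is in $output[1].

```php
<?php
// Shortened stand-in for the question's $st, with line breaks between the parts.
$st = "<p><b>Auburn</b> is a city.</p>\n<p>Auburn is a college town.</p>\n<div id=\"t\" class=\"t\">";

$pattern = '/<p><b>(.*?)<div id="t" class="t">/';

// Without s, "." cannot cross the line break, so there is no match.
$withoutS = preg_match_all($pattern, $st, $ignored);

// With s, "." also matches line breaks and the lazy group spans both paragraphs.
$withS = preg_match_all($pattern . 's', $st, $output);

// $output[0][0] is the whole match including both markers;
// $output[1][0] is only the text captured by the parentheses.
$captured = $withS > 0 ? $output[1][0] : null;
echo $captured;
```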
