Extract text from string in php

Question

I want to extract any text or string between the following text <p><b> and <div id="t" class="t"> Here is my sample which is not working

$st = '<p><b>Auburn</b> is a city in <a href="/my/id/ala" title="auburn">Lee County</a>, <a href="/my/Alabama" title="Alabama">Alabama</a>, <a href="/my/ph" title="PH">United States</a>. It is the largest city in eastern Alabama with a 2012 population of 56,908.<sup id="test" class="test"><a href="#tst"><span>[</span>2<span>]</span></a></sup> It is a principal city of the <a href="/my/tst" title="Auburn-Opelika Metropolitan Area" class="cs">Auburn-Opelika Metropolitan Area</a>. The <a href="/my/st" title="Auburn-Opelika, AL MSA" class="vf">Auburn-Opelika, AL MSA</a> with a population of 140,247, along with the <a href="/myu/sc" title="Columbus, GA-AL MSA" class="Xd">Columbus, GA-AL MSA</a> and <a href="/my/fd" title="Tuskegee, Alabama">Tuskegee, Alabama</a>, comprises the greater <a href="/my/cdA" title="Columbus-Auburn-Opelika, GA-AL CSA" class="se">Columbus-Auburn-Opelika, GA-AL CSA</a>, a region home to 456,564 residents.</p>
<p>Auburn is a <a href="/my/te" title="College town">college town</a> and is the home of <a href="/my/As" title="Auburn University">Auburn University</a>. Auburn has been marked in recent years by rapid growth, and is currently the fastest growing metropolitan area in Alabama and the nineteenth-fastest growing metro area in the United States since 1990.<sup class="fd" style="white-space:nowrap;">[<i><a href="/my/d" title="fda"><span title="fad (August 2011)">citation needed</span></a></i>]</sup> U.S. News ranked Auburn among its top ten list of best places to live in United States for the year 2009.<sup id="d3" class="f"><a href="3"><span>[</span>3<span>]</span></a></sup> The city`s unofficial nickname is “The Loveliest Village On The Plains,” taken from a line in the poem <i><a href="/my/da" title="The Deserted Village">The Deserted Village</a></i> by <a href="/my/fs" title="Oliver Goldsmith">Oliver Goldsmith</a>: “Sweet Auburn! loveliest village of the plain...”<sup id="ds" class="dsa"><a href="dd"><span>[</span>4<span>]</span></a></sup></p>
<div id="t" class="t">';

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/U', $st, $output);
$result = $output[0];
print_r($output);
echo $result;

h2ooooooo · Accepted Answer · 2013-10-24 16:06:04Z

1

No need for regex here as we're working with literal strings. Just use strpos with offsets:

<?php
    function str_between($string, $searchStart, $searchEnd, $offset = 0) {
        $startPosition = strpos($string, $searchStart, $offset);
        if ($startPosition !== false) {
            $searchStartLength = strlen($searchStart);
            $endPosition = strpos($string, $searchEnd, $startPosition + 1);
            if ($endPosition !== false) {
                return substr($string, $startPosition + $searchStartLength, $endPosition - $searchStartLength);
            }
            return substr($string, $startPosition + $searchStartLength);
        }
        return $string;
    }

    var_dump(str_between($st, '<p><b>', '<div id="t" class="t">'));
?>

DEMO

answered Oct 24, 2013 at 16:06

h2ooooooo

39.6k8 gold badges70 silver badges108 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

user1647708 · Accepted Answer · 2013-10-24 18:10:09Z

0

A slight modification will help your regex if you still want to use it rather than the answer by h2ooooooo:

"/s" tells the regex to continue searching beyond the line breaks. Your $st contained line breaks where the regex engine was stopping.

Use following:

preg_match_all('/<p><b>(.*?)<div id="t" class="t">/sU', $st, $output);

answered Oct 24, 2013 at 18:10

user1647708

4573 gold badges10 silver badges22 bronze badges

Collectives™ on Stack Overflow

Extract text from string in php

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related