I am learning PHP. I want to show a table of contents for an article by converting its headings (h2, h3, h4, ...) into a list of links. This is my PHP code:

$Post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h3>Title 01.02</h3>
<h2>Title 02</h2>
<h3>Title 02.02</h3>
';

$c = 1;
$r = preg_replace_callback('~<h*([^>]*)>~i', function($res) use (&$c){
    return '<li><a id="#id'.$c++.'">'.$res[1].'</a></li>';
}, $Post);
$Post = $r;


echo '<ul>';
echo $Post;
echo '</ul>';

The code produces the output below, which is wrong:

<ul>
<li><a id="#id1">2</a></li>Title 01<li><a id="#id2">/h2</a></li>
<li><a id="#id3">3</a></li>Title 01.01<li><a id="#id4">/h3</a></li>
<li><a id="#id5">3</a></li>Title 01.02<li><a id="#id6">/h3</a></li>
<li><a id="#id7">2</a></li>Title 02<li><a id="#id8">/h2</a></li>
<li><a id="#id9">3</a></li>Title 02.02<li><a id="#id10">/h3</a></li>
</ul>

I know that the PHP code is incorrect. I want the output to look like this:

<ul>
<li><a href="#id1">Title 01</a></li>
<li><a href="#id2">Title 01.01</a></li>
<li><a href="#id3">Title 01.02</a></li>
<li><a href="#id4">Title 02</a></li>
<li><a href="#id5">Title 02.02</a></li>
</ul>

2 Answers


Your regular expression is needlessly complex.

You could just use <h.>(.*)</h.>, which matches a whole heading element and captures the text between its tags.

I added it to your snippet above to show your desired result:

$post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h3>Title 01.02</h3>
<h2>Title 02</h2>
<h3>Title 02.02</h3>
';

$c = 1;
$list_elements = preg_replace_callback('~<h.>(.*)</h.>~i', function($res) use (&$c){
    // Use href (not id) so each entry is a link, as in the desired output.
    return '<li><a href="#id'.$c++.'">'.$res[1].'</a></li>';
}, $post);


echo '<ul>';
echo $list_elements;
echo '</ul>';

That said, as suggested in the comments, you should probably use an HTML parser if this turns into anything more than a toy example. On real-world markup, regular expressions are an easy way to shoot yourself in the foot.
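The links above only work if the headings themselves carry matching ids. As a rough sketch (not part of the original snippet), the same callback could add an id to each heading while collecting the list items. The names $items and $postWithIds are made up for illustration, and the pattern assumes headings have no attributes and no nested tags.

```php
<?php
// Sample input taken from the question (shortened).
$post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h2>Title 02</h2>
';

$c = 1;
$items = [];

// Rewrite each heading with an id and remember a matching list item.
// \1 makes the closing tag match the same heading level as the opening tag.
$postWithIds = preg_replace_callback(
    '~<(h[1-6])>(.*?)</\1>~i',
    function ($m) use (&$c, &$items) {
        $id = 'id' . $c++;
        $items[] = '<li><a href="#' . $id . '">' . $m[2] . '</a></li>';
        return '<' . $m[1] . ' id="' . $id . '">' . $m[2] . '</' . $m[1] . '>';
    },
    $post
);

$toc = "<ul>\n" . implode("\n", $items) . "\n</ul>";

echo $toc, "\n", $postWithIds;
```

Printing the list first and the rewritten post second gives links that actually jump to the headings.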

1 Comment

+1 Much simpler pattern than what I started with. But as you say, handling more complex markup will need (much) more complex patterns.

Your regex is wrong for what you're trying to do:

~<h*([^>]*)>~i

<h* matches an angle bracket followed by zero or more h's, which means your regex matches everything between each pair of < and >, including closing tags such as </h2>.
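A quick way to see this is to run the original pattern over a single heading with preg_match_all. This is only an illustrative check; the variable names are arbitrary.

```php
<?php
// One heading from the question's input.
$sample = '<h2>Title 01</h2>';

preg_match_all('~<h*([^>]*)>~i', $sample, $matches);

// $matches[0] holds the full matches, $matches[1] the captured group.
print_r($matches[0]); // the opening and the closing tag both match
print_r($matches[1]); // the captures are tag contents, not the heading text
```

The captures are "2" and "/h2", which is exactly the link text that shows up in the question's wrong output.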

You could do this to extract the titles from your headings:

~<h[1-6]>([^<]*)</h[1-6]>~i

But those links need to target the IDs on the headings, so you need to do this to extract them:

~<h[1-6] id="([^"]*)">([^<]*)</h[1-6]>~i
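As a small sketch of how such a pattern might be used, assuming every heading already has an id as its only attribute (the sample ids below are invented), preg_match_all can collect the id and title pairs and turn them into the list:

```php
<?php
// Hypothetical sample input: headings that already carry id attributes.
$post = '
<h2 id="intro">Title 01</h2>
<h3 id="setup">Title 01.01</h3>
<h2 id="usage">Title 02</h2>
';

// Capture group 1 is the id, group 2 is the heading text.
preg_match_all(
    '~<h[1-6] id="([^"]*)">([^<]*)</h[1-6]>~i',
    $post,
    $matches,
    PREG_SET_ORDER
);

$toc = "<ul>\n";
foreach ($matches as $m) {
    // Both captures come straight from the HTML source, so they are reused as-is.
    $toc .= '<li><a href="#' . $m[1] . '">' . $m[2] . "</a></li>\n";
}
$toc .= '</ul>';

echo $toc;
```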

But what if you've got other attributes on the heading?

~<h[1-6](?:[^>]*?\sid="([^"]*)")?[^>]*>([^<]*)</h[1-6]>~i

Or markup inside the heading?


Regex is not a great way to parse HTML. It is a powerful tool, and it is possible to use it for this, but there are better ways.

$doc = new DOMDocument();
$doc->loadHTML($post);

$xpath = new DOMXPath($doc);

$headings = $xpath->query('html/body//*[self::h1 or self::h2 or self::h3]');

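// query() returns a DOMNodeList, so take the first <ul>; this assumes $post contains a <nav><ul>.
$nav = $xpath->query('html/body//nav/ul')->item(0);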

foreach ($headings as $heading) {
  $link = $doc->createElement('a');
  $link->setAttribute('href', '#' . $heading->getAttribute('id'));
  $link->textContent = $heading->textContent;

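  // appendChild() returns the appended child, so build the <li> before attaching it.
  $item = $doc->createElement('li');
  $item->appendChild($link);
  $nav->appendChild($item);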
}

I've assumed there is no markup in the headings, but only a couple of changes are needed to copy inner markup if necessary.
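For the fragment in the question, which has neither ids nor a nav element, a variation along these lines might work. It is only a sketch under a few assumptions: the function name buildToc and the returned array keys are made up for illustration, the id1, id2, ... naming follows the question's desired output, and the input is assumed to be a non-empty, plain ASCII fragment.

```php
<?php
// Hypothetical helper: returns the TOC markup and the post with ids added.
function buildToc(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);            // wraps the fragment in <html><body>
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $headings = $xpath->query(
        '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]'
    );

    $list = $doc->createElement('ul');
    $n = 1;
    foreach ($headings as $heading) {
        // Reuse an existing id, otherwise assign id1, id2, ...
        if (!$heading->hasAttribute('id')) {
            $heading->setAttribute('id', 'id' . $n);
        }
        $n++;

        $link = $doc->createElement('a');
        $link->setAttribute('href', '#' . $heading->getAttribute('id'));
        $link->appendChild($doc->createTextNode($heading->textContent));

        $item = $doc->createElement('li');
        $item->appendChild($link);
        $list->appendChild($item);
    }

    // Serialize the children of <body> to get the fragment back with its ids.
    $post = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $post .= $doc->saveHTML($child);
    }

    return ['toc' => $doc->saveHTML($list), 'post' => $post];
}

$result = buildToc("<h2>Title 01</h2>\n<h3>Title 01.01</h3>\n<h2>Title 02</h2>");
echo $result['toc'], "\n", $result['post'];
```

Building the list in a detached ul element avoids depending on any nav markup being present in the post.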
