I am trying to write a BBCode parser in PHP.

I currently have the following regex:

\[quote\](.*?)\[\/quote\]

Each match should be replaced with:

<div class='quote'><div class='quotetext'>$1</div></div>

This works perfectly until a post contains nested ("multidimensional") quotes. Example:

[quote] [quote] [quote] text [/quote] [/quote] [/quote]

This should produce the following output:

<div class='quote'><div class='quotetext'>
      <div class='quote'><div class='quotetext'>
           <div class='quote'><div class='quotetext'>
           text
           </div></div>
      </div></div>
</div></div>

Right now it produces this instead:

<div class='quote'><div class='quotetext'> [quote] [quote] text </div></div> [/quote] [/quote]

PHP:

preg_replace("/\[quote\](.*?)\[\/quote\]/", "<div class='quote'><div class='quotetext'>$1</div></div>", $text); 

I hope someone can help me with this issue. Thanks.

  • Where is the PHP code that does this? Can you add this to your question? Commented Sep 15, 2021 at 14:42
  • Um, sure. But the website regexr does this as well, and it is not built in PHP. So I don't think it is a PHP issue. I have added the code that does this to the question. Commented Sep 15, 2021 at 14:45
  • Yes, you're right, but I have to ask (see the answer I gave here). Your preg_replace() probably doesn't do what you think it does, it takes the first [quote] and the first [/quote], not the outer ones. In this case using regular expressions will probably not be the correct solution. Yes, they do have a place in finding things when making this parser, but without building a semi-real DOM, like HTML has, I don't think this will ever work. Commented Sep 15, 2021 at 14:57
  • Thinking about it, actually it doesn't matter. Simply do two separate replacements: 1. Replace [quote] by <div class='quote'><div class='quotetext'>. And 2. Replace [/quote] by </div></div>. As long as the BBCODE is valid this should work out fine. Commented Sep 15, 2021 at 15:02
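
The two-replacement idea from the last comment can be sketched as follows. This is a minimal illustration, not a full parser: the function name quotesToHtml is hypothetical, and the sketch assumes the BBCode is balanced, as the comment itself requires.

```php
<?php
// Hypothetical helper name; the two replacements are the ones suggested in the comment.
function quotesToHtml(string $text): string
{
    // str_replace handles both tags in one call: each search string
    // is swapped for the replacement at the same array index.
    return str_replace(
        ['[quote]', '[/quote]'],
        ["<div class='quote'><div class='quotetext'>", '</div></div>'],
        $text
    );
}

// Nested, balanced input: every tag is converted, whatever the depth.
echo quotesToHtml('[quote] [quote] text [/quote] [/quote]'), PHP_EOL;

// Unbalanced input is converted blindly and leaves two unclosed divs.
echo quotesToHtml('[quote] text'), PHP_EOL;
```

Because nothing is checked, an unbalanced post produces unbalanced HTML, which is the main trade-off against a pattern that only matches balanced pairs.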

1 Answer

A regex approach in one pass:

  1. Construct an array that associates each BBCode tag with the corresponding HTML code.
  2. Write a pattern that matches quote tags, nested or not. This has a double benefit: only the valid (balanced) parts are extracted, and the replacement is then applied to those parts alone.
  3. Perform a simple replacement with strtr inside a callback function, using the associative array.

Pro: This is relatively fast, since it needs only one pass and uses strtr.
Con: It isn't flexible, because it only takes into account tags like [quote], not [quote param="bidule"] or [QUOTE]. (However, nothing prevents you from writing a more elaborate callback function and adjusting the pattern a little.)

$corr = [
    '[quote]' => '<div class="quote"><div class="quotetext">',
    '[/quote]' => '</div></div>'
];

$pat = '~ \[quote]
          # everything that is not a quote tag
          (?<content> [^[]*+ (?: \[ (?! /?quote] ) [^[]* )*+ )
          # an optional recursion ( (?R) is a reference to the whole pattern)
          (?: (?R) (?&content) )*+
          \[/quote]
        ~x';

$result = preg_replace_callback($pat, fn($m) => strtr($m[0], $corr), $str);
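
To see how the one-pass pattern behaves on balanced and unbalanced input, here is a small self-contained sketch. The array and the pattern are repeated from above so that it runs on its own; the helper name convertQuotes and both sample strings are only assumptions made for the demonstration. It needs PHP 7.4 or later because of the arrow function.

```php
<?php
// Tag-to-HTML map and recursive pattern, repeated from the answer.
$corr = [
    '[quote]'  => '<div class="quote"><div class="quotetext">',
    '[/quote]' => '</div></div>',
];

$pat = '~ \[quote]
          (?<content> [^[]*+ (?: \[ (?! /?quote] ) [^[]* )*+ )
          (?: (?R) (?&content) )*+
          \[/quote]
        ~x';

// Hypothetical helper name, introduced only for this example.
function convertQuotes(string $str, string $pat, array $corr): string
{
    // The pattern matches a whole balanced block; strtr then swaps
    // every tag inside that block in a single sweep.
    return preg_replace_callback($pat, fn($m) => strtr($m[0], $corr), $str);
}

// Balanced, nested input: the whole string is one match.
echo convertQuotes('[quote][quote]text[/quote][/quote]', $pat, $corr), PHP_EOL;

// Unbalanced input: the first [quote] has no closing tag, so only the
// inner balanced pair is converted and the stray tag is left as-is.
echo convertQuotes('[quote]a[quote]b[/quote]', $pat, $corr), PHP_EOL;
```

The second call shows the point of step 2: a tag that does not belong to a balanced pair is never touched, so the generated HTML stays well-formed.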

A more classic approach with several passes:

  1. Build a pattern that forbids nested quote tags, so that only the innermost tags are replaced.
  2. Put the replacement in a loop and stop when there are no more tags to replace (use the count parameter of preg_replace to detect that).

$pat = '~ \[quote] ( [^[]*+ (?: \[ (?! /? quote] ) [^[]* )*+ ) \[/quote] ~x';
$repl = '<div class="quote"><div class="quotetext">$1</div></div>';

$result = $str;
$count = 0;

do {
    $result = preg_replace($pat, $repl, $result, -1, $count);
} while($count); 

Pro: More flexible than the first approach, since you can easily change the pattern and the replacement string.
Con: Clearly slower, since it needs n+1 passes, where n is the maximum nesting level.
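
The n+1 figure can be checked with a short sketch that counts the passes. The pattern and the replacement are the ones from the multi-pass approach; the sample input with three nesting levels and the $passes counter are assumptions added for the illustration.

```php
<?php
// Pattern and replacement repeated from the answer; the captured
// content cannot contain a quote tag, so only innermost pairs match.
$pat  = '~ \[quote] ( [^[]*+ (?: \[ (?! /? quote] ) [^[]* )*+ ) \[/quote] ~x';
$repl = '<div class="quote"><div class="quotetext">$1</div></div>';

// Hypothetical input with three nesting levels.
$result = '[quote][quote][quote]text[/quote][/quote][/quote]';
$passes = 0;

do {
    // Each pass converts the innermost quotes that are still BBCode.
    $result = preg_replace($pat, $repl, $result, -1, $count);
    $passes++;
} while ($count);

echo $passes, PHP_EOL; // 4: one pass per level plus a last pass that finds nothing
echo $result, PHP_EOL;
```

The generated HTML contains no opening square bracket, which is why an already converted inner quote can be part of the content matched on the next pass.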


As an aside: why replace a poor [quote] tag with two divs when a single HTML tag would be enough and the blockquote element exists?

3 Comments

You are a lifesaver! We had one more problem, but that was caused by the "ignore whitespace" filter.
@Timberman: when a pattern uses the x modifier (ignore whitespace/comments, verbose), you can include a literal whitespace character in three ways: 1. escape it with a backslash \ , 2. put it inside a character class [ ], 3. put it inside a quoted part: \Q \E. It's also possible to switch this modifier off inside a group, like this: (?-x:.....)
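
A quick way to try these options is the sketch below. The subject string 'a b' and the array labels are made up for the test; the first pattern leaves the space unprotected for comparison.

```php
<?php
$subject = 'a b';

// Every pattern uses the x modifier; only the first leaves the space unprotected.
$patterns = [
    'unprotected'     => '~a b~x',       // the space is ignored, so this is really "ab"
    'backslash'       => '~a\ b~x',      // escaped space
    'character class' => '~a[ ]b~x',     // space inside a character class
    'quoted'          => '~a\Q \Eb~x',   // space inside \Q...\E
    'x switched off'  => '~a(?-x: )b~x', // modifier disabled inside the group
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $results[$label] = preg_match($pattern, $subject);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

Only the unprotected pattern should fail to match, since it is compiled as if it were written "ab".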
I figured! Thanks! @Casimir Et Hippolyte
