I am trying to build a BBCode-like engine for my website. The catch is that the set of available codes is not known in advance, because the codes are defined by the users. On top of that, the whole thing has to be recursive.

For example:

Hello my name is [name user-id="1"]
I [bold]really[/bold] like cheeseburgers

These are the easy ones, and I managed to get them working.

The problem is what happens when two of those codes follow each other:

I [bold]really[/bold] like [bold]cheeseburgers[/bold]

Or are nested inside each other:

I [bold]really like [italic]cheeseburgers[/italic][/bold]

These codes can also have attributes:

I [bold strength="600"]really like [text font-size="24px"]cheeseburgers[/text][/bold]

The following pattern worked quite well, but lacks the recursive part (?R):

(?P<code>\[(?P<code_open>\w+)\s?(?P<attributes>[a-zA-Z0-9_=" .-]*?)](?:(?P<content>.*?)\[\/(?P<code_close>\w+)\])?)

I just don't know where to put the (?R) recursion token.

The system also has to know that this string

I [bold]really like [italic]cheeseburgers[/italic][/bold] and [bold]football[/bold]

contains 2 "code-objects":

1. [bold]really like [italic]cheeseburgers[/italic][/bold]

and

2. [bold]football[/bold]

... and the content of the first one is

really like [italic]cheeseburgers[/italic]

which again has a code in it

[italic]cheeseburgers[/italic]

whose content is

cheeseburgers

I have searched the web for two days now and I can't figure it out.

I thought of something like this:

  1. Look for something like [**** attr="foo"], where the attributes are optional, and store it in a capturing group.
  2. Check whether there is a closing tag somewhere (it can be optional too).
  3. If a closing tag exists, everything between the two tags should be stored as a "content" capturing group, which then has to go through the same procedure again.

I hope there are some regex specialists who are willing to help me. :(

Thank you!

EDIT

As this might be difficult to understand, here is an input and an expected output:

Input:

[heading icon="rocket"]I'm a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]

I'd like to have an array like

array[0][name] = heading
array[0][attributes][icon] = rocket
array[0][content] = I'm a cool heading
array[1][name] = textrow
array[1][content] = [text]<p>Hi!</p>[/text]
array[1][0][name] = text
array[1][0][content] = <p>Hi!</p>
  • I'd take a look at this thread: stackoverflow.com/questions/6773192/recursive-bbcode-parsing. Commented Dec 22, 2015 at 16:51
  • chris85, isn't this too simple? I can't just use a simple replace, because for some codes I need to call classes which then have to run database functions, for example. I need all the data stored in an array. Commented Dec 22, 2015 at 16:54
  • anubhava, [heading icon="rocket"]I'm a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow] - here I need an array that says: we have two codes, "heading" and "textrow". The first one has "I'm a cool heading" as content (plus an attribute "icon" with the value "rocket"); the second has "[text]<p>Hi!</p>[/text]" as content, which AGAIN contains a code, "text", with the content "<p>Hi!</p>". So there has to be an array tree that represents the structure. Commented Dec 22, 2015 at 16:56
  • I've added a concrete example of input and output in the EDIT part of the question Commented Dec 22, 2015 at 17:03
  • I don't know how to use the (?R), but I'm really curious on how... you can try something with this pattern: (?s)\[(?!\/)([^\s\]]+)[^]]*\](.*?)\[\/\1\] Commented Dec 22, 2015 at 17:33

1 Answer

Having written multiple BBCode parsing systems, I can suggest NOT using regexes only. Instead, you should actually parse the text.

How you do this is up to you, but as a general idea you would want to use something like strpos to locate the first [ in your string, then check what comes after it to see if it looks like a BBCode tag and process it if so. Then, search for [ again starting from where you ended up.
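
A minimal sketch of that scanning loop, under stated assumptions: the function name tokenize, the token array shape, and the accepted tag syntax ([name attr="value"] and [/name]) are all hypothetical choices made for illustration, not part of any library. It uses strpos to find each [, then an anchored preg_match to decide whether a tag really starts there.

```php
<?php
// Hypothetical helper: split $input into text, open-tag and close-tag tokens.
function tokenize(string $input): array
{
    $tokens = [];
    $pointer = 0;
    $text = '';
    // Assumed tag syntax: [name attr="value" ...] or [/name].
    // \G anchors the match at the offset passed to preg_match.
    $tagPattern = '~\G\[(/?)([a-z][\w-]*)((?:\s+[\w-]+="[^"]*")*)\s*\]~i';

    while (($bracket = strpos($input, '[', $pointer)) !== false) {
        // Everything before the bracket is plain text.
        $text .= substr($input, $pointer, $bracket - $pointer);

        if (preg_match($tagPattern, $input, $m, 0, $bracket)) {
            // Flush the text collected so far before emitting the tag.
            if ($text !== '') {
                $tokens[] = ['type' => 'text', 'value' => $text];
                $text = '';
            }
            if ($m[1] === '/') {
                $tokens[] = ['type' => 'close', 'name' => $m[2]];
            } else {
                // Turn attr="value" pairs into an associative array.
                $attributes = [];
                preg_match_all('~([\w-]+)="([^"]*)"~', $m[3], $pairs, PREG_SET_ORDER);
                foreach ($pairs as $pair) {
                    $attributes[$pair[1]] = $pair[2];
                }
                $tokens[] = ['type' => 'open', 'name' => $m[2], 'attributes' => $attributes];
            }
            // Continue right after the closing bracket of the tag.
            $pointer = $bracket + strlen($m[0]);
        } else {
            // Not a valid tag: keep the bracket as literal text and move on.
            $text .= '[';
            $pointer = $bracket + 1;
        }
    }

    // Whatever follows the last bracket is plain text too.
    $text .= substr($input, $pointer);
    if ($text !== '') {
        $tokens[] = ['type' => 'text', 'value' => $text];
    }
    return $tokens;
}
```

Calling tokenize('I [bold]really[/bold] like [name user-id="1"]') yields a flat list of text, open and close tokens. Anything in brackets that does not look like a tag is left in the text untouched, which is where invalid codes get skipped.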

This has certain advantages, such as being able to examine each code and skip it if it's invalid, as well as enforcing proper tag closing order ([bold][italic]Nesting![/bold][/italic] should be considered invalid) and being able to provide meaningful error messages to the user if something is wrong (invalid parameter, perhaps) because the parser knows exactly what is going on, whereas a regex would output something unexpected and potentially harmful.
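
One way to enforce closing order is a stack of currently open tags. The sketch below is an illustration only: buildTree, the token shape (['type' => 'text'|'open'|'close', ...]) and the $voidTags list of codes that never take a closing tag are all assumptions made for this example. Each closing tag must match the tag on top of the stack, otherwise the parser reports exactly what went wrong.

```php
<?php
// Hypothetical helper: turn a flat token list into a nested tree.
// $voidTags is an assumed list of codes that have no closing tag.
function buildTree(array $tokens, array $voidTags = []): array
{
    // The stack holds the nodes that are currently open; index 0 is a virtual root.
    $stack = [['name' => null, 'attributes' => [], 'children' => []]];

    foreach ($tokens as $token) {
        $top = count($stack) - 1;
        switch ($token['type']) {
            case 'text':
                $stack[$top]['children'][] = $token['value'];
                break;

            case 'open':
                $node = [
                    'name' => $token['name'],
                    'attributes' => $token['attributes'] ?? [],
                    'children' => [],
                ];
                if (in_array($token['name'], $voidTags, true)) {
                    // No closing tag expected: attach to the parent immediately.
                    $stack[$top]['children'][] = $node;
                } else {
                    $stack[] = $node;
                }
                break;

            case 'close':
                // The closing tag must match the most recently opened tag.
                if ($top === 0 || $stack[$top]['name'] !== $token['name']) {
                    $expected = $top === 0 ? 'no closing tag' : '[/' . $stack[$top]['name'] . ']';
                    throw new InvalidArgumentException(
                        sprintf('Unexpected [/%s], expected %s', $token['name'], $expected)
                    );
                }
                // Pop the finished node and attach it to its parent.
                $node = array_pop($stack);
                $stack[$top - 1]['children'][] = $node;
                break;
        }
    }

    if (count($stack) > 1) {
        $open = $stack[count($stack) - 1]['name'];
        throw new InvalidArgumentException(sprintf('Unclosed tag [%s]', $open));
    }
    return $stack[0]['children'];
}
```

With tokens for [bold]really like [italic]cheeseburgers[/italic][/bold], the result is a single bold node whose children are the text "really like " and an italic node. With tokens for [bold][italic]Nesting![/bold][/italic], the function throws as soon as it sees [/bold] while italic is still open.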

It might be more work (or less, depending on your skill with regex), but it's worth it.

2 Comments

I already thought of that, but then I got totally stuck playing around with regex on regex101.com. I really hoped to find a solution with just regex, but you might be right. Do you have a suggestion on how to start implementing my own parser?
As a basic idea, you will want $input = "..."; $pointer = 0; $output = "";, then you can do something like while(is_int($bracket = strpos($input,'[',$pointer))) { $output .= substr($input,$pointer,$bracket-$pointer); /* do some regex to get the tag from substr($input,$bracket) and process stuff here - you will need to get the position of the ] in here */ $pointer = $closeBracketPosition + 1; } $output .= substr($input,$pointer); - this is just a very basic idea but hopefully it helps.
