
I have this function to convert BBCode to HTML:

$this->text = preg_replace(array(
    '/\[b\](.*?)\[\/b\]/ms', 
    '/\[i\](.*?)\[\/i\]/ms',
    '/\[u\](.*?)\[\/u\]/ms',
    '/\[img\](.*?)\[\/img\]/ms',
    '/\[email\](.*?)\[\/email\]/ms',
    '/\[url\="?(.*?)"?\](.*?)\[\/url\]/ms',
    '/\[size\="?(.*?)"?\](.*?)\[\/size\]/ms',
    '/\[youtube\](.*?)\[\/youtube\]/ms',
    '/\[color\="?(.*?)"?\](.*?)\[\/color\]/ms',    
    '/\[quote\](.*?)\[\/quote\]/ms',
    '/\[list\=(.*?)\](.*?)\[\/list\]/ms',
    '/\[list\](.*?)\[\/list\]/ms',
    '/\[\*\]\s?(.*?)\n/ms'
   ),array(
    '<strong>\1</strong>',
    '<em>\1</em>',
    '<u>\1</u>',
    '<img src="\1" alt="\1" />',
    '<a href="mailto:\1">\1</a>',
    '<a href="\1">\2</a>',
    '<span style="font-size:\1%">\2</span>',
    '<object width="450" height="350"><param name="movie" value="\1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="\1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="350"></embed></object>',
    '<span style="color:\1">\2</span>',
    '<blockquote>\1</blockquote>',
    '<ol start="\1">\2</ol>',
    '<ul>\1</ul>',
    '<li>\1</li>'
   ),
   $original
);

The problem is: how do I reverse this, i.e. convert the HTML back to BBCode?

  • The m pattern modifier has no effect if the pattern contains no start-of-line or end-of-line anchors (^, $). Similar patterns can also be consolidated by using backreferences. Commented Jan 24, 2024 at 8:57

3 Answers

7

Don't.

Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:

  1. Allow user edits of the original without parsing the BBCode back out
  2. Allow quotes of other user posts, again without parsing
  3. Change the HTML each BBCode generates (just re-parse every post)
  4. Switch BBCode engines down the line (again, just re-parse every post)
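A minimal sketch of that idea, assuming a hypothetical post record with `raw` and `html` fields. The names `renderBbcode` and `makePost` are illustrative, and the renderer is only a two-tag subset of the preg_replace call from the question:

```php
<?php
// Hypothetical renderer: two of the question's patterns, for illustration only.
function renderBbcode(string $raw): string
{
    return preg_replace(
        ['/\[b\](.*?)\[\/b\]/s', '/\[i\](.*?)\[\/i\]/s'],
        ['<strong>$1</strong>', '<em>$1</em>'],
        $raw
    ) ?? $raw;
}

// Hypothetical post record: keep the BBCode source and cache the rendered HTML.
function makePost(string $raw): array
{
    return ['raw' => $raw, 'html' => renderBbcode($raw)];
}

$post = makePost('[b]Hello[/b] [i]world[/i]');

// Display uses $post['html']; editing and quoting use $post['raw'].
// After changing the renderer, re-run makePost($post['raw']) for every post.
```

With this layout there is never a need to turn HTML back into BBCode, because the BBCode is always still available.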

2 Comments

+1 If indeed the data has been available in BBCode format this would be far preferable.
Thanks, I think that's a better approach than what I originally had in mind.
5

It's pretty safe to say it's nigh impossible to build a reliable way to convert HTML to BBCode with just a slew of regexes. Use a parser (DOMDocument, for instance), remove invalid elements and attributes using XPath queries and inspection, and then walk the tree recursively, building a BBCode string as you go (or simply ignore invalid tags and attributes along the way).
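A rough sketch of the recursive walk, assuming only a handful of tags need to map back and that unknown tags are simply dropped while their text is kept. The function names `nodeToBbcode` and `htmlToBbcode` are made up for this example, and the XPath clean-up step is left out:

```php
<?php
// Recursively turn a DOM node into BBCode; unknown elements keep only their text.
function nodeToBbcode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }

    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= nodeToBbcode($child);
    }

    if (!($node instanceof DOMElement)) {
        return $inner; // comments and other node types contribute nothing
    }

    switch ($node->nodeName) {
        case 'strong':
            return '[b]' . $inner . '[/b]';
        case 'em':
            return '[i]' . $inner . '[/i]';
        case 'u':
            return '[u]' . $inner . '[/u]';
        case 'blockquote':
            return '[quote]' . $inner . '[/quote]';
        case 'a':
            $href = $node->getAttribute('href');
            return $href === '' ? $inner : '[url=' . $href . ']' . $inner . '[/url]';
        default:
            return $inner; // tag without a BBCode equivalent: ignore it
    }
}

// Parse an HTML fragment (ASCII input assumed here) and convert it.
function htmlToBbcode(string $html): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');   // wrapper gives a known root
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    return $root === null ? '' : nodeToBbcode($root);
}
```

Because the walk follows the tree, nested tags such as `<em><u>x</u></em>` come out correctly nested, which is exactly where the regex approach tends to fall over.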

3 Comments

Erm, care to clarify? "Tony the pony will not come." is a bit too cryptic for my liking ;)
Since BBCode has a recursive structure just like HTML, this applies: stackoverflow.com/a/1732454/193892
And at the bottom of the answer you will find tony the pony aka cthulhu.
3

If you know for certain that the HTML you want to convert back was generated from BBCode by your own method, then do the following:

Swap the two arrays you pass to preg_replace.

In the array with the HTML code, do the following for every element: prepend # to the string and append #s (the # characters act as the pattern delimiters, and the s modifier lets . match newlines). Replace \1 (and \2, and so on) with (.*?).

In the array with the BBCode patterns, do the following for every element: remove the / at the beginning and the /ms at the end. Replace \s with a literal space. Remove all \. Remove all ?. Replace the first (.*) in the string with $1 and the second with $2.

That should do it. If you run into any problems, just ask ;)
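Applied to five of the tags from the question, the recipe above would look roughly like this. It is only a sketch: the function name `unparseBbcode` is made up, and it assumes the HTML was produced by the question's own preg_replace call:

```php
<?php
// Reverse of the question's conversion for a subset of tags.
function unparseBbcode(string $html): string
{
    // Former replacement strings, now patterns: "#" delimiters, "s" modifier,
    // and each \1 / \2 replaced by (.*?).
    $patterns = [
        '#<strong>(.*?)</strong>#s',
        '#<em>(.*?)</em>#s',
        '#<u>(.*?)</u>#s',
        '#<a href="(.*?)">(.*?)</a>#s',
        '#<span style="color:(.*?)">(.*?)</span>#s',
    ];

    // Former patterns, now replacement strings: delimiters, backslashes and
    // question marks removed, capture groups replaced by $1 / $2.
    $replacements = [
        '[b]$1[/b]',
        '[i]$1[/i]',
        '[u]$1[/u]',
        '[url="$1"]$2[/url]',
        '[color="$1"]$2[/color]',
    ];

    return preg_replace($patterns, $replacements, $html) ?? $html;
}

echo unparseBbcode('<strong>bold</strong> and <a href="http://example.com">a link</a>');
```

The quoted form `[url="..."]` is still accepted by the original url pattern, so the result can be parsed again. Watch the order once more tags are added: the mailto pattern has to run before the generic link pattern, and nested spans (color inside size, say) can end up paired with the wrong closing tag.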
