Convert BBCode-generated HTML back to BBCode

Question

I have this function to parse bbcode -> html:

$this->text = preg_replace(array(
    '/\[b\](.*?)\[\/b\]/ms', 
    '/\[i\](.*?)\[\/i\]/ms',
    '/\[u\](.*?)\[\/u\]/ms',
    '/\[img\](.*?)\[\/img\]/ms',
    '/\[email\](.*?)\[\/email\]/ms',
    '/\[url\="?(.*?)"?\](.*?)\[\/url\]/ms',
    '/\[size\="?(.*?)"?\](.*?)\[\/size\]/ms',
    '/\[youtube\](.*?)\[\/youtube\]/ms',
    '/\[color\="?(.*?)"?\](.*?)\[\/color\]/ms',    
    '/\[quote\](.*?)\[\/quote\]/ms',
    '/\[list\=(.*?)\](.*?)\[\/list\]/ms',
    '/\[list\](.*?)\[\/list\]/ms',
    '/\[\*\]\s?(.*?)\n/ms'
   ),array(
    '<strong>\1</strong>',
    '<em>\1</em>',
    '<u>\1</u>',
    '<img src="\1" alt="\1" />',
    '<a href="mailto:\1">\1</a>',
    '<a href="\1">\2</a>',
    '<span style="font-size:\1%">\2</span>',
    '<object width="450" height="350"><param name="movie" value="\1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="\1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="350"></embed></object>',
    '<span style="color:\1">\2</span>',
    '<blockquote>\1</blockquote>',
    '<ol start="\1">\2</ol>',
    '<ul>\1</ul>',
    '<li>\1</li>'
   ),
   $original
);

The problem is: how do I reverse this, i.e. convert the generated HTML back to BBCode?

The m pattern modifier has no effect unless the pattern contains start- or end-of-line anchors (^, $). Similar patterns can also be consolidated by using backreferences.
– mickmackusa ♦, Commented Jan 24, 2024 at 8:57
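As a rough illustration of that consolidation, the three simple inline tags could share one pattern, with \1 forcing the closing tag to match the opening one. This is only a sketch: the renderInlineTags function and the $tagMap array are hypothetical names, not part of the question's code, and the callback recurses so that tags nested inside each other are still converted.

```php
<?php
// Hypothetical map from BBCode tag name to the HTML element it should produce.
$tagMap = ['b' => 'strong', 'i' => 'em', 'u' => 'u'];

function renderInlineTags(string $text, array $tagMap): string
{
    // \1 is a backreference: the closing tag must repeat the captured tag name.
    // Only the "s" modifier is needed; "m" would do nothing without ^ or $.
    return preg_replace_callback(
        '/\[(b|i|u)\](.*?)\[\/\1\]/s',
        function (array $m) use ($tagMap): string {
            $html  = $tagMap[$m[1]];
            // Recurse so that tags nested inside this one are converted too.
            $inner = renderInlineTags($m[2], $tagMap);
            return "<{$html}>{$inner}</{$html}>";
        },
        $text
    );
}

echo renderInlineTags('[b]bold[/b] and [i]italic[/i]', $tagMap), "\n";
```

Mismatched pairs such as [b]x[/i] are left untouched, because the backreference cannot match.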

Charles · Accepted Answer · 2010-07-17 17:18:09Z

7

Don't.

Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:

Allow user edits of the original without parsing the BBCode back out
Allow quotes of other user posts, again without parsing
Change the HTML each BBCode generates (just re-parse every post)
Switch BBCode engines down the line (again, just re-parse every post)
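A minimal sketch of the idea, assuming an in-memory array stands in for a database table with one column for the raw BBCode and one for the rendered HTML. The PostStore class, its method names, and the tiny [b]-only renderer are all made up for illustration:

```php
<?php
// Hypothetical store: keeps the raw BBCode next to the HTML rendered from it.
final class PostStore
{
    /** @var array<int, array{raw: string, html: string}> */
    private array $posts = [];

    /** @var callable(string): string */
    private $renderer;

    public function __construct(callable $renderer)
    {
        $this->renderer = $renderer;
    }

    // Save the original text and render it once, at write time.
    public function save(int $id, string $bbcode): void
    {
        $this->posts[$id] = ['raw' => $bbcode, 'html' => ($this->renderer)($bbcode)];
    }

    public function rawFor(int $id): string  { return $this->posts[$id]['raw']; }
    public function htmlFor(int $id): string { return $this->posts[$id]['html']; }

    // Re-render every post, e.g. after changing the markup or switching engines.
    public function reparseAll(callable $renderer): void
    {
        $this->renderer = $renderer;
        foreach ($this->posts as $id => $post) {
            $this->posts[$id]['html'] = $renderer($post['raw']);
        }
    }
}

// Deliberately tiny stand-in renderer that only understands [b].
$render = fn(string $s): string => preg_replace('/\[b\](.*?)\[\/b\]/s', '<strong>$1</strong>', $s);

$store = new PostStore($render);
$store->save(1, 'Hello [b]world[/b]');
echo $store->htmlFor(1), "\n"; // rendered HTML, used for display
echo $store->rawFor(1), "\n";  // original BBCode, used for the edit form and quotes
```

Editing and quoting read the raw column, display reads the HTML column, and nothing ever has to be converted back.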

answered Jul 17, 2010 at 17:18

Charles


2 Comments

Wrikken Over a year ago

+1 If indeed the data has been available in BBCode format this would be far preferable.

Rod Over a year ago

Thanks, I think that's a better approach than what I originally thought.

Wrikken · Accepted Answer · 2010-07-17 17:24:30Z

5

It's pretty safe to say that it's nigh impossible to reliably convert HTML to BBCode with just a slew of regexes. Use a parser (DOMDocument, for instance), remove invalid elements and attributes using XPath queries and inspection, and then walk the tree recursively, building a BBCode string as you go (or simply ignore invalid tags and attributes along the way).
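The recursive walk could look roughly like the sketch below. It takes the "just ignore invalid tags" route: unknown elements are dropped but their text is kept, and only a handful of the tags from the question are handled. The function names nodeToBbcode and htmlToBbcode are hypothetical, and the meta tag and wrapper div are just one way to get UTF-8 handling and a single root node out of DOMDocument.

```php
<?php
// Convert a DOM node and its descendants to BBCode (a few tags only).
function nodeToBbcode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }

    // Convert the children first, then wrap the result according to this tag.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= nodeToBbcode($child);
    }

    if (!$node instanceof DOMElement) {
        return $inner;
    }

    switch (strtolower($node->tagName)) {
        case 'strong':     return "[b]{$inner}[/b]";
        case 'em':         return "[i]{$inner}[/i]";
        case 'u':          return "[u]{$inner}[/u]";
        case 'blockquote': return "[quote]{$inner}[/quote]";
        case 'img':        return '[img]' . $node->getAttribute('src') . '[/img]';
        case 'a':
            $href = $node->getAttribute('href');
            if (strpos($href, 'mailto:') === 0) {
                return '[email]' . substr($href, 7) . '[/email]';
            }
            return "[url={$href}]{$inner}[/url]";
        default:
            return $inner; // unknown tag: keep the text, drop the markup
    }
}

function htmlToBbcode(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    // The meta tag tells libxml the input is UTF-8; the div gives one root to walk.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    return $root === null ? '' : nodeToBbcode($root);
}

echo htmlToBbcode('Say <strong>hi</strong> to <em>Bob</em>'), "\n";
```

Because the parser builds a real tree, nested and repeated tags are handled by the recursion rather than by ever more fragile patterns.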

edited Jul 17, 2010 at 17:24

answered Jul 17, 2010 at 17:14

Wrikken


3 Comments

Wrikken Over a year ago

Erm, care to clarify? "Tony the pony will not come." is a bit too cryptic for my liking ;)

Prof. Falken Over a year ago

Since BBCODE has a recursive structure just like HTML, this applies: stackoverflow.com/a/1732454/193892

Prof. Falken Over a year ago

And at the bottom of the answer you will find tony the pony aka cthulhu.

NikiC · Accepted Answer · 2010-07-17 17:24:11Z

3

If you know for certain that the HTML you want to de-bbcode was en-bbcoded using your method, then do the following:

Swap the two arrays you pass to preg_replace.

In the array with the HTML code, do the following for every element: Prepend # to the string. Append #s. Replace \1 (and \2, and so on) with (.*?).

For the array with the BBCodes, do the following for every element: Remove / at the beginning and /ms at the end. Replace \s with a space. Remove all \. Remove all ?. Replace the first (.*) in the string with $1 and the second with $2.

This should do. If any problems: Ask ;)
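Applied by hand to a subset of the tags, the steps above might produce something like the following sketch. It assumes the HTML was generated exactly by the question's code; the function name htmlBackToBbcode and the variable names are made up for illustration. The order of the entries is kept from the original arrays, so the mailto: pattern still runs before the generic link pattern.

```php
<?php
// HTML patterns built from the original replacement strings:
// "#" as delimiter, "s" modifier, and \1 / \2 replaced by (.*?).
$htmlPatterns = [
    '#<strong>(.*?)</strong>#s',
    '#<em>(.*?)</em>#s',
    '#<u>(.*?)</u>#s',
    '#<img src="(.*?)" alt="(.*?)" />#s',
    '#<a href="mailto:(.*?)">(.*?)</a>#s',
    '#<a href="(.*?)">(.*?)</a>#s',
    '#<blockquote>(.*?)</blockquote>#s',
];

// BBCode replacements built from the original patterns, in the same order.
$bbcodeReplacements = [
    '[b]$1[/b]',
    '[i]$1[/i]',
    '[u]$1[/u]',
    '[img]$1[/img]',
    '[email]$1[/email]',
    '[url=$1]$2[/url]',
    '[quote]$1[/quote]',
];

function htmlBackToBbcode(string $html, array $patterns, array $replacements): string
{
    // Each pattern is applied in turn to the result of the previous one.
    return preg_replace($patterns, $replacements, $html);
}

echo htmlBackToBbcode(
    '<strong>Hi</strong> <a href="http://example.com">there</a>',
    $htmlPatterns,
    $bbcodeReplacements
), "\n";
```

The span-based tags (size and color) are left out on purpose: when one span is nested inside another, a lazy (.*?) stops at the first closing tag and pairs it with the wrong opening tag, which is the kind of case where a parser is the safer choice.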

answered Jul 17, 2010 at 17:24

NikiC
