0

Is it possible to have a PHP regular expression that extracts the content from the first ] to the last [?

For example if I had the following string:

$string = '[shortcode]You write a shortcode by using ([])[/shortcode]';

I would want to extract:

You write a shortcode by using ([])

and store it in a variable. The content to be extracted could be anything. Thanks in advance.

3
  • Thanks to all who have answered so far. Though I haven't tested all of your solutions yet, I think I was going about this the wrong way. Instead of using regex I used strpos() and strrpos():
      $content = '[shortcode]You write a shortcode by using ([])[/shortcode]';
      $start = strpos($content, ']');
      $start = $start + 1;
      $end = strrpos($content, '[');
      $dif = $end - $start;
      $content = substr($content, $start, $dif);
      echo $content; // output: You write a shortcode by using ([])
    I think that will do the trick. Commented Apr 23, 2012 at 21:39
  • Looks like it would work to me... Commented Apr 23, 2012 at 21:49
  • @Sam, don't forget that if you don't accept answers it will hurt your SO score. Commented Apr 23, 2012 at 23:53
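
The strpos()/strrpos() approach from the comment above can be wrapped in a small function that also handles input without a usable pair of brackets. This is only a sketch: the function name between_brackets() is made up for illustration, and returning null when nothing can be extracted is an assumption, not something the thread specifies.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the thread):
// returns the text between the first ']' and the last '[', or null
// when the two characters are missing or appear in the wrong order.
function between_brackets(string $text): ?string
{
    $first = strpos($text, ']');   // position of the first ']'
    $last  = strrpos($text, '[');  // position of the last '['

    // strpos() and strrpos() return false when the character is absent,
    // so compare with === rather than relying on truthiness.
    if ($first === false || $last === false || $last <= $first) {
        return null;
    }

    return substr($text, $first + 1, $last - $first - 1);
}

$content = '[shortcode]You write a shortcode by using ([])[/shortcode]';
echo between_brackets($content), "\n";
```

The strict comparison matters because a match at position 0 is a valid result that would otherwise be mistaken for "not found".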

4 Answers

3

You should be using capturing groups to make sure you match the closing tag.

\[(\w+)\].*?\[/\1\]

This will match a word inside [] and keep going until it finds the same word inside [/...].
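
As a rough sketch of how this pattern might be used from PHP, the version below adds a second capturing group around the content so it can be read back from the match array. The second group, the ~ delimiters, and the s flag are additions for illustration and are not part of the pattern above.

```php
<?php
// Group 1 captures the tag name, group 2 the content. The backreference
// \1 requires the closing tag to repeat the same name. The 's' flag
// (an assumption here) lets '.' match newlines as well.
$pattern = '~\[(\w+)\](.*?)\[/\1\]~s';
$string  = '[shortcode]You write a shortcode by using ([])[/shortcode]';

$tag     = null;
$content = null;

if (preg_match($pattern, $string, $matches)) {
    $tag     = $matches[1];
    $content = $matches[2];
    echo $tag, ': ', $content, "\n";
}
```

Because the quantifier is lazy, the match ends at the first closing tag with the same name, so same-name tags nested inside each other would be cut short.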


8 Comments

While it might do what the asker wants, it might also not do it, since it does not do what the asker asked for.
Oh, and you lack a capturing group around the content he wants to extract
Oh, and it fails when nesting tags - lazy matching is NOT what you want here, even if it looks fancier.
@Jasper - It satisfies the OP's example perfectly, which also never mentioned nested tags. And your answer will fail for malformed "shortcodes", and this one won't. Instead of attacking the clearly better answer, why not suggest improvements? In fact, your only worthwhile comment was the second one, because I'm also pretty sure this answer wasn't trying to be fancy, but functional given the requirements set forth in the OP.
@nickb This may satisfy his example, but not his question. He is asking about matching the brackets. He never said malformed shortcodes shouldn't be matched, so that's not a problem in my regex, it's a problem in this one, as it does not match the question. Also, the example may not have contained a nested tag, but that doesn't mean that the asker doesn't ever care about them. This tries to answer what the asker might want, but in doing so is making unnecessary assumptions and not answering the question.
1

Regex quantifiers are greedy by default, so this will do the job just fine:

/\](.*)\[/

To get this working in PHP properly, you would do something like this:

preg_match('/\](.*)\[/', $text, $matches);

$result = $matches[1];
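
To see what greediness changes in practice, here is a small comparison of the greedy pattern with a lazy variant. The sample string and the variable names are chosen only for illustration.

```php
<?php
$text = '[b]this would [i]be[/i] wrong[/b]';

// Greedy: '.*' first runs to the end of the string, then backtracks
// until the last '[' can match.
$greedyFound = preg_match('/\](.*)\[/', $text, $greedy);

// Lazy: '.*?' expands one character at a time and stops at the
// first '[' that follows the first ']'.
$lazyFound = preg_match('/\](.*?)\[/', $text, $lazy);

if ($greedyFound && $lazyFound) {
    echo $greedy[1], "\n"; // this would [i]be[/i] wrong
    echo $lazy[1], "\n";   // this would  (with a trailing space)
}
```

Checking the return value of preg_match() before reading $matches[1] avoids an undefined index when the input contains no ] followed by a [. Note that . does not match newlines unless the s modifier is added.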

5 Comments

You could cut down on the extra backslashes by using single quotes in the matching expression.
[b]this would [i]be[/i] wrong[/b] matching ]this would [i]be[
@Xeoncross It seems you have not tried it out. Try it, and see that you're mistaken. Also, if you want to know why, look up greediness, greediness in the context of regexes and how one can use ? in regexes to make * or + lazy.
@Jasper, I was mistaken in the exact match - but not the example of it being out-of-control when matching content. Try it: [b]this would [b]be[/b] wrong[/b] even [i]still matches ]this would [b]be[/b] wrong[/b] even [. You must specify more information in the regex.
@Xeoncross I don't think I understand what your problem with my regex is. Care to discuss it further in chat?
0

This could do what you need:

[^\]]\](.*)\[[^\[]

5 Comments

Why the non-bracket at the end and the beginning? They add absolutely nothing.
First: automatically if you do a single match. Last: automatically because of greediness. Also, you have only a single of each of those, so it has nothing to do with being the first and the last (just about not having two of the same character after one another...)
Also, you could make it better by allowing each of those two characters any number of times (*) and then using anchors at the start and end (^ and $). That way, it does actually mean they are first and last (and compared to the automatic selection of both, it is optimized in the case that there are a lot of closing square braces, but no opening ones).
ok, you are right, but I am not very experienced in php, I use regex in c#... don't know if there are differences (I believe not) between php and c# regexes, but it worked like this when we are parsing urls on / signs, so I write it as I know :) (and it works, even not optimized)
Yes, it works, though more by accident than by skill. For example, it wouldn't work on ]a[. And the differences between regexes in the two languages don't even come up here.
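
One way to read the suggestion in the comments above (repeat the non-bracket classes with * and anchor both ends) is sketched below next to the pattern from the answer. The exact anchored pattern and the s flag are my interpretation, not something spelled out in the thread.

```php
<?php
// Pattern from the answer: needs one non-']' character before the ']'
// and one non-'[' character after the '['.
$original = '/[^\]]\](.*)\[[^\[]/';

// Anchored interpretation: any number of non-']' characters from the
// start, any number of non-'[' characters up to the end.
$anchored = '/^[^\]]*\](.*)\[[^\[]*$/s';

$short = ']a[';
$long  = '[shortcode]You write a shortcode by using ([])[/shortcode]';

$originalOnShort = preg_match($original, $short);       // no match
$anchoredOnShort = preg_match($anchored, $short, $m1);  // matches
$anchoredOnLong  = preg_match($anchored, $long, $m2);   // matches

echo $m1[1], "\n";
echo $m2[1], "\n";
```

With the anchors in place, the ] is guaranteed to be the first one and the [ the last one in the string, rather than that falling out of leftmost matching and greediness.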
0

This works:

preg_match( '@\](.*)\[@', $string, $matches);
print_r($matches);
