I need a regexp that retrieves both of the following calls from a PHP source file:

MUI('Some text')
MUI("Some text", $lang)

The regexp is used to extract all the terms enclosed in MUI("...") in a PHP file in order to build a list of items to translate. I already have a regexp for the first case:

MUI("Some text") > $pattern = '/MUI\((["\'].*?)["\']\)/';

But I had to add a second parameter, which is a variable, and since then my regexp won't find the second case:

MUI("Some text", $lang).

Please be aware that the text to find may be enclosed in either ' or ". Thanks in advance for your ideas.

  • What you are trying to do is not really possible in a robust manner, because regular expressions are not powerful enough for the task. You might get a working solution, but it would only work most of the time, and it is easy to construct inputs where it fails, for example: MUI("Some ) text", $lang). What you actually need is not a regular expression but a language parser that understands the structure of a valid PHP file. Commented Nov 18, 2022 at 11:33
  • See MUI\((?|"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)') demo Commented Nov 18, 2022 at 11:38
  • @WiktorStribiżew Just curious, can't it be as simple as this? Commented Nov 18, 2022 at 12:45
  • @WiktorStribiżew : Seems to work in your demo. I'll try it live in a few and will keep you posted. Thx for your input. Commented Nov 18, 2022 at 12:47
  • @nice_dev It cannot. Commented Nov 18, 2022 at 12:51
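
To illustrate the parser-based approach suggested in the first comment, here is a minimal sketch that uses PHP's built-in tokenizer (`token_get_all`) instead of a regexp. The sample source, the `skipWhitespace` helper and the `$terms` variable are made up for the illustration, and it assumes the first argument is a plain quoted string: a double-quoted string containing interpolated variables is split into several tokens and would be skipped here.

```php
<?php
// Hypothetical sample input; a real script would read the file with file_get_contents().
$source = <<<'SRC'
<?php
echo MUI("Some ) text", $lang);
echo MUI('Other text');
SRC;

// Advance past whitespace tokens and return the index of the next real token.
function skipWhitespace(array $tokens, int $i): int
{
    while (isset($tokens[$i]) && is_array($tokens[$i]) && $tokens[$i][0] === T_WHITESPACE) {
        $i++;
    }
    return $i;
}

$tokens = token_get_all($source);
$terms = [];

foreach ($tokens as $i => $token) {
    // Look for the identifier MUI ...
    if (!is_array($token) || $token[0] !== T_STRING || $token[1] !== 'MUI') {
        continue;
    }
    // ... followed by an opening parenthesis ...
    $j = skipWhitespace($tokens, $i + 1);
    if (($tokens[$j] ?? null) !== '(') {
        continue;
    }
    // ... followed by a plain quoted string literal.
    $j = skipWhitespace($tokens, $j + 1);
    if (is_array($tokens[$j] ?? null) && $tokens[$j][0] === T_CONSTANT_ENCAPSED_STRING) {
        // Strip the surrounding quotes; escape sequences are left as written.
        $terms[] = substr($tokens[$j][1], 1, -1);
    }
}

print_r($terms);
```

Because the tokenizer already knows where each string literal ends, the `)` inside "Some ) text" causes no trouble.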

1 Answer

You can use

(?s)MUI\((?|"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)')

See the regex demo.

Details:

  • (?s) - makes . match any character, including line break characters
  • MUI\( - the literal text MUI(
  • (?|"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)') - Branch reset group matching either:
    • "([^"\\]*(?:\\.[^"\\]*)*)" - a double quote string literal (capturing what is in between quotes into Group 1)
    • | - or
    • '([^'\\]*(?:\\.[^'\\]*)*)' - a single quote string literal (capturing what is in between quotes into Group 1 - yes, still Group 1 since this is an alternative inside a branch reset group).

2 Comments

Thanks for your input. I've tried it, but PHP reports an error: $pattern = '/(?s)MUI((?|"([^"\]*(?:\\.[^"\]*)*)"|\'([^\'\]*(?:\\.[^\'\]*)*)\')/'; preg_match_all($pattern,$Content,$matches); preg_match_all(): Compilation failed: missing terminating ] at offset 60. Any idea what the trouble is? Maybe PHP needs some kind of escaping?
@BlackPage In PHP, it must be defined as $regex = '~MUI\((?|"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"|\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\')~s';
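
For reference, a minimal sketch of how the pattern from the comment above might be used with `preg_match_all`. The sample content and the `$content` and `$terms` names are assumptions made for the illustration; a real script would load the PHP file from disk.

```php
<?php
// Hypothetical sample input standing in for the contents of a PHP source file.
// A nowdoc keeps the backslashes and the $lang text exactly as written.
$content = <<<'SRC'
echo MUI('Some text');
echo MUI("Other text", $lang);
echo MUI("Say \"hi\"", $lang);
SRC;

// Inside a single-quoted PHP string, \\\\ becomes \\ in the regex,
// which in turn matches one literal backslash in the subject.
$regex = '~MUI\((?|"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"|\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\')~s';

preg_match_all($regex, $content, $matches);

// Thanks to the branch reset group, both quote styles land in Group 1.
$terms = $matches[1];
print_r($terms);
```

The pattern stops right after the closing quote, so it matches whether or not a second argument such as $lang follows. Escaped quotes stay in the captured text as written (backslash included), so they may need unescaping before the terms are used.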
