I need to use a regex (Java/PCRE) on a line of code to strip certain function calls and keep only the value they wrap, as in the example below:

Input:

ArrayNew(1) = adjustalpha(shadowcolor, CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))

Output (after the regex replace):

ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)

Here CInt, Math.Truncate, and ObjectToNumber are removed, and only their innermost argument is retained in the output, as shown above.

The functions CInt and Math.Truncate keep changing, to CStr, Math.Random, and so on, so the regex cannot hardcode them.

I tried a lot of options from Stack Overflow, but most did not work.

It would also be nice if the query were customizable, so that for CInt it returns everything the CInt call wraps (find a name, then take everything between its first ( and the matching ), skipping balanced parenthesis pairs in between).

  • It's probably easier to write yourself a little parser and count the parentheses here. Commented Nov 4, 2021 at 21:18

1 Answer

I know it's not pretty, but that's what you get for using raw regex for this :)

@Test
void unwrapCIntCall() {
    String input = "ArrayNew(1) = adjustalpha(shadowcolor, CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))";
    String expectedOutput = "ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)";

    String output = input.replaceAll("CInt\\s*\\(\\s*Math\\.Truncate\\s*\\(\\s*ObjectToNumber\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)", "$1");
    assertEquals(expectedOutput, output);
}

Now some explanation: the \\s* parts allow any amount of whitespace at those positions. In the middle of the pattern I used (.*), which matches anything, but that's fine*. I wrote (.*) rather than .* so that section is captured as group $1 ($0 is always the whole match). Once the interesting part is captured, I can refer to it in the replacement string.

*as long as you don't have several such assignments within one string. Otherwise, split the string into parts that each contain only one such assignment and apply the replacement to each part. Or try (.*?) instead of (.*): the ? makes .* reluctant, so it matches as few characters as possible.
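To make the footnote concrete, here is a small sketch comparing the greedy and reluctant groups on a line that contains two wrapped calls. The class and method names (GreedyVsReluctant, unwrapGreedy, unwrapReluctant) and the sample line are made up for illustration; only the pattern pieces come from the test above.

```java
class GreedyVsReluctant {
    // Same wrapper chain as in the test above; only the capturing group differs.
    static final String PREFIX =
        "CInt\\s*\\(\\s*Math\\.Truncate\\s*\\(\\s*ObjectToNumber\\s*\\(\\s*";
    static final String SUFFIX = "\\s*\\)\\s*\\)\\s*\\)";

    // Greedy group: (.*) grabs as much as it can before the last ")))" on the line.
    static String unwrapGreedy(String line) {
        return line.replaceAll(PREFIX + "(.*)" + SUFFIX, "$1");
    }

    // Reluctant group: (.*?) stops at the first place where ")))" can follow.
    static String unwrapReluctant(String line) {
        return line.replaceAll(PREFIX + "(.*?)" + SUFFIX, "$1");
    }

    public static void main(String[] args) {
        // Hypothetical line with two wrapped calls.
        String two = "a = CInt(Math.Truncate(ObjectToNumber(x))); "
                   + "b = CInt(Math.Truncate(ObjectToNumber(y)))";
        System.out.println(unwrapGreedy(two));    // a = x))); b = CInt(Math.Truncate(ObjectToNumber(y
        System.out.println(unwrapReluctant(two)); // a = x; b = y
    }
}
```

The reluctant version is still not parenthesis-aware: if the wrapped argument itself ends in nested calls, it will stop at the first run of three closing parentheses it finds.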

If the methods being called vary, replace their names in the regex with the variations you expect: replace CInt with (?:CInt|CStr), Math\\.Truncate with Math\\.(?:Truncate|Random), and so on. (Using (?: instead of ( makes the group non-capturing, so it won't take up the $1, $2, etc. slots.)
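A minimal sketch of that variation, assuming the only alternatives are the ones named in the question (CInt/CStr and Math.Truncate/Math.Random). The class name VaryingNames and the second sample line are hypothetical.

```java
class VaryingNames {
    // (?:...) groups the alternatives without capturing, so the argument stays in $1.
    static final String UNWRAP =
        "(?:CInt|CStr)\\s*\\(\\s*Math\\.(?:Truncate|Random)\\s*\\(\\s*"
        + "ObjectToNumber\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)";

    static String unwrap(String line) {
        return line.replaceAll(UNWRAP, "$1");
    }

    public static void main(String[] args) {
        System.out.println(unwrap(
            "ArrayNew(1) = adjustalpha(shadowcolor, "
            + "CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))"));
        // Hypothetical second line using the other alternatives.
        System.out.println(unwrap("x = CStr(Math.Random(ObjectToNumber(Me.tag)))"));
    }
}
```

Each new wrapper name has to be added to the matching alternation by hand, which is why this only scales to a short, known list.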

If that gets too complicated, then you should really consider whether you want to do it with regex at all, or whether it'd be easier to write a somewhat longer function with plain string methods like indexOf and substring :)

Bonus: if absolutely everything varies except the call depth, you might try this one:
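For comparison, a plain-string version might look like the sketch below. It is not a full parser: it assumes parentheses are balanced, ignores parentheses inside string literals, and only looks at the first occurrence of each name. All names here (CallUnwrapper, matchingParen, unwrapFirst, unwrapAll) are made up for this example.

```java
class CallUnwrapper {
    // Index of the ')' that balances the '(' at openIdx, or -1 if there is none.
    static int matchingParen(String s, int openIdx) {
        int depth = 0;
        for (int i = openIdx; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    // Replaces the first call "name(...)" with the trimmed text between its parentheses.
    static String unwrapFirst(String line, String name) {
        int start = line.indexOf(name);
        if (start < 0) return line;
        int open = start + name.length();
        // Allow whitespace between the name and its opening parenthesis.
        while (open < line.length() && Character.isWhitespace(line.charAt(open))) open++;
        if (open >= line.length() || line.charAt(open) != '(') return line;
        int close = matchingParen(line, open);
        if (close < 0) return line;
        return line.substring(0, start)
             + line.substring(open + 1, close).trim()
             + line.substring(close + 1);
    }

    // Strips each named wrapper in turn, outermost first.
    static String unwrapAll(String line, String... names) {
        for (String name : names) line = unwrapFirst(line, name);
        return line;
    }

    public static void main(String[] args) {
        String input = "ArrayNew(1) = adjustalpha(shadowcolor, "
            + "CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))";
        System.out.println(unwrapAll(input, "CInt", "Math.Truncate", "ObjectToNumber"));
        // ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)
    }
}
```

Because the list of names is just a parameter, swapping CInt for CStr or Math.Truncate for Math.Random needs no change to the matching logic, and calling unwrapFirst with a single name returns everything that one call wraps.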

String output = input.replaceAll("[\\w\\d.]+\\s*\\(\\s*[\\w\\d.]+\\s*\\(\\s*[\\w\\d.]+\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)", "$1");

Yes, it's definitely a nightmare to read, but as far as I understand, you are after this monster :)

You can use ([^()]*) instead of (.*) to prevent deeper nested expressions from matching. Note that fine control of nesting depth is a real weakness of everyday regular expressions.
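A short sketch of what ([^()]*) changes, under the assumption that the patterns are otherwise the ones shown earlier. The class name InnermostOnly, its method names, and the sample inputs with foo(1) and f(g(h(k(x)))) are invented for the demonstration.

```java
class InnermostOnly {
    static final String OPEN = "\\s*\\(\\s*";
    static final String CLOSE = "\\s*\\)";
    static final String ARG = "([^()]*)"; // the argument may not contain parentheses

    // Fixed wrapper names, as in the first pattern of this answer.
    static final String NAMED =
        "CInt" + OPEN + "Math\\.Truncate" + OPEN + "ObjectToNumber" + OPEN
        + ARG + CLOSE + CLOSE + CLOSE;

    // Any three dotted names, as in the bonus pattern (\w already covers digits).
    static final String GENERIC =
        "[\\w.]+" + OPEN + "[\\w.]+" + OPEN + "[\\w.]+" + OPEN
        + ARG + CLOSE + CLOSE + CLOSE;

    static String unwrapNamed(String line) {
        return line.replaceAll(NAMED, "$1");
    }

    static String unwrapGeneric(String line) {
        return line.replaceAll(GENERIC, "$1");
    }

    public static void main(String[] args) {
        // Plain argument: the wrappers are stripped.
        System.out.println(unwrapNamed("y = CInt(Math.Truncate(ObjectToNumber(Me.tag)))"));
        // Argument with its own call: no match, the line is left unchanged.
        System.out.println(unwrapNamed("y = CInt(Math.Truncate(ObjectToNumber(foo(1))))"));
        // Generic names: the match slides inward to the three innermost calls.
        System.out.println(unwrapGeneric("f(g(h(k(x))))")); // f(x)
    }
}
```

With fixed names the stricter group simply refuses lines whose argument contains a call. With the generic name pattern it still finds a match further in, so the result may strip different wrappers than intended.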

2 Comments

[\\w\\d.]+\\s*\(\\s*[\\w\\d.]+\\s*\(\\s*[\\w\\d.]+\\s*\(\\s*(.*)\\s*\)\\s*\)\\s*\) Which flavor is this? does not work either for Java or PCRE on test string
@mohsyn Plain Java regex, I did not even use any special feature. Make sure that the IDE doesn't escape each \ character again when pasting; in the code it should look exactly as I've written it. I ran it before posting, and it works just fine for me.
