The regular expressions below perform very poorly on large strings of 2000 lines or more. The Java String contains a PL/SQL script.
1- Replace every occurrence of a delimiting character sequence, for example ||, != or >, with the same sequence padded by a space on each side. This never finishes, so I could not record a time.
```java
// Delimiting characters for SQLPlus
private static final String[] delimiters = { "\\|\\|", "=>", ":=", "!=", "<>", "<", ">", "\\(", "\\)", "!", ",", "\\+", "-", "=", "\\*", "\\|" };

for (int i = 0; i < delimiters.length; i++) {
    script = script.replaceAll(delimiters[i], " " + delimiters[i] + " ");
}
```
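One possible way to speed this step up, sketched under the assumption that the goal is simply to pad each delimiter with spaces: combine all the delimiters into a single alternation so that one replaceAll call does the work in a single pass, instead of sixteen passes over an ever-growing string. The class and method names (DelimiterPadding, pad) are hypothetical. Longer operators are listed first so that ||, =>, :=, != and <> stay intact; note that this differs from the loop above, where the later passes for |, <, >, ! and = split those multi-character operators apart again.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: a sketch, not a drop-in replacement.
public class DelimiterPadding {
    // One alternation covering every delimiter from the question.
    // Multi-character operators come first so "||" is matched as a unit
    // rather than as two separate "|" characters.
    private static final Pattern DELIMITERS =
            Pattern.compile("\\|\\||=>|:=|!=|<>|[<>()!,+\\-=*|]");

    // Pads every delimiter with one space on each side in a single pass.
    // "$0" in the replacement refers to the whole matched text.
    static String pad(String script) {
        return DELIMITERS.matcher(script).replaceAll(" $0 ");
    }

    public static void main(String[] args) {
        System.out.println(pad("x:=a||b;"));
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on every call, which String.replaceAll does internally.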
2- The following pattern looks for every forward slash (/) that is neither preceded nor followed by a *, i.e. it skips the /* and */ block-comment delimiters. This takes about 103 seconds for a 2000-line string.
```java
Pattern p = Pattern.compile("([^\\*])([\\/])([^\\*])");
Matcher m = p.matcher(script);
while (m.find()) {
    script = script.replaceAll(m.group(2), " " + m.group(2) + " ");
}
```
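A sketch of a single-pass alternative, assuming the intent is to pad every slash that is not part of /* or */: a lookbehind and a lookahead test the neighbouring characters without consuming them, so one replaceAll call handles the whole script. By contrast, the loop above calls replaceAll on the entire, growing script once per match found, and each of those calls pads every slash, including the ones in comment delimiters. SlashPadding is a hypothetical name. Unlike [^\\*]/[^\\*], the lookarounds also match a slash at the very start or end of the string.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: a sketch, not a drop-in replacement.
public class SlashPadding {
    // Matches a "/" that is neither preceded nor followed by "*",
    // so the "/*" and "*/" block-comment delimiters are left alone.
    private static final Pattern SLASH = Pattern.compile("(?<!\\*)/(?!\\*)");

    // Pads each qualifying slash with a space on each side in one pass.
    static String pad(String script) {
        return SLASH.matcher(script).replaceAll(" / ");
    }

    public static void main(String[] args) {
        System.out.println(pad("/* note */ x := a/b;"));
    }
}
```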
3- Remove the spaces from within a date or date format, so that, for example, 12 / 05 / 2010 becomes 12/05/2010.
```java
Pattern p = Pattern.compile("(?i)(\\w{1,2}) +/ +(\\w{1,2}) +/ +(\\w{2,4})");
// Create a matcher with an input string
Matcher m = p.matcher(script);
while (m.find()) {
    String part1 = script.substring(0, m.start());
    String part2 = script.substring(m.end());
    script = part1 + m.group().replaceAll("[ \t]+", "") + part2;
    // Restart the search from the beginning of the modified script
    m = p.matcher(script);
}
```
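Here is a sketch for the date step, assuming the same pattern as above: since the three groups already capture the date parts and the pattern only allows spaces between them, the whole match can be rebuilt as $1/$2/$3 in one replaceAll call, instead of restarting the matcher from the beginning after every replacement. DateSpaceRemoval and collapse are hypothetical names. One edge case differs: the restarting loop also collapses chained matches that share a part, such as a / b / cc / d / ee, completely, while a single pass yields a/b/cc / d / ee.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: a sketch, not a drop-in replacement.
public class DateSpaceRemoval {
    // Same pattern as in the question; the groups capture the date parts.
    private static final Pattern DATE =
            Pattern.compile("(?i)(\\w{1,2}) +/ +(\\w{1,2}) +/ +(\\w{2,4})");

    // Rebuilding each match from its groups drops the spaces around
    // both slashes, in a single left-to-right pass over the script.
    static String collapse(String script) {
        return DATE.matcher(script).replaceAll("$1/$2/$3");
    }

    public static void main(String[] args) {
        System.out.println(collapse("TO_DATE('12 / 05 / 2010', 'DD / MM / YYYY')"));
    }
}
```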
Is there any way to optimize these three regular expressions so that they run faster?
Thanks
Ali