0

I have this code to convert the whole text that is before "=" to uppercase

Matcher m = Pattern.compile("((?:^|\n).*?=)").matcher(conteudo);
while (m.find()) {
  conteudo = conteudo.replaceFirst(m.group(1), m.group(1).toUpperCase());
}

But when the string is very large, it becomes very slow. I want to find a faster way to do this.

Any suggestions?

EDIT

I didn't explain this properly. I have a text like this

field=value
field2=value2
field3=value3

And I want to convert each line like this

FIELD=value
FIELD2=value2
FIELD3=value3
2
  • Ignore everything everyone else said and use re2j. It uses a linear-time automata-based engine, unlike Java's built-in regex library (and that of pretty much every other programming language), which uses a horrendously inefficient backtracking engine. In Java, as if that wasn't bad enough already, it is implemented recursively, making its performance 100x worse due to method call overhead, and another 100x worse when running in debug mode; because of the recursion, it also suffers from StackOverflowError on certain regexes and inputs. Commented Feb 17 at 10:28
  • Fun fact, I posted the above comment as an answer twice, and it got deleted, twice. Apparently, an AI flagged my answer as being written by an AI, and then a non-AI (i.e. non-artificial, and non-intelligent) humans (4 of them!) deleted my answer. The "moderators" who deleted my answer are: Mark Rotteveel, David Maze and Mofi and Dalija Prasnikar♦. When I politely told them to go occupy themselves with something else than deleting valid answers, I got banned for 2 weeks 🤦 Commented Feb 26 at 19:48

4 Answers

2

The fastest way to get regex to work fast is to not use regex. Regex was never meant to be and almost never is a good choice for performance-sensitive operations. (Further reading: Why are regular expressions so controversial?)

Try using String class methods instead, or write a custom method that does what you want. You could split on '=' and then call .toUpperCase() on the trailing part of each piece (whatever comes after the \n). Alternatively, convert to char[] or use charAt() and traverse the text manually, switching to upper case after a newline and back to unchanged characters after '='.

For example:

public static String changeCase( String s ) {
    boolean capitalize = true;
    int len = s.length();
    char[] output = new char[len];
    for( int i = 0; i < len; i++ ) {
      char input = s.charAt(i);
      if ( input == '\n' ) {
        capitalize = true;
        output[i] = input;
      } else if ( input == '=' ) {
        capitalize = false;
        output[i] = input;
      } else {
        output[i] = capitalize ? Character.toUpperCase(input) : input;
      }
    }
    return new String(output);
}

Method input:

field=value\n
field2=value2\n
field3=value3

Method output:

FIELD=value\n
FIELD2=value2\n
FIELD3=value3

Try it here: http://ideone.com/k0p67j
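
The split-based alternative mentioned above can be sketched roughly like this. It is only an illustration, not a benchmarked implementation: the class and method names are made up, it splits into lines first and then at the first '=' of each line, and it passes Locale.ROOT so the result does not depend on the default locale (an addition of mine). Unlike changeCase, it leaves a line without any '=' unchanged.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical helper class, named for illustration only.
final class SplitUpperCase {

    // Upper-cases the text before the first '=' on every line.
    static String upperCaseKeys(String text) {
        StringJoiner out = new StringJoiner("\n");
        // Limit -1 keeps trailing empty lines, so a final "\n" survives.
        for (String line : text.split("\n", -1)) {
            // Limit 2 splits only at the first '=', so '=' inside the value is kept.
            String[] parts = line.split("=", 2);
            if (parts.length == 2) {
                out.add(parts[0].toUpperCase(Locale.ROOT) + "=" + parts[1]);
            } else {
                out.add(line); // no '=' on this line: left as-is (assumption)
            }
        }
        return out.toString();
    }
}
```

Calling SplitUpperCase.upperCaseKeys("field=value\nfield2=value2") returns "FIELD=value\nFIELD2=value2". It allocates more intermediate strings than the char[] loop, so the loop is probably still the better choice when speed is the main concern.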

PS (by Jamie Zawinski):

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.


6 Comments

It should be noted that in many high-level languages (i.e. without a compiler) regexes are faster than your method. Just try this exact method in Python, PHP, Ruby, Perl, etc. and it will be faster to use a properly written regex instead.
@Wolph While I agree with you in general (i.e. that using regex in interpreted languages is not necessarily slower than hand-crafted string processing code), I'd still point out two things: a) the question was explicitly about Java performance, and b) regex was never meant to be, and almost never is, a good choice for performance-sensitive operations. Using interpreted languages is never a good choice for performance-sensitive operations either, with the languages you mentioned having a performance penalty of one to two orders of magnitude. As such, I consider my main point still valid.
nb I assume you don't mean "high-level languages" but "interpreted languages" per se - and, even then, the difference in favour of regex is from interpreting overhead, not from wrong algorithmic approach. In AOT-compiled languages, regex always incurs a performance penalty, the same way it's faster to have own, custom-tailored parser, than to use scanf with format string to parse the data, it's faster to use a native array for random-access data than to use a high-level data abstraction etc.
I meant high-level languages, the "without a compiler" part was meant as an example. I agree with you that regexes are not meant for performance btw, I'm just noting that it might still be the better option in some cases.
It is not always true that regular expression-based solutions are slower than hand-crafted string processing: advanced regex engines may implement very fast search algorithms like Boyer-Moore-Horspool, which could speed up processing drastically. Take a look at github.com/aunkrig/lfr.
1

What about something like this? indexOf should be fast enough.

int equalsIdx = conteudo.indexOf('=');
// indexOf returns -1 when there is no '='; leave the text unchanged in that case
String result = equalsIdx < 0 ? conteudo
        : conteudo.substring(0, equalsIdx).toUpperCase() + conteudo.substring(equalsIdx);
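
The snippet above only handles the first '=' in the whole string. For the multi-line input shown in the edited question, the same indexOf idea could be applied line by line, roughly as sketched below. The class and method names are hypothetical, lines without '=' are copied unchanged (an assumption), and Locale.ROOT is my addition to keep the result independent of the default locale.

```java
import java.util.Locale;

// Hypothetical helper class, named for illustration only.
final class IndexOfUpperCase {

    // Upper-cases the text before the first '=' on every line, in one pass.
    static String upperCaseKeys(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int eq = text.indexOf('=');          // position of the next '=' not yet consumed
        int lineStart = 0;
        while (lineStart < text.length()) {
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = text.length();     // last line has no trailing newline
            }
            // Only search again when the cached '=' lies before this line,
            // so the whole scan stays linear in the text length.
            if (eq >= 0 && eq < lineStart) {
                eq = text.indexOf('=', lineStart);
            }
            if (eq >= 0 && eq < lineEnd) {
                out.append(text.substring(lineStart, eq).toUpperCase(Locale.ROOT));
                out.append(text, eq, lineEnd);        // '=' and the value, unchanged
            } else {
                out.append(text, lineStart, lineEnd); // no '=' on this line
            }
            if (lineEnd < text.length()) {
                out.append('\n');
            }
            lineStart = lineEnd + 1;
        }
        return out.toString();
    }
}
```

IndexOfUpperCase.upperCaseKeys("field=value\nfield2=value2\nfield3=value3") returns "FIELD=value\nFIELD2=value2\nFIELD3=value3".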

1 Comment

I've edited the question, please take another look. I explained it wrong.
1

With a multiline regex we can simply get every line separately and replace it :)

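import java.util.regex.Matcher;
import java.util.regex.Pattern;

String conteudo = "field=value\nfield2=value2\nfield3=value3";
// [^=\n] keeps the key on a single line; plain [^=] could match across line breaks
Pattern pattern = Pattern.compile("^([^=\n]+=)(.*)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(conteudo);
StringBuffer result = new StringBuffer();

while (matcher.find()) {
    // quoteReplacement stops '$' and '\' in the text from being read as group references
    matcher.appendReplacement(result,
            Matcher.quoteReplacement(matcher.group(1).toUpperCase() + matcher.group(2)));
}
// Copy whatever follows the last match (e.g. trailing lines without '=')
matcher.appendTail(result);
System.out.println(conteudo);
System.out.println(result.toString());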

4 Comments

I was just going to write the same answer.
I've edited the question, please take another look. I explained it wrong.
@LeonardoGaldioli: try the new version :)
In that case I would need different testcases. It works with the given examples.
0
((?:^|\n)[^=]*=)

Try this .
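
One possible way to combine this pattern with the upper-casing step, assuming Java 9 or later (for Matcher.replaceAll with a function argument), is sketched below. The class name is made up, and the character class is tightened to [^=\n] so that a match cannot run across a line break; that is my adjustment, not part of the pattern above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class RegexUpperCase {

    // Compiled once and reused; compiling inside a loop would waste time.
    // Matches from the start of the input or a newline up to the first '=' on that line.
    private static final Pattern KEY = Pattern.compile("(?:^|\n)[^=\n]*=");

    static String upperCaseKeys(String text) {
        // Requires Java 9+: replaceAll(Function<MatchResult, String>).
        // quoteReplacement prevents '$' or '\' in the key from being read as group references.
        return KEY.matcher(text).replaceAll(
                m -> Matcher.quoteReplacement(m.group().toUpperCase(Locale.ROOT)));
    }
}
```

RegexUpperCase.upperCaseKeys("field=value\nfield2=value2") returns "FIELD=value\nFIELD2=value2". The text is scanned once and the result is built in a single buffer, instead of calling replaceFirst on the whole string for every match as in the question.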

3 Comments

And how do I convert to uppercase?
@LeonardoGaldioli Converting to uppercase ... there must be something like str.to_upper in Java. That cannot be done by regex.
Well, it can be done by some regex flavours, just not by Java's one.
