
I have a String variable with a value such as: this is test-str-ing_łóśżćń.

I would like to replace these characters:

' ' (space), -, ł, ó, ś, ż, ć, ń with these:

_, _, l, o, s, z, c, n.

In other words, if the parser finds, for example, the char - (the second item in the first list), it should be replaced with the char at the same position in the second list, which in this example is _.

The char ó should be replaced with the char o.

The char ń should be replaced with the char n.

In my case the list of characters to replace is quite long, and looping over the string once for each character to replace would not be efficient enough.

I know the method replaceAll(), but it accepts only one input String and one output String.

So I am looking for a method that will let me work with arrays/lists of Strings instead of a single String.

Please give me some help.
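For reference, since Java 9 `Matcher.replaceAll` also accepts a function that computes the replacement for each match, which comes close to "replaceAll over two lists" in a single regex pass. A minimal sketch, assuming Java 9+; `FROM`, `TO`, `RegexCharMapper` and `convert` are hypothetical names, and the two constants simply hold the lists from the question (Polish letters written as Unicode escapes so the file compiles under any platform encoding):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCharMapper {
    // Hypothetical constants holding the question's two lists:
    // FROM.charAt(i) is replaced by TO.charAt(i).
    // The Polish letters are Unicode escapes for l-stroke, o-acute,
    // s-acute, z-dot, c-acute and n-acute.
    static final String FROM = " -\u0142\u00f3\u015b\u017c\u0107\u0144";
    static final String TO = "__loszcn";

    // A character class matching any single char of FROM.
    private static final Pattern ANY_FROM = buildClass(FROM);

    private static Pattern buildClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < chars.length(); i++) {
            // \x{h..h} matches by code point, so '-' or ']' need no escaping
            sb.append("\\x{").append(Integer.toHexString(chars.charAt(i))).append('}');
        }
        return Pattern.compile(sb.append(']').toString());
    }

    // Java 9+: the function receives each match and returns its replacement,
    // so the input is scanned by the regex engine only once.
    static String convert(String input) {
        Matcher m = ANY_FROM.matcher(input);
        return m.replaceAll(match -> {
            char c = match.group().charAt(0);
            char replacement = TO.charAt(FROM.indexOf(c));
            // quoteReplacement guards against '$' or '\' appearing in TO
            return Matcher.quoteReplacement(String.valueOf(replacement));
        });
    }
}
```

Each match still costs a linear `indexOf` in `FROM`; for a long list, a lookup table like the one in the last answer avoids that.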

  • I suggest you give Apache Commons Lang & StringUtils a shot: commons.apache.org/proper/commons-lang/apidocs/org/apache/… Commented Sep 16, 2015 at 9:35
  • replaceAll is heavyweight (regex). replace(), which has a few overloads, is faster. Commented Sep 16, 2015 at 9:36
  • Professional code seems to implement "code page" operations with CharsetProvider and related classes. I have seen something for the old Polish code pages 852 and Mazovia, along with converters. Commented Sep 16, 2015 at 9:39
  • Are you trying to do this? Commented Sep 16, 2015 at 9:40
  • Have a look at the question linked by dasblinkenlight. I have a strong feeling that it's what you're after. Add a second call to replaceAll to replace spaces, hyphens, etc. with an underscore. Commented Sep 16, 2015 at 9:47
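Following the comment about replace(), the regex-free baseline is to chain `String.replace(char, char)`, one call per pair. A minimal sketch using the question's lists (the class and method names are hypothetical). Each call scans the whole string and returns a new String when it changes something, so the cost grows with the number of pairs, which is exactly the inefficiency the question is worried about for a long list:

```java
class ChainedReplace {
    // One replace(char, char) call per pair from the question; no regex involved.
    // The Polish letters are Unicode escapes for l-stroke, o-acute,
    // s-acute, z-dot, c-acute and n-acute.
    static String convert(String s) {
        return s.replace(' ', '_')
                .replace('-', '_')
                .replace('\u0142', 'l')
                .replace('\u00f3', 'o')
                .replace('\u015b', 's')
                .replace('\u017c', 'z')
                .replace('\u0107', 'c')
                .replace('\u0144', 'n');
    }
}
```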

3 Answers


Use java.text.Normalizer to decompose accented letters into a base letter plus "combining diacritical marks."

// needs: import java.text.Normalizer;
String base = Normalizer.normalize(accented, Normalizer.Form.NFKD)
    .replaceAll("\\p{M}", "");

This performs a compatibility decomposition (the "D" in NFKD) and then removes the marks (\p{M}).

Some replacements are still needed: characters that have no decomposition, and the separators.
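A self-contained sketch of this answer combined with the follow-ups from its comments (the class and method names are hypothetical). ł has no Unicode decomposition, so it survives the mark removal and is mapped by hand, together with the space and hyphen, using plain `replace(char, char)` rather than regex. Note that NFKD also rewrites compatibility characters (for example the ligature "ﬁ" becomes "fi"); `Normalizer.Form.NFD` is the narrower choice if that is unwanted. Uppercase Ł is not handled here either.

```java
import java.text.Normalizer;

class NormalizeDemo {
    static String convert(String s) {
        // NFKD splits e.g. o-acute into 'o' followed by U+0301, a combining mark
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        // \p{M} matches combining marks; removing them leaves the base letters
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // l-stroke (U+0142) has no decomposition, so it and the separators
        // are mapped explicitly
        return stripped.replace('\u0142', 'l')
                       .replace(' ', '_')
                       .replace('-', '_');
    }
}
```

With the question's input, `NormalizeDemo.convert("this is test-str-ing_łóśżćń")` returns `this_is_test_str_ing_loszcn`.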


5 Comments

OK, I will give it a try, because it seems to be the most elegant proposal.
Tested. It doesn't convert ł to l, and I still need to change spaces to _.
I've added some extra replaceAll calls to achieve my goal: return Normalizer.normalize(stringToConvert,Form.NFKD).replaceAll("\\p{M}", "").replaceAll(" ", "_").replaceAll("ł", "l");. Now it works. Thanks.
It would suffice to do .replace("ł", "l") maybe even .replace(' ', '_') without the regex overhead.
The string replace is faster than regex replace and the char replace is faster than string replace since the new string length is known ahead of time.
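    // convertChars[i] is replaced by toChars[i]; both arrays have the same length
    char[] out = new char[src.length()];
    for (int j = 0; j < src.length(); j++) {
        char inputChar = src.charAt(j);
        for (int i = 0; i < convertChars.length; i++) {
            if (inputChar == convertChars[i]) {
                inputChar = toChars[i];
                break; // first match wins; don't re-map the replacement
            }
        }
        out[j] = inputChar;
    }
    String out2 = new String(out);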

Extracted from bigger code without an IDE, not tested. The loop (I hope) doesn't allocate objects, so it should not degrade speed.
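The same idea as a self-contained, testable method, taking the two lists as Strings so that "same position in the second list" maps directly to an index (the names `IndexOfTranslate`, `translate`, `from` and `to` are hypothetical). `String.indexOf` stands in for the hand-written inner loop; the cost is still proportional to input length times list length:

```java
class IndexOfTranslate {
    // Replaces each char of src found at position k in 'from'
    // with the char at position k in 'to'; other chars are copied unchanged.
    static String translate(String src, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        char[] out = src.toCharArray();
        for (int j = 0; j < out.length; j++) {
            // indexOf is a linear scan of 'from', like the answer's inner loop
            int k = from.indexOf(out[j]);
            if (k >= 0) {
                out[j] = to.charAt(k);
            }
        }
        return new String(out);
    }
}
```

For example, `IndexOfTranslate.translate("this is test-str-ing_łóśżćń", " -łóśżćń", "__loszcn")` returns `this_is_test_str_ing_loszcn`.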


Make a static lookup table:

private static final char[] substitutions = new char[65536];
static {
    // Initialize: every char maps to itself.
    // The counter must be an int: a char counter wraps from 65535 back to 0,
    // so "c < 65536" would never become false.
    for (int c = 0; c < substitutions.length; c++) {
        substitutions[c] = (char) c;
    }
    // Now add mappings: source -> target character.
    substitutions['-'] = '_';
    // ... add the rest
}
// LATER IN CODE
char[] stringChars = inputString.toCharArray();
for (int i = 0; i < stringChars.length; i++) {
    stringChars[i] = substitutions[stringChars[i]];
}
outputString = new String(stringChars);

2 Comments

Probably one of the fastest algorithms (what about RAM? maybe it isn't a problem?). I used this technique extensively in C in DOS times; the substitutions equivalent was 256 bytes, and a highly optimizing C compiler did the basic operation in 2-3 CPU cycles.
That array is 128 KB (65536 chars × 2 bytes), which is nothing on today's machines.
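If the RAM question does matter, one variant is to size the table only up to the largest source character and let anything above that range pass through unchanged. A minimal sketch, assuming the question's lists (all names are hypothetical); the largest source char here is ż (U+017C), so the table has 381 slots, roughly 0.75 KB instead of 128 KB, at the cost of one bounds check per char:

```java
class CompactTable {
    // The question's lists; Polish letters as Unicode escapes for
    // l-stroke, o-acute, s-acute, z-dot, c-acute and n-acute.
    private static final String FROM = " -\u0142\u00f3\u015b\u017c\u0107\u0144";
    private static final String TO = "__loszcn";
    // Table only as large as the highest source char + 1.
    private static final char[] TABLE = buildTable();

    private static char[] buildTable() {
        int max = 0;
        for (int i = 0; i < FROM.length(); i++) {
            max = Math.max(max, FROM.charAt(i));
        }
        char[] table = new char[max + 1];
        for (int c = 0; c < table.length; c++) {
            table[c] = (char) c; // identity by default
        }
        for (int i = 0; i < FROM.length(); i++) {
            table[FROM.charAt(i)] = TO.charAt(i);
        }
        return table;
    }

    static String convert(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // chars above the table range are never replaced, so they pass through
            if (c < TABLE.length) {
                chars[i] = TABLE[c];
            }
        }
        return new String(chars);
    }
}
```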
