As part of my implementation, I need to iterate over chars as efficiently as possible. Here is part of the code I wrote:

public int normalize(char[] s, int len) {
    for (int i = 0; i < len; i++) {
          switch (s[i]) {
            //numbers
            case EN_D0:
            case AR_D0:
              s[i]= FA_D0;
              break;
            case EN_D1:
            case AR_D1:
              s[i]= FA_D1;
              break;
            case EN_D2:
            case AR_D2:
              s[i]= FA_D2;
              break;
            case EN_D3:
            case AR_D3:
              s[i]= FA_D3;
              break;
            case EN_D4:
            case AR_D4:
              s[i]= FA_D4;
              break;
            case EN_D5:
            case AR_D5:
              s[i]= FA_D5;
              break;
            case EN_D6:
            case AR_D6:
              s[i]= FA_D6;
              break;
            case EN_D7:
            case AR_D7:
              s[i]= FA_D7;
              break;
            case EN_D8:
            case AR_D8:
              s[i]= FA_D8;
              break;
            case EN_D9:
            case AR_D9:
              s[i]= FA_D9;
              break;   
            // symbols
            case EN_QUESTION_MARK:
              s[i]=FA_QUESTION_MARK;
              break;
            case EN_PERCENT_SIGN:
              s[i]=FA_PERCENT_SIGN;
              break;
            case EN_DASH1:
            case EN_DASH2:
            case EN_DASH3:
            case EN_DASH4:
              s[i]=FA_DASH;
              break;
            case HAMZA_ABOVE:
              len = delete(s, i, len);
              i--;
              break;
            default:
              break;
          }
    }
    return len;
}

What is the most efficient way to do this? Note that I have not included all the cases here, because there are around 600 of them. Also, this code has to run on huge documents with a tremendous number of chars, so efficiency really matters.

You'll have a lot of conditional checks, but that seems unavoidable. I don't think you'll do better than you have. Commented Apr 11, 2015 at 6:31

1 Answer

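If all the constants in your case statements and assignments are chars, you can use an array to map each source char to its target char. The length of the array would be 2^16. Note that a new char[] is zero-filled, so every char that needs no replacement must be mapped to itself; otherwise the loop would overwrite it with '\u0000'.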

char[] map = new char[65536];

...
map[AR_D7] = FA_D7;
...
map[AR_D9] = FA_D9;
...

Then your loop becomes:

for (int i = 0; i < len; i++)
    s[i] = map[s[i]];
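
The loop above covers one-to-one replacements only. As a minimal sketch of how the table could also handle the question's delete case, the version below fills the table with the identity mapping first and uses a sentinel value for chars that should be dropped. The class name, the DELETE sentinel, and the constant values are assumptions for illustration (the question does not show its definitions), and the sketch assumes U+0000 never occurs in the input.

```java
final class CharNormalizer {
    // Hypothetical constant values; the original code defines its own.
    static final char EN_D0 = '0';
    static final char AR_D0 = '\u0660';       // ARABIC-INDIC DIGIT ZERO
    static final char FA_D0 = '\u06F0';       // EXTENDED ARABIC-INDIC DIGIT ZERO
    static final char HAMZA_ABOVE = '\u0654'; // ARABIC HAMZA ABOVE

    // Sentinel meaning "remove this char". Assumption: U+0000 is never valid input.
    static final char DELETE = '\u0000';

    // One entry per possible char value (2^16 entries).
    private static final char[] MAP = new char[1 << 16];

    static {
        // Identity mapping first, so unmapped chars stay unchanged.
        for (int c = 0; c < MAP.length; c++) {
            MAP[c] = (char) c;
        }
        // Then the replacements (only a few of the ~600 are shown).
        MAP[EN_D0] = FA_D0;
        MAP[AR_D0] = FA_D0;
        MAP[HAMZA_ABOVE] = DELETE;
    }

    // Normalizes s[0..len) in place and returns the new length.
    static int normalize(char[] s, int len) {
        int out = 0; // write position, never ahead of the read position i
        for (int i = 0; i < len; i++) {
            char c = MAP[s[i]];
            if (c != DELETE) {
                s[out++] = c;
            }
        }
        return out;
    }
}
```

Compacting with a separate write index keeps the whole pass linear, whereas calling delete(s, i, len) for each removed char shifts the rest of the array every time. Whether this beats the JIT-compiled switch in practice would need to be measured on real input.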

13 Comments

Is this more efficient than the switch statement?
@Alin make sure you're not falling victim to premature optimization!
@Alin I believe it should, since obtaining an element from an array should be faster than a switch statement with many conditions. It also makes the code more concise, which I think is even more important (since the performance gain will not necessarily be substantial).
@Alin Java is sometimes pretty smart at optimizing switch statements, and if you're lucky, it might even end up with such an array. As the array is much clearer and surely at least as fast as the switch, I'd always go for it. +++ One problem with such a big switch statement is that it's a lot of code and a surprise might be hidden somewhere, e.g., case AR_D8: s[i++]= FA_D8;.
@Eran what if one char code has to map to two chars? For example, '\ufefc' stands for two chars and should be replaced with the concatenation of '\u0644' and '\u0627'.
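
For the one-to-many case raised in the last comment, one possible approach (a sketch, not something proposed in the answer) is a table of replacement strings instead of chars. Because the output can be longer than the input, it cannot be written back into the same array, so this version appends to a StringBuilder. The class name is hypothetical; the '\ufefc' mapping is the one from the comment.

```java
final class ExpandingNormalizer {
    // One replacement string per char value; null means "keep the char as is".
    private static final String[] MAP = new String[1 << 16];

    static {
        // Lam-alef ligature expands to lam followed by alef (example from the comment).
        MAP['\uFEFC'] = "\u0644\u0627";
    }

    // Returns a normalized copy of the input; the input is not modified.
    static String normalize(CharSequence in) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            String replacement = MAP[c];
            if (replacement == null) {
                sb.append(c);
            } else {
                sb.append(replacement); // may be empty, one char, or several chars
            }
        }
        return sb.toString();
    }
}
```

An empty replacement string would also cover deletions in the same table, at the cost of an extra object reference per lookup compared with the plain char[] table.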