Sirac

Comparing strings with different newlines

Recently, I was tasked with tracking down a bug. The cause turned out to be strings from different systems using different newline conventions. Two strings with the same "text" but different newlines do not compare as equal. For example, "new\nline" (Unix flavor) and "new\r\nline" (Windows flavor) are not equal.
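To make the mismatch concrete, here is a minimal sketch that reproduces it. The class name `NewlineMismatchDemo` and the helper `plainEquals` are made up for illustration and are not part of the real code base.

```java
public class NewlineMismatchDemo {
    // Hypothetical helper: plain String.equals, which compares character by character.
    static boolean plainEquals(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String unix = "new\nline";      // Unix flavor: LF only
        String windows = "new\r\nline"; // Windows flavor: CR followed by LF

        // The visible text is the same, but the extra '\r' makes the strings differ.
        System.out.println(plainEquals(unix, windows));                // false
        System.out.println(unix.length() + " vs " + windows.length()); // 8 vs 9
    }
}
```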

Since the code has to deal with both types of newlines, I wrote a method that tests for equality independently of the newline type. It treats "\n", "\r", "\r\n" and "\n\r" as the same (even though "\n\r" isn't really used as a newline).
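The intended equivalence can also be described by normalizing both strings first and then comparing them. The following is only a sketch of that rule under my own assumptions, not the method I am asking about: the names `NewlineNormalizeSketch`, `normalizeNewlines` and `equalsIgnoringNewlineStyle` are hypothetical. Unlike a character-by-character scan, it allocates new strings.

```java
public class NewlineNormalizeSketch {
    // Replaces every "\r\n", "\n\r" and lone "\r" with a single "\n".
    // The two-character alternatives come first so each pair counts as one newline.
    static String normalizeNewlines(String s) {
        return s.replaceAll("\r\n|\n\r|\r", "\n");
    }

    // Hypothetical comparison: equal if the normalized forms are equal.
    // Returns false if either argument is null.
    static boolean equalsIgnoringNewlineStyle(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        return normalizeNewlines(a).equals(normalizeNewlines(b));
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoringNewlineStyle("new\nline", "new\r\nline")); // true
        System.out.println(equalsIgnoringNewlineStyle("a\r\nb", "a\rb"));           // true
        System.out.println(equalsIgnoringNewlineStyle("a\n\nb", "a\r\nb"));         // false
    }
}
```

The last call shows that two consecutive newlines are not collapsed into one: "a\n\nb" contains two newlines, while "a\r\nb" contains only one.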

Now that the code is done, I would like your opinion on it. What do you think of the variable and method names? (I know I could have chosen better ones.) Is there a way to optimize the code or make it more readable?

Thanks in advance.

public class StringUtils {
    public static final char LF = '\n';
    public static final char CR = '\r';
    
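    /**
     * Compares two strings, treating "\n", "\r", "\r\n" and "\n\r" as the same newline.
     * Returns false if either argument is null.
     */
    public static boolean equalsIgnoreNewlineTwirks(String str, String other){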
        if (str == null || other == null){
            return false;
        }
        if (str == other){
            return true;
        }
        
        char[] s1 = str.toCharArray();
        char[] s2 = other.toCharArray();
        int index1 = 0, index2 = 0;
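        while (true){
            boolean oob1 = index1 >= s1.length, oob2 = index2 >= s2.length;
            if (oob1 || oob2){
                // equal only if both strings end at the same time
                return oob1 && oob2;
            }
            
            char ch1 = s1[index1], ch2 = s2[index2];
            boolean newline1 = ch1 == LF || ch1 == CR;
            boolean newline2 = ch2 == LF || ch2 == CR;
            if (newline1 != newline2) return false; // newline vs. ordinary character
            if (!newline1 && ch1 != ch2) return false; // two different ordinary characters
            
            if (newline1){
                // Skip the second character of a two-character newline ("\r\n" or "\n\r").
                // This must also happen when ch1 == ch2, otherwise "a\r\nb" and "a\rb"
                // would wrongly compare as different.
                if (index1 + 1 < s1.length && isCRAndLF(s1[index1], s1[index1 + 1])){
                    index1++;
                }
                if (index2 + 1 < s2.length && isCRAndLF(s2[index2], s2[index2 + 1])){
                    index2++;
                }
            }
            
            index1++; index2++;
        }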
    }
    
    private static boolean isCRAndLF(char ch1, char ch2){
        return (ch1 == CR && ch2 == LF) || (ch1 == LF && ch2 == CR);
    }
}