Alternative to what follows
An alternative to everything that follows is to remove every carriage return (\r) from the strings when you receive them as input to the program.
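A minimal sketch of that alternative, assuming the input only uses \n or \r\n line breaks. The class and method names (StripCarriageReturns, stripCr, equalsAfterStrippingCr) are hypothetical and not part of the reviewed code:

```java
final class StripCarriageReturns {
    // Hypothetical helper: drops every '\r', so "\r\n" becomes "\n".
    static String stripCr(String s) {
        return s.replace("\r", "");
    }

    // Hypothetical helper: null-safe comparison after stripping '\r'.
    static boolean equalsAfterStrippingCr(String a, String b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        return stripCr(a).equals(stripCr(b));
    }

    public static void main(String[] args) {
        // "\r\n" and "\n" compare equal once '\r' is gone
        System.out.println(equalsAfterStrippingCr("one\r\ntwo", "one\ntwo")); // true
        // a lone '\r' is removed rather than converted, so this differs
        System.out.println(equalsAfterStrippingCr("one\rtwo", "one\ntwo"));   // false
    }
}
```

Note the second case: a lone \r used as a line break simply disappears, so this shortcut is not equivalent to a comparison that treats a lone \r as a line break.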
Some reading on the iteration and carriage return
This question over at Stack Overflow will give you some insight into iterating over a string, including the speed of the different approaches.
This one explains what's behind the carriage return and why some systems use \r\n and others just \n.
Onto your code (What was already mentioned)
As Roland Illig has already mentioned in this answer, the function and parameter names will be changed to the following:
public static boolean equalsIgnoringNewlineStyle(String a, String b)
And the two constants are now private.
Do read CAD97's answer as well, since it raises some new points, of which I will mention only one: if both arguments are null, the result should always be true, in conformance with java.util.Objects::equals. All his other points remain relevant. Thus:
public static boolean equalsIgnoringNewlineStyle(String a, String b) {
    if (a == b) {
        return true;
    }
    if (a == null || b == null) {
        return false;
    }
    // ...
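To illustrate the null handling, here is a small sketch that compares the same two guards against java.util.Objects.equals. The class NullGuardDemo and the method guardedEquals are hypothetical names used only for this illustration, and the final a.equals(b) stands in for the real comparison loop:

```java
import java.util.Objects;

final class NullGuardDemo {
    // Hypothetical stand-in: same guards as above, then a plain equals.
    static boolean guardedEquals(String a, String b) {
        if (a == b) {
            return true; // same reference, or both null
        }
        if (a == null || b == null) {
            return false; // exactly one of them is null
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        String[][] cases = { {null, null}, {"x", null}, {null, "x"}, {"x", "x"} };
        for (String[] pair : cases) {
            // both columns should agree for every pair
            System.out.println(guardedEquals(pair[0], pair[1])
                    + " / " + Objects.equals(pair[0], pair[1]));
        }
    }
}
```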
toCharArray()
This method returns a newly allocated char[] copy of the string's contents. Making that copy slows your function down, especially if you intend to call it often. Given this, we'll index the strings directly with charAt instead, reading each needed character only once and storing it in a char.
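A short sketch of what "a new copy" means in practice. The class and method names here are hypothetical; whether the copy is a measurable cost in your program is something to benchmark rather than assume:

```java
final class ToCharArrayDemo {
    // Each call to toCharArray() allocates a fresh array.
    static boolean returnsFreshCopy(String s) {
        char[] first = s.toCharArray();
        char[] second = s.toCharArray();
        return first != second;
    }

    // Counting a character with charAt needs no intermediate array.
    static int countWithCharAt(String s, char target) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\r\nb\r\nc";
        char[] copy = s.toCharArray();
        copy[0] = 'z'; // changes only the copy, never the string
        System.out.println(s.charAt(0));              // a
        System.out.println(returnsFreshCopy(s));      // true
        System.out.println(countWithCharAt(s, '\r')); // 2
    }
}
```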
Variable Declaration
This might be more of a style choice, but declaring each variable on its own line tends to look cleaner, and the end result is the same, with no effect on performance. Thus I'd change the index declarations to this:
int index_a = 0;
int index_b = 0;
while (true)
while (true) is often considered bad practice; prefer a while or a do...while with a real condition, depending on your needs. In this case we'll stick with a while, since our condition is checked right at the start of the loop, so we move the condition into the loop header:
while (index_a < a.length() && index_b < b.length()) {
    // ...
}
return index_a == a.length() && index_b == b.length();
Notice the return: it tells us right away that anything that can change the result of the function call to false happens inside the while loop (either through other conditions that return false or through changes to index_a and index_b).
The return value is whether or not we went through the whole of both strings; if we did not, one of them still has unmatched characters left.
ch1 and ch2
Here I have renamed ch1 and ch2 to first and second, respectively. The more distinct names make it easier to spot which is which and help avoid errors where one writes ch1 or ch2 when the other was meant; these kinds of typos are usually hard to find, too. Note that the same could happen with my a and b, but it is less likely.
char first = a.charAt(index_a);
char second = b.charAt(index_b);
Reduce Indentations
This is a pretty simple change that makes the code easier to read by reducing the amount of indentation, making the code more vertical, and reducing brace nesting (which increases reading complexity). So instead of this:
if (first != second){
    // ...
}
we'll have this:
if (first == second) {
    ++index_a;
    ++index_b;
    continue;
}
We have to increment the indices ourselves here because continue skips the end of the loop body, which is where they are normally incremented.
Merging ifs
You have two sequential conditions, both of which have a common result: returning false. This:
if (ch1 != LF && ch1 != CR) return false;
if (ch2 != LF && ch2 != CR) return false;
becomes this:
if ((first != LF && first != CR) ||
    (second != LF && second != CR)) {
    // at least one of the characters is neither LF nor CR
    return false;
}
Removing isCRAndLF
I removed isCRAndLF as it is a simple function whose check can be written in place, which also saves the function call (in case the compiler does not inline it). Even if the call were inlined, removing the helper reduces reading complexity.
Applying the changes to this:
if (index1 + 1 < s1.length && isCRAndLF(s1[index1], s1[index1 + 1])){
    index1++;
}
if (index2 + 1 < s2.length && isCRAndLF(s2[index2], s2[index2 + 1])){
    index2++;
}
we get this:
if (index_a + 1 < a.length()) {
    char other = a.charAt(index_a + 1);
    // 'first' here is either \n or \r (checked before)
    // other != first ::= not { \n\n , \r\r }
    if (other != first && (other == LF || other == CR)) {
        ++index_a;
    }
}
if (index_b + 1 < b.length()) {
    char other = b.charAt(index_b + 1);
    // same as above, but for 'second'
    if (other != second && (other == LF || other == CR)) {
        ++index_b;
    }
}
Now, to explain the condition further:
We've checked earlier in the function whether first and second were each either LF or CR, and we only continue to this point if that was true. This means we have the following (the same holds for second):
first ::= LF | CR
So now we check other != first, which means:
first == LF && other != LF || first == CR && other != CR
This means they have to be different, so neither \n\n nor \r\r is accepted. We then check:
other == LF || other == CR
This makes sure that other is either LF or CR; we have not checked it yet, so it could be any other character that has nothing to do with what we want. Together with the previous check, this constrains the accepted pairs to \n\r and \r\n by making sure other is the "opposite" of first.
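The reasoning above can be checked by enumerating the combinations. This is only a sketch: the helper completesPair is a hypothetical name that wraps the same condition, and 'x' stands in for any character that is not a line break:

```java
final class PairCheckDemo {
    static final char LF = '\n';
    static final char CR = '\r';

    // Hypothetical helper: the same condition as in the snippet above.
    // It assumes 'first' is already known to be LF or CR.
    static boolean completesPair(char first, char other) {
        return other != first && (other == LF || other == CR);
    }

    // Printable name for a character, used only for the output below.
    static String name(char c) {
        if (c == LF) {
            return "LF";
        }
        if (c == CR) {
            return "CR";
        }
        return String.valueOf(c);
    }

    public static void main(String[] args) {
        char[] firsts = {LF, CR};
        char[] others = {LF, CR, 'x'};
        for (char first : firsts) {
            for (char other : others) {
                System.out.println(name(first) + " then " + name(other)
                        + " -> " + completesPair(first, other));
            }
        }
    }
}
```

Only the two mixed combinations (LF then CR, CR then LF) should come out as true.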
Full code (untested)
public class StringUtils {
    private static final char LF = '\n';
    private static final char CR = '\r';

    public static boolean equalsIgnoringNewlineStyle(String a, String b) {
        if (a == b) { // also covers both being null
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        // toCharArray is slow (creates a new copy of the whole string)
        // we'll use indexing instead (it avoids the copy)
        // cleaner variable declaration (does not affect performance)
        int index_a = 0;
        int index_b = 0;
        // while (true) is a bad practice, moved loop condition to right place
        while (index_a < a.length() && index_b < b.length()) {
            char first = a.charAt(index_a);
            char second = b.charAt(index_b);
            // decrease amount of indentation
            // line-break characters are excluded here so that equal ones
            // still reach the pair handling below ("\r\n" against a lone "\r")
            if (first == second && first != LF && first != CR) {
                ++index_a;
                ++index_b;
                continue;
            }
            if ((first != LF && first != CR) ||
                (second != LF && second != CR)) {
                // at least one of the characters is not a new line
                return false;
            }
            if (index_a + 1 < a.length()) {
                char other = a.charAt(index_a + 1);
                // 'first' here is either \n or \r (checked before)
                // other != first ::= not { \n\n , \r\r }
                if (other != first && (other == LF || other == CR)) {
                    ++index_a;
                }
            }
            if (index_b + 1 < b.length()) {
                char other = b.charAt(index_b + 1);
                // same as above, but for 'second'
                if (other != second && (other == LF || other == CR)) {
                    ++index_b;
                }
            }
            ++index_a;
            ++index_b;
        }
        return index_a == a.length() && index_b == b.length();
    }
}
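Since the code above is untested, one possible way to exercise it is to compare it against a slow but obvious reference. The sketch below is an assumption of mine, not part of the reviewed code: NewlineOracle, normalize, oracleEquals and countDisagreements are hypothetical names, and the reference treats \r\n, \n\r, a lone \r and a lone \n each as a single line break, matching the pairs discussed above.

```java
import java.util.function.BiPredicate;

final class NewlineOracle {
    // Hypothetical reference: collapse "\r\n", "\n\r" and lone "\r" to "\n".
    static String normalize(String s) {
        return s.replaceAll("\r\n|\n\r|\r", "\n");
    }

    static boolean oracleEquals(String a, String b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        return normalize(a).equals(normalize(b));
    }

    // Counts the cases on which 'candidate' disagrees with the reference.
    static int countDisagreements(BiPredicate<String, String> candidate,
                                  String[][] cases) {
        int disagreements = 0;
        for (int i = 0; i < cases.length; i++) {
            boolean expected = oracleEquals(cases[i][0], cases[i][1]);
            boolean actual = candidate.test(cases[i][0], cases[i][1]);
            if (expected != actual) {
                disagreements++;
                System.out.println("case " + i + ": expected " + expected);
            }
        }
        return disagreements;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"a\r\nb", "a\nb"}, {"a\rb", "a\nb"}, {"a\r\nb", "a\rb"},
            {"a\n\nb", "a\nb"}, {"a\n", "a"}, {null, null}, {"a", null},
        };
        // Swap in StringUtils::equalsIgnoringNewlineStyle to test the code above.
        BiPredicate<String, String> candidate = NewlineOracle::oracleEquals;
        System.out.println("disagreements: " + countDisagreements(candidate, cases));
    }
}
```

The reference allocates new strings, so it is meant for checking results only, not as a replacement for the index-based loop.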