
I have a comparator that sorts an array of strings that contain letters and numbers, but can't seem to identify the regular expression that sorts them in the manner I am looking for.

I have used this question as a reference for my comparator.

array={string-a01,string-a20,string-a100,string-b01,string-b20,string-b100,string-c01,string-c20,string-c100 etc.}

Collections.sort(array, new Comparator<String>(){       
    public int compare(String o1, String o2) {
        return extractInt(o1) - extractInt(o2);
    }

    int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }
});
        
for (String element : array) {
    System.out.println(element);
}

Before introducing the comparator, the output was:
string-a01, string-a100, string-a20, string-b01, string-b100, string-b20, string-c01, string-c100, string-c20

The output that this code produces is:
string-a01, string-b01, string-c01, string-a20, string-b20, string-c20, string-a100, string-b100, string-c100

The output I would like it to produce is:
string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100


EDIT: Edited for clarification. The array has been changed, and the output from before the comparator was introduced has been added.

3 Answers

3

Assuming that the "string" part may actually be something other than just "string", you can extract the letter part of the ending and the digit part, and compare them using a composite Comparator:

// Requires: java.util.*, java.util.regex.*, java.util.stream.Collectors
String[] array = { "string-a20", "string-a01", "string-b01",
    "string-b20", "string-c01", "string-c20",
    "string-a100", "string-b100", "string-c100" };

Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

List<String> result = Arrays.stream(array)
    .map(p::matcher)
    .filter(Matcher::find)
    .sorted(Comparator.comparing((Matcher m) -> m.group(1)) // Compare the letter part
        .thenComparingInt(m -> Integer.parseInt(m.group(2)))) // Compare the number part
    .map(m -> m.group(0)) // Map back to String
    .collect(Collectors.toList());

System.out.println(result);

Output:

[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]

Legacy (pre-Java 8) version, with the downside of having to create new Matchers on every comparison:

Arrays.sort(array, new Comparator<String>() {

    private final Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

    @Override
    public int compare(String o1, String o2) {
        Matcher m1 = p.matcher(o1);
        Matcher m2 = p.matcher(o2);

        if(!(m1.find() && m2.find()))
            return 0; // Or throw a format exception

        int comparison = m1.group(1).compareTo(m2.group(1));
        return comparison != 0
            ? comparison 
            : Integer.compare(Integer.parseInt(m1.group(2)), Integer.parseInt(m2.group(2)));
    }

});

7 Comments

I have made another update because the initial question didn't ask for everything I was looking for. The issue is that it orders like this: b01, b100, b11, c01, c100, c11... The array is generated dynamically, and I didn't realize the problem wouldn't occur with the original array posted.
@Jon, Yeah I was looking into that, but it's a little more complicated.
@Jon [PLUS ONE] This must be the answer, because it uses lambdas perfectly to present the solution in the easiest way.
@Jon That's unfortunate, I added a legacy version too.
@JornVernee Works like a charm! For anyone with the same issue, don't forget the private final Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$"); Thanks!!
1

You are removing the alphabetical characters in your extractInt method, so you won't be able to use them in the comparison.
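
To make that concrete, here is a minimal sketch that reuses the extractInt logic from the question. The class name ExtractIntCollision is only for illustration. Once the letters are stripped, string-a20 and string-b20 both reduce to 20, so a comparator built on extractInt alone cannot tell them apart:

```java
class ExtractIntCollision {
    // Same logic as the extractInt method in the question:
    // drop every non-digit character, then parse what is left.
    static int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }

    public static void main(String[] args) {
        System.out.println(extractInt("string-a20")); // 20
        System.out.println(extractInt("string-b20")); // 20
        // The difference is 0, so the comparator reports the two strings as equal.
        System.out.println(extractInt("string-a20") - extractInt("string-b20")); // 0
    }
}
```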

You should just sort them without a Comparator, which sorts them by their natural, lexicographical ordering (java.lang.String implements Comparable<String>).

Example

// test array
String[] s = {"string-a01","string-a01","string-b01","string-b02","string-c02","string-c02"};

// sorting without a Comparator uses the natural ordering, which works
// because String implements Comparable
Arrays.sort(s);

// printing in human-readable form
System.out.println(
    Arrays.toString(s)
);

Output

[string-a01, string-a01, string-b01, string-b02, string-c02, string-c02]

Notes

  • If you want to remove duplicates (which might be your intent from the question - not clear), add the array elements to a TreeSet instead:

    Set<String> deduplicated = new TreeSet<>(Arrays.asList(s));
    
  • If your sorting algorithm must act so that 2 comes before 12, then you need to extract the integer value without removing it from the elements, and compare it only when the rest of the Strings are equal.
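
A minimal sketch of that last note, assuming every element consists of a non-digit prefix followed by trailing digits. The class and helper names (PrefixThenNumber, prefix, number, sorted) are made up for illustration and do not come from the question:

```java
import java.util.Arrays;
import java.util.Comparator;

class PrefixThenNumber {
    // Everything before the trailing digits, e.g. "string-a" for "string-a12".
    static String prefix(String s) {
        return s.replaceAll("\\d+$", "");
    }

    // The trailing digits as an int. Returning 0 when there are none
    // is an assumption of this sketch, not something the question specifies.
    static int number(String s) {
        String digits = s.substring(prefix(s).length());
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Compare the prefixes first; look at the number only when they are equal.
    static final Comparator<String> BY_PREFIX_THEN_NUMBER =
        Comparator.comparing(PrefixThenNumber::prefix)
                  .thenComparingInt(PrefixThenNumber::number);

    // Returns a sorted copy and leaves the input untouched.
    static String[] sorted(String... input) {
        String[] copy = input.clone();
        Arrays.sort(copy, BY_PREFIX_THEN_NUMBER);
        return copy;
    }

    public static void main(String[] args) {
        String[] result = sorted("string-a12", "string-b2", "string-a2", "string-b12");
        // prints [string-a2, string-a12, string-b2, string-b12]
        System.out.println(Arrays.toString(result));
    }
}
```

The elements themselves are never modified; the number is only extracted for the duration of the comparison, so 2 sorts before 12 within the same prefix.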

15 Comments

Why don't you use a lambda? It's easier.
@GingerHead how is using the Java 8 stream API any "easier" than Arrays.sort, given this context?
For example Arrays.sort(s, (a, b) -> a.length() - b.length());
@GingerHead nope. It's sufficient to rely upon well-documented behaviour.
@Mena I have made another update because the initial question didn't ask for everything I was looking for. The issue is that it orders like this: b01, b100, b11, c01, c100, c11... The array is generated dynamically, and I didn't realize the problem wouldn't occur with the original array posted.
1

It sounds like you want to order the strings on the "leading strings", i.e. everything up to the digits; if the leading strings are equal, then compare on the subsequent digits.

To split the string into its "string" and "integer" parts, you can first find the "first trailing digit", i.e. the position of the first character in the string where there are no non-digits between it and the end of the string:

int firstTrailingDigit(String s) {
  int i = s.length();
  while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
    --i;
  }
  return i;
}

You can then use this in your comparator:

public int compare(String a, String b) {
  int ftdA = firstTrailingDigit(a);
  int ftdB = firstTrailingDigit(b);

  // Get the leading strings, and compare.
  String sA = a.substring(0, ftdA);
  String sB = b.substring(0, ftdB);
  int compareStrings = sA.compareTo(sB);
  if (compareStrings != 0) {
    // If they're not equal, return the result of the comparison.
    return compareStrings;
  }

  // Get the trailing numbers from the strings, and compare.
  int iA = Integer.parseInt(a.substring(ftdA));
  int iB = Integer.parseInt(b.substring(ftdB));
  return Integer.compare(iA, iB);
}


Input:

String[] array = {"string-a01","string-a20","string-a100","string-b01","string-b20","string-b100","string-c01","string-c20","string-c100"};

Output:

[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]
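
For anyone who wants to run this, the two fragments can be assembled roughly as follows. This is only a sketch: the class name TrailingNumberSort and the COMPARATOR constant are placeholders, the input is deliberately shuffled, and it assumes every string ends in at least one digit (Integer.parseInt throws a NumberFormatException otherwise):

```java
import java.util.Arrays;
import java.util.Comparator;

class TrailingNumberSort {
    // Index of the first character of the trailing run of digits.
    static int firstTrailingDigit(String s) {
        int i = s.length();
        while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
            --i;
        }
        return i;
    }

    static final Comparator<String> COMPARATOR = (a, b) -> {
        int ftdA = firstTrailingDigit(a);
        int ftdB = firstTrailingDigit(b);

        // Compare the leading strings first.
        int compareStrings = a.substring(0, ftdA).compareTo(b.substring(0, ftdB));
        if (compareStrings != 0) {
            return compareStrings;
        }

        // Leading strings are equal: compare the trailing numbers.
        // Assumption: both strings end in digits.
        int iA = Integer.parseInt(a.substring(ftdA));
        int iB = Integer.parseInt(b.substring(ftdB));
        return Integer.compare(iA, iB);
    };

    public static void main(String[] args) {
        String[] array = {"string-b100", "string-a20", "string-c01", "string-a100",
            "string-b01", "string-c100", "string-a01", "string-c20", "string-b20"};
        Arrays.sort(array, COMPARATOR);
        System.out.println(Arrays.toString(array));
    }
}
```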
