
I'm trying to write a method that validates a String. If every character occurs the same number of times, as in "aabb", "abcabc" or "abc", the string is valid; if it contains one extra symbol, as in "ababa" or "aab", it is also valid; all other cases are invalid.

Update: sorry, I forgot to mention cases like abcabcab -> a-3, b-3, c-2 -> 2 extra symbols (a, b) -> invalid, and my code doesn't cover such cases. A space counts as a symbol, and uppercase letters are different from lowercase letters.

This is what I have now, but it looks convoluted (especially the last two methods):

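public boolean validate(String line) {
    // distinct occurrence counts, e.g. "ababa" -> a-3, b-2 -> [2, 3] (order not guaranteed)
    List<Long> keys = countMatches(countChars(line));
    int matchNum = keys.size();
    // zero or one distinct count: every character occurs equally often
    if (matchNum < 2) return true;
    // otherwise valid only if there are exactly two counts that differ by one
    return matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1;
}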

When counting the occurrences of each unique symbol, I would like to get a List<Long>, but I don't know how:

private Map<Character, Long> countChars(String line) { 
    return line.chars()
               .mapToObj(c -> (char) c)
               .collect(groupingBy(Function.identity(), HashMap::new, counting()));
}


private List<Long> countMatches(Map<Character, Long> countedEntries) {
    return new ArrayList<>(countedEntries.values()
            .stream()
            .collect(groupingBy(Function.identity(), HashMap::new, counting()))
            .keySet());
}

How can I optimize the methods above? I just need a List<Long>, but I have to create a map.
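Following the HashSet suggestion in the comments, here is a minimal sketch of countMatches without the second counting map: the distinct counts are simply the distinct values of the first map. The wrapper class DistinctCounts and its main method are hypothetical; countChars is copied from above (made static for the sketch). A HashSet guarantees no particular order, which should be fine for validate because it only compares the two values with Math.abs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

// Hypothetical wrapper class around the question's two helper methods.
class DistinctCounts {

    // Unchanged from the question: char -> number of occurrences.
    static Map<Character, Long> countChars(String line) {
        return line.chars()
                   .mapToObj(c -> (char) c)
                   .collect(groupingBy(Function.identity(), HashMap::new, counting()));
    }

    // The distinct counts are the distinct values of that map; a HashSet
    // drops duplicates without building a second counting map.
    static List<Long> countMatches(Map<Character, Long> countedEntries) {
        return new ArrayList<>(new HashSet<>(countedEntries.values()));
    }

    public static void main(String[] args) {
        System.out.println(countMatches(countChars("ababa")));
    }
}
```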

  • char is a sixteen bit integral type in Java. If you know there will be some huge number of characters in the input string (or you know it's limited to US-ASCII) you could use a long[256] or long[65536] array and index by the character. Then you could return a long[] instead of a List<Long>. Are you familiar with the expression "premature optimization is the root of all evil?" Commented Apr 17, 2020 at 0:05
  • @ElliottFrisch It's really not a "premature optimization" if you notice that countMatches ultimately just finds the distinct values of the supplied Map, yet does so by building another map that counts their occurrences and then discarding that map's values altogether. That is just a matter of new HashSet<>(map.values()); the desired output is a List mostly because the OP accesses indexes to validate the values later on. Commented Apr 17, 2020 at 3:47
  • I think using a long[] is going to perform better than a Set (hence my premature optimization comment). If nothing else, it avoids the boxing of the counts to Long (which is not a trivial cost here - for long enough strings). Commented Apr 17, 2020 at 3:53
  • @ElliottFrisch since there can’t be more characters than the String length, which is an int, there is no need to use long for the counts. Commented Apr 17, 2020 at 17:23
  • The code from the OP does not work for abcabcab, as it only checks whether the char counts differ by 1: matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1. As you said, abcabcab has 3a 3b 2c; here the count difference is one (3-2), but it occurs twice, so the code returns valid instead of invalid. Commented Apr 17, 2020 at 19:42

4 Answers


As far as I can see, you are looking for the distinct frequencies using those two methods. You can merge them into one method with a single stream pipeline, as below:

private List<Long> distinctFrequencies(String line) {
    return line.chars().mapToObj(c -> (char) c)
            .collect(Collectors.groupingBy(Function.identity(),
                    Collectors.counting()))
            .values().stream()
            .distinct()
            .collect(Collectors.toList());
}

Of course, all you need to change in your validate method now is the assignment:

List<Long> keys = distinctFrequencies(line);

On further thought, if you want to reuse the Map<Character, Long> countChars API elsewhere as well, you could modify the distinct-frequencies API to use it:

private List<Long> distinctFrequencies(String line) {
    return countChars(line).values()
            .stream()
            .distinct()
            .collect(Collectors.toList());
}

2 Comments

Using distinct frequencies will not work for a string like abbcc, where there is an extra b and an extra c, which makes the string invalid; yet you would get just two distinct frequencies that differ by one.
@pero_hero Alright, and since we are talking about cases not covered in the original question, what output do you expect for the Strings abbccdd and aabbbcccddd? By the way, even with the code in the question, the string abbcc is valid, isn't it? It would be worth editing the question to state that requirement, since the initial aim was to optimize or improve the design, not to functionally fix code that I had assumed to work fine.

You could evaluate whether every char in a string has the same occurrence count using the Stream API like this:

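boolean valid = "aabbccded".chars()
      .boxed()
      .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
      .values().stream()
      .reduce((a, b) -> a.equals(b) ? a : -1L) // compare Long values with equals, not ==
      .map(v -> v > 0)                         // -1 marks differing counts
      .orElse(true);                           // empty string: nothing to compare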

EDIT:

After reading the comments, I believe I now understand the requirement:

  1. A string is valid if all chars in it have the same occurrence count, like aabb,
  2. or if there is a single extra character, like abb.
  3. The string abcabcab is invalid, as it has 3a 3b and 2c, i.e. 1 extra a and 1 extra b, which is too many. Hence you can't perform the validation with a list of distinct frequencies alone; you also need to know how many chars have each frequency -> Map.

Here is a new attempt:

TreeMap<Long, Long> map = "abcabcab".chars()
                .boxed()
                .collect(groupingBy(Function.identity(), counting()))
                .values().stream()
                .collect(groupingBy(Function.identity(), TreeMap::new, counting()));

boolean valid = map.size() == 1 ||        // there is only a single char length
        ( map.size() == 2 &&              // there are two and there is only 1 extra char
        ((map.lastKey() - map.firstKey()) * map.lastEntry().getValue() <= 1));

The whole validation could be executed in a single statement by using the Collectors.collectingAndThen method that @Nikolas used in his answer, or you could use a reduction as well:

boolean valid = "aabcc".chars()
    .boxed()
    .collect(groupingBy(Function.identity(), counting()))              // char -> char length
    .values().stream()
    .collect(groupingBy(Function.identity(), TreeMap::new, counting())) // char length -> number of chars with it
    .entrySet().stream()
    .reduce((min, high) -> {
         // min.getKey() is the min char length, high.getKey() a higher char length,
         // high.getValue() the occurrence count of that higher char length;
         // the stored product is therefore always negative
         min.setValue((min.getKey() - high.getKey()) * high.getValue());
         return min;
    })
    .map(min -> min.getValue() >= -1)
    .orElse(true); // empty string: no entries to reduce

4 Comments

You can replace the whole mapping function that uses the ternary operator with a plain .map(v -> v > 0)
Does not cover all cases, such as aabb
@HadiJ I believe both solutions work with aabb and with _aabb_, where _ denotes a space, as they return true
It may work for that, but I think it does not cover all cases, e.g. aabbeee, aabbccdeeed

Use Collectors.collectingAndThen, a collector that applies a finisher Function to the result of a downstream Collector.

  • Use the Collectors.groupingBy and Collectors.counting to get the frequency of each character in the String.

    // Results in Map<Integer, Long>
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    
  • Use map -> new HashSet<>(map.values()).size() == 1 to check whether all frequencies are equal; if they are, there is exactly one distinct value.

Wrapping these two in Collectors.collectingAndThen looks like this:

String line = "aabbccdeed";
boolean isValid = line.chars()                          // IntStream of characters    
    .boxed()                                            // boxed as Stream<Integer>
    .collect(Collectors.collectingAndThen(              // finisher's result type
        Collectors.groupingBy(                          // grouped Map<Integer, Long>
                Function.identity(),                    // ... of each character
                Collectors.counting()),                 // ... frequency
        map -> new HashSet<>(map.values()).size() == 1  // checks the frequencies
    ));

// aabbccded  -> false
// aabbccdeed -> true
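The same collectingAndThen shape can carry the "one extra symbol" rule from the question if the finisher also looks at how many chars share each count, which addresses the aabbccdeeed case raised in the comments. A sketch, assuming the rule as clarified in the question's comments (all counts equal, or exactly one char occurring exactly once more than the rest); the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class; assumes the rule from the comments: all counts equal, or
// exactly one char occurring exactly once more than every other char.
class CollectingAndThenValidator {

    // Finisher: receives char -> count and applies the rule above.
    static boolean isValidFrequencies(Map<Integer, Long> frequencies) {
        // count -> number of chars with that count, sorted by count
        TreeMap<Long, Long> charsPerCount = frequencies.values().stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        if (charsPerCount.size() <= 1) {
            return true; // all counts equal (or empty string)
        }
        return charsPerCount.size() == 2
                && charsPerCount.lastKey() - charsPerCount.firstKey() == 1 // one extra occurrence...
                && charsPerCount.lastEntry().getValue() == 1;              // ...of a single char
    }

    static boolean validate(String line) {
        return line.chars()
                .boxed()
                .collect(Collectors.collectingAndThen(
                        Collectors.groupingBy(Function.identity(), Collectors.counting()),
                        CollectingAndThenValidator::isValidFrequencies));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aabbccdeed", "aabbccdeeed", "aabbeee", "abcabcab"}) {
            System.out.println(s + " -> " + validate(s));
        }
    }
}
```

For the strings mentioned in the comments, aabbccdeeed and aabbeee are now accepted, while abcabcab is still rejected.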

2 Comments

Notice the condition matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1 in the OP's code.
Does not cover all cases, such as aabbccdeeed

You can do it like this:

  1. First, count the occurrences of every character.
  2. Then find the minimum occurrence count (minValue).
  3. Finally, sum the differences between each count and minValue; the string is valid if that sum is at most one.

    public static boolean validate(String line) {
        Map<Character, Long> map = line.chars()
                     .mapToObj(c -> (char) c)
                     .collect(groupingBy(Function.identity(), Collectors.counting()));
        long minValue = map.values().stream().min(Long::compareTo).orElse(0L);
        return map.values().stream().mapToLong(a -> Math.abs(a - minValue)).sum() <= 1;
    }
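The sum in the last step can also be computed without a second pass over the counts: the sum of (count - minValue) equals the total of all counts minus minValue times the number of distinct characters. A sketch of that variant using LongSummaryStatistics, assuming the same rule as the answer above (the class name MinSumValidator is hypothetical):

```java
import java.util.HashMap;
import java.util.LongSummaryStatistics;
import java.util.Map;

// Hypothetical class name; same rule as the answer above.
class MinSumValidator {

    static boolean validate(String line) {
        // step 1: count the occurrences of every char
        Map<Character, Long> counts = new HashMap<>();
        for (char c : line.toCharArray()) {
            counts.merge(c, 1L, Long::sum);
        }
        // steps 2 and 3 in one pass: min, sum and number of distinct chars
        LongSummaryStatistics stats = counts.values().stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
        if (stats.getCount() == 0) {
            return true; // empty string, as with orElse(0L) in the answer
        }
        // sum of (count - min) over all chars == sum - min * (number of chars)
        long excess = stats.getSum() - stats.getMin() * stats.getCount();
        return excess <= 1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aab", "ababa", "abcabcab"}) {
            System.out.println(s + " -> " + validate(s));
        }
    }
}
```

For abcabcab the total is 8, the minimum 2 and there are 3 distinct chars, so the excess is 8 - 2 * 3 = 2 and the string is rejected.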
    
