Count unique chars and validate String in some cases using Java Stream

Question

I'm trying to write a method that validates a String. If the string has the same number of every char, like "aabb", "abcabc" or "abc", it is valid; if it contains one extra symbol, like "ababa" or "aab", it is also valid; all other cases are invalid. Update: sorry, I forgot to mention cases like abcabcab -> a-3, b-3, c-2 -> 2 extra symbols (a, b) -> invalid. My code doesn't cover such cases. Space is a symbol, and capital letters are different from lowercase letters. This is what I have now, but it looks ambiguous (especially the last two methods):

public boolean validate(String line) {
    List<Long> keys = countMatches(countChars(line));
    int matchNum = keys.size();
    if (matchNum < 2) return true;
    return matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1;
}

For the counts of unique symbols I'd like to get a List<Long> directly, but I don't know how:

private Map<Character, Long> countChars(String line) { 
    return line.chars()
               .mapToObj(c -> (char) c)
               .collect(groupingBy(Function.identity(), HashMap::new, counting()));
}


private List<Long> countMatches(Map<Character, Long> countedEntries) {
    return new ArrayList<>(countedEntries.values()
            .stream()
            .collect(groupingBy(Function.identity(), HashMap::new, counting()))
            .keySet());
}

How can I optimize the methods above? I only need a List<Long>, but I have to create a map.

char is a sixteen-bit integral type in Java. If you know there will be some huge number of characters in the input string (or you know it's limited to US-ASCII) you could use a long[256] or long[65536] array and index by the character. Then you could return a long[] instead of a List<Long>. Are you familiar with the expression "premature optimization is the root of all evil?"
– Elliott Frisch, Commented Apr 17, 2020 at 0:05
@ElliottFrisch It's really not "premature optimization" once you notice that countMatches finds the distinct values of the supplied Map by building another map, counting occurrences, and then discarding that new map's values altogether. That is just a matter of new HashSet<>(map.values()); the desired output is a List mostly because the OP accesses elements by index to validate the values later.
– Naman, Commented Apr 17, 2020 at 3:47
I think using a long[] is going to perform better than a Set (hence my premature optimization comment). If nothing else, it avoids boxing the counts to Long, which is not a trivial cost here for long enough strings.
– Elliott Frisch, Commented Apr 17, 2020 at 3:53
@ElliottFrisch since there can’t be more characters than the String length, which is an int, there is no need to use long for the counts.
– Holger, Commented Apr 17, 2020 at 17:23
The code from the OP does not work for abcabcab, as it only checks whether the difference between the two char counts is 1: matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1. As you said, it has 3 a, 3 b and 2 c; the count difference is one (3 - 2), but it occurs twice, so the method returns valid instead of invalid.
– pero_hero, Commented Apr 17, 2020 at 19:42
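A minimal sketch of the array-based counting Elliott Frisch suggests, using int counts as Holger points out. The class and method names are hypothetical. It allocates a 65,536-slot int array (256 KiB) on every call, so for short strings it may well cost more than the map-based version it replaces; treat it as an illustration of the idea, not a measured improvement.

```java
import java.util.stream.IntStream;

final class ArrayCounting {
    // Hypothetical helper: returns the distinct non-zero character counts,
    // in the order of the first char (by char value) that has each count.
    static int[] distinctFrequencies(String line) {
        // char is an unsigned 16-bit type, so 65,536 slots cover every value.
        // int suffices: no count can exceed line.length(), which is an int.
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < line.length(); i++) {
            counts[line.charAt(i)]++;
        }
        // Keep only chars that actually occur, then drop duplicate counts.
        return IntStream.of(counts)
                .filter(c -> c > 0)
                .distinct()
                .toArray();
    }
}
```

For "abcabcab" this yields [3, 2], with no boxing of the counts along the way.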

Naman · Accepted Answer · 2020-04-17 03:41:39Z

3

As far as I can see, you are computing distinct frequencies with those two methods. You can merge them into one method that uses a single stream pipeline:

private List<Long> distinctFrequencies(String line) {
    return line.chars().mapToObj(c -> (char) c)
            .collect(Collectors.groupingBy(Function.identity(),
                    Collectors.counting()))
            .values().stream()
            .distinct()
            .collect(Collectors.toList());
}

Of course, all you need to change in your validate method now is the assignment

List<Long> keys = distinctFrequencies(line);

On second thought, if you also want to reuse the Map<Character, Long> countChars API elsewhere, you could build distinctFrequencies on top of it instead:

private List<Long> distinctFrequencies(String line) {
    return countChars(line).values()
            .stream()
            .distinct()
            .collect(Collectors.toList());
}
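To check that the merged helper drops straight into the question's validate, here is a self-contained sketch (the class name is hypothetical) combining the question's countChars, the second distinctFrequencies above, and the question's validate logic unchanged. As pero_hero points out in the question comments, it still accepts abcabcab, because only the distinct counts 3 and 2 survive.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class DistinctFrequencyCheck {
    // Character -> number of occurrences (the question's countChars).
    static Map<Character, Long> countChars(String line) {
        return line.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Distinct occurrence counts, built on top of countChars.
    static List<Long> distinctFrequencies(String line) {
        return countChars(line).values()
                .stream()
                .distinct()
                .collect(Collectors.toList());
    }

    // The question's validate, calling the merged helper.
    static boolean validate(String line) {
        List<Long> keys = distinctFrequencies(line);
        int matchNum = keys.size();
        if (matchNum < 2) return true;
        return matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1;
    }
}
```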

answered Apr 17, 2020 at 3:41

Naman



2 Comments

pero_hero Over a year ago

using distinct frequencies will not work for a string like abbcc where you have an extra b and an extra c which makes the string invalid, but you would have just two distinct frequencies which just differ in a single character.

Naman Over a year ago

@pero_hero Alright and since we are talking about cases not covered in the original question, what do you expect the output for a String abbccdd and aabbbcccddd? By the way, even with the code in question the string abbcc is valid, isn't it? It would be worth an edit in the question to ask for that requirement since the initial view was to optimize or improve the design and not functionally fix things which I'd assumed to be working fine.

pero_hero · Accepted Answer · 2020-04-18 08:00:56Z

2

You could check whether every char in a string has the same occurrence count using the Stream API, like this:

boolean valid = "aabbccded".chars()
      .boxed()  
      .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))                      
      .values().stream()
      .reduce((a, b) -> a.equals(b) ? a : -1L)   // equals(), not ==: the counts are boxed Longs
      .map(v -> v > 0)
      .get();
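A minimal method version of the same check, as a sketch: the class and method names are hypothetical, and mapping the empty string to true is an assumption (the .get() call above throws NoSuchElementException for it). The counts are compared with equals(), since == on boxed Long values compares references and only happens to work for values inside the Long cache (-128 to 127).

```java
import java.util.function.Function;
import java.util.stream.Collectors;

final class SameCountCheck {
    // True if every distinct char occurs the same number of times.
    static boolean allCountsEqual(String line) {
        return line.chars()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                // value equality on the boxed counts; -1 marks "a mismatch was seen"
                .reduce((a, b) -> a.equals(b) ? a : -1L)
                .map(v -> v > 0)
                // assumption: the empty string counts as valid
                .orElse(true);
    }
}
```

A string with 200 a and 200 b is a quick way to see the difference: with equals() it is valid, while a == comparison of the two uncached Long(200) objects would report a mismatch.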

EDIT:

After reading the comments, I believe I now understand the requirement:

A string is considered valid if all chars in it have the same occurrence count, like aabb,
or if there is a single extra character, like abb.
The string abcabcab is invalid: it has 3 a, 3 b and 2 c, so it has one extra a and one extra b, which is too many. Hence you can't perform the validation with a list of distinct frequencies alone; you also need to know how many chars have each char length, i.e. a Map.
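To make the point concrete, a small sketch (class and method names hypothetical) comparing two strings from the question: "ababa" (valid) and "abcabcab" (invalid). Their distinct counts are identical, {2, 3}, so a frequency list cannot tell them apart, while the map from char length to "how many chars have it" differs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FrequencyViews {
    // char -> how often it occurs
    static Map<Integer, Long> charCounts(String line) {
        return line.chars()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Only the distinct occurrence counts, sorted.
    static Set<Long> distinctCounts(String line) {
        return new TreeSet<>(charCounts(line).values());
    }

    // occurrence count -> number of chars with that count, sorted by count
    static TreeMap<Long, Long> countsOfCounts(String line) {
        return charCounts(line).values().stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```

Here countsOfCounts("ababa") is {2=1, 3=1} (one char is one over), while countsOfCounts("abcabcab") is {2=1, 3=2} (two chars are one over).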

Here is a new attempt:

TreeMap<Long, Long> map = "abcabcab".chars()
                .boxed()
                .collect(groupingBy(Function.identity(), counting()))
                .values().stream()
                .collect(groupingBy(Function.identity(), TreeMap::new, counting()));

boolean valid = map.size() == 1 ||        // there is only a single char length
        ( map.size() == 2 &&              // there are two and there is only 1 extra char
        ((map.lastKey() - map.firstKey()) * map.lastEntry().getValue() <= 1));

The whole validation could be done in a single statement, either with the Collectors.collectingAndThen method that @Nikolas uses in his answer or with a reduction:

boolean valid = "aabcc".chars()
    .boxed()
    .collect(groupingBy(Function.identity(), counting()))
    .values().stream()
    .collect(groupingBy(Function.identity(), TreeMap::new, counting()))
    .entrySet().stream()
    // entries arrive in ascending order of char length:
    // min.getKey() is the smallest char length, high.getKey() a higher one,
    // high.getValue() is how many chars have that higher char length
    .reduce((min, high) -> {
         // negative whenever there are at least two distinct char lengths
         min.setValue((min.getKey() - high.getKey()) * high.getValue());
         return min;
        })
    .map(min -> min.getValue() >= -1)
    .get();
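A sketch of the collectingAndThen variant mentioned above, applying the same size and difference rule as the TreeMap version without mutating map entries inside reduce. The class and method names are hypothetical; the lambda parameter is typed explicitly to make the TreeMap type unambiguous. Like the TreeMap version, it returns false for the empty string (the map is empty).

```java
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class OneStatementCheck {
    static boolean isValid(String line) {
        return line.chars()
                .boxed()
                // char -> occurrence count
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .collect(Collectors.collectingAndThen(
                        // char length -> number of chars with that length, sorted by length
                        Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()),
                        // one char length, or two lengths with at most one extra char in total
                        (TreeMap<Long, Long> map) -> map.size() == 1
                                || (map.size() == 2
                                        && (map.lastKey() - map.firstKey()) * map.lastEntry().getValue() <= 1)));
    }
}
```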

edited Apr 18, 2020 at 8:00

answered Apr 17, 2020 at 8:32

pero_hero


4 Comments

Nikolas Over a year ago

You can replace the whole mapping function using ternary operator with plain .map(v -> v > 0)

Hadi Over a year ago

Does not cover all cases, such as aabb

pero_hero Over a year ago

@HadiJ I believe both solutions work with aabb and with _aabb_ where _ denotes a space as they return true

Hadi Over a year ago

It may work for that, but I don't think it covers all cases, e.g. aabbeee, aabbccdeeed

Nikolas · Accepted Answer · 2020-04-17 12:44:07Z

1

Use Collectors.collectingAndThen, a collector that wraps a downstream Collector and applies a finisher Function to its result.

Use Collectors.groupingBy with Collectors.counting to get the frequency of each character in the String.

// Results in Map<Integer, Long>
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))

Use the finisher map -> new HashSet<>(map.values()).size() == 1 to check whether all frequencies are equal: if they are, the set holds exactly one distinct value.

Wrapping these two in Collectors.collectingAndThen looks like:

String line = "aabbccdeed";
boolean isValid = line.chars()                          // IntStream of characters    
    .boxed()                                            // boxed as Stream<Integer>
    .collect(Collectors.collectingAndThen(              // collects to the finisher's Boolean result
        Collectors.groupingBy(                          // grouped Map<Integer, Long>
                Function.identity(),                    // ... of each character
                Collectors.counting()),                 // ... frequency
        map -> new HashSet<>(map.values()).size() == 1  // checks the frequencies
    ));

// aabbccded  -> false
// aabbccdeed -> true
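The same pipeline as a method, as a sketch with a hypothetical class name, so the outputs listed above can be checked. Because the finisher accepts exactly one distinct frequency and nothing else, it also rejects "aab" (which the question counts as valid) and the empty string (no frequencies at all).

```java
import java.util.HashSet;
import java.util.function.Function;
import java.util.stream.Collectors;

final class EqualFrequencyCheck {
    static boolean isValid(String line) {
        return line.chars()
                .boxed()
                .collect(Collectors.collectingAndThen(
                        // char -> occurrence count
                        Collectors.groupingBy(Function.identity(), Collectors.counting()),
                        // valid only if all counts collapse to a single distinct value
                        map -> new HashSet<>(map.values()).size() == 1));
    }
}
```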

edited Apr 17, 2020 at 12:44

answered Apr 17, 2020 at 12:21

Nikolas


2 Comments

Naman Over a year ago

Notice the condition matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1 in the OP's code.

Hadi Over a year ago

Does not cover all cases, such as aabbccdeeed

Hadi · Accepted Answer · 2020-04-17 17:40:03Z

1

You can do it like this:

First, count the occurrences of every character.
Then find the minimum occurrence count (minValue).

Finally, sum the differences between each count and minValue; the string is valid if that sum is at most one.

public static boolean validate(String line) {
    Map<Character, Long> map = line.chars()
                 .mapToObj(c -> (char) c)
                 .collect(groupingBy(Function.identity(), Collectors.counting()));
    long minValue = map.values().stream().min(Long::compareTo).orElse(0L);
    return map.values().stream().mapToLong(a -> Math.abs(a - minValue)).sum() <= 1;
}
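A self-contained restatement of this approach, as a sketch with a hypothetical class name, so it can be run against the examples from the question. It differs from the code above only in small ways: the minimum is taken over primitive longs, and Math.abs is dropped because no count can be below the minimum. The empty string gives an empty map, a minValue of 0 and a sum of 0, so it is treated as valid.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class MinDifferenceCheck {
    static boolean validate(String line) {
        // char -> occurrence count
        Map<Character, Long> map = line.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // smallest occurrence count (0 for the empty string)
        long minValue = map.values().stream().mapToLong(Long::longValue).min().orElse(0L);
        // total surplus over the minimum; every count is >= minValue
        long surplus = map.values().stream().mapToLong(a -> a - minValue).sum();
        return surplus <= 1;
    }
}
```

For abcabcab the counts are 3, 3 and 2, the minimum is 2, and the surplus is 1 + 1 + 0 = 2, so it is rejected.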

edited Apr 17, 2020 at 17:40

answered Apr 17, 2020 at 16:37

Hadi

