Use a Java stream to find the count of each character in a string

Question

We have this string: String input1 = "abbccd";

Expected output: ab2c2d (note: a count of 1 should not appear in the output).

The following code prints a1, b2, c2 and d1 on separate lines. Any suggestions to fix and improve it?

input1.chars()
      .mapToObj(s -> Character.toLowerCase(Character.valueOf((char) s)))
      .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
      .entrySet().stream()
      .forEach(n -> {System.out.println(n.getKey()+""+n.getValue());});

Federico klez Culloca · Accepted Answer · 2022-08-30 19:17:57Z

Make the last forEach a map instead.

Instead of always appending n.getValue(), append it only when it is not 1.

Then collect by joining.

At that point you will have a string you can print.

So, assuming we don't want to change your first part:

"abbccd".chars()
        .mapToObj(s -> Character.toLowerCase((char)s)) // notice here Character.valueOf was redundant, we're already dealing with a char
        .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
        .entrySet().stream()
        .map(n -> n.getKey()+""+(n.getValue() == 1 ? "" : n.getValue()))
        .collect(Collectors.joining());

Results in ab2c2d.
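
For anyone who wants to run this directly, here is a minimal self-contained sketch of the same pipeline with the imports it needs. The class name `CharCounts` and the method name `compress` are made up for illustration and are not part of the question.

```java
import java.util.LinkedHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class CharCounts {
    // Hypothetical helper: wraps the stream pipeline shown above.
    static String compress(String input) {
        return input.chars()
                // lower-case each char so 'A' and 'a' are counted together
                .mapToObj(c -> Character.toLowerCase((char) c))
                // LinkedHashMap keeps the order of first appearance
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
                .entrySet().stream()
                // append the count only when it is not 1
                .map(e -> e.getKey() + (e.getValue() == 1 ? "" : String.valueOf(e.getValue())))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(compress("abbccd")); // ab2c2d
    }
}
```

Note that groupingBy counts occurrences across the whole string, not consecutive runs, so an input like "abab" gives a2b2.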

edited Aug 30, 2022 at 19:17

answered Aug 30, 2022 at 18:59

5 Comments

Elliott Frisch Over a year ago

Might as well do .mapToObj(s -> Character.toLowerCase((char) s)) too. No need to take the character value of a character.

Federico klez Culloca Over a year ago

@ElliottFrisch yeah, I only focused on the last part. But I'll make that change as well, while I'm at it.

Elliott Frisch Over a year ago

Collectors.joining("") can just be Collectors.joining()

Basil Bourque Over a year ago

Your code breaks with most characters. See my Answer for an example of such failure, and for a solution. If you care to rework your Answer with code points, I'll gladly delete mine.

Federico klez Culloca Over a year ago

@BasilBourque right and noted. I won't change my answer because it still fits the given input. I upvoted yours, though, in the hope it will surface to the top spot and be accepted.

Basil Bourque · Accepted Answer · 2022-08-31 02:02:20Z

Unfortunately, the other two Answers both fail with most characters.

Avoid legacy type `char`

The char type has been essentially broken since Java 2 and legacy since Java 5. As a 16-bit value, a char is physically incapable of representing most of the 144,697 characters defined in Unicode.
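
As a rough illustration of the 16-bit limit, the sketch below shows that a single emoji occupies two char values (a surrogate pair) but is only one code point. The class and method names are invented for this example, and the emoji is written with Unicode escapes so the file encoding does not matter.

```java
class SurrogateDemo {
    // Number of 16-bit char units, which is what String#length reports.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String mask = "\uD83D\uDE37"; // U+1F637 FACE WITH MEDICAL MASK
        System.out.println(charUnits(mask));  // 2
        System.out.println(codePoints(mask)); // 1
        System.out.println(Character.isHighSurrogate(mask.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(mask.charAt(1)));  // true
    }
}
```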

See one Answer’s code break:

String input = "😷😷abbccd";
String output =
        input
                .chars()
                .mapToObj( s -> Character.toLowerCase( ( char ) s ) ) // notice here Character.valueOf was redundant, we're already dealing with a char
                .collect( Collectors.groupingBy( Function.identity() , LinkedHashMap :: new , Collectors.counting() ) )
                .entrySet().stream()
                .map( n -> n.getKey() + "" + ( n.getValue() == 1 ? "" : n.getValue() ) )
                .collect( Collectors.joining() );

System.out.println( "output = " + output );

output = ?2?2ab2c2d

Code point

Use code point integer numbers instead, when working with individual characters. A code point is the number permanently assigned to each character in Unicode. They range from zero to just over a million.

You will find code point related methods scattered around the Java classes. These include String, StringBuilder, Character, etc.
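
A few of those methods, as a small sketch rather than a complete tour. The class name `CodePointApis` and the helper `fromCodePoints` are hypothetical; the calls themselves (StringBuilder#appendCodePoint, String#codePointAt, Character.charCount, Character.isSupplementaryCodePoint, Character.toLowerCase(int)) are standard JDK methods.

```java
class CodePointApis {
    // Hypothetical helper: builds a string from code point numbers.
    static String fromCodePoints(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp); // appends one or two chars as needed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int mask = 0x1F637; // FACE WITH MEDICAL MASK
        String s = fromCodePoints('a', mask, 'b');

        System.out.println(s.length());                               // 4 char units
        System.out.println(s.codePointAt(1) == mask);                 // true
        System.out.println(Character.charCount(mask));                // 2
        System.out.println(Character.isSupplementaryCodePoint(mask)); // true

        int upperA = 'A';
        System.out.println(Character.toLowerCase(upperA) == 'a');     // true, int overload
    }
}
```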

The String#codePoints method returns an IntStream of code points, the code point number for each character in the string.
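
A minimal sketch comparing String#chars with String#codePoints on the same input. The class name `CodePointStreamDemo` and the helper `perCodePoint` are invented for illustration, and Character.toString(int) assumes Java 11 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointStreamDemo {
    // Hypothetical helper: one String per code point of the input.
    static List<String> perCodePoint(String input) {
        return input.codePoints()
                .mapToObj(cp -> Character.toString(cp)) // int overload, Java 11+
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "\uD83D\uDE37ab"; // mask emoji followed by "ab"
        System.out.println(input.chars().count());      // 4, the emoji is split in two
        System.out.println(input.codePoints().count()); // 3
        System.out.println(perCodePoint(input).size()); // 3
    }
}
```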

Here is a re-worked version of the clever code from Answer by Federico klez Culloca. Kudos to him, as I could not have come up with that approach.

String input = "😷😷abbccd";
String output =
        input
                .codePoints()
                .map( Character :: toLowerCase )
                .mapToObj( codePoint -> Character.toString( codePoint ) ) // Character.toString( int ) requires Java 11 or later
                .collect( Collectors.groupingBy( Function.identity() , LinkedHashMap :: new , Collectors.counting() ) )
                .entrySet().stream()
                .map( n -> n.getKey() + "" + ( n.getValue() == 1 ? "" : n.getValue() ) )
                .collect( Collectors.joining() );
System.out.println( "output = " + output );

output = 😷2ab2c2d

This solution is still incomplete. First, it still doesn’t handle all characters: a character can span multiple code points. E.g., try "🏳️‍🌈🏳️‍🌈". Second, mapping to lowercase does not handle all characters for case-insensitive matching. In fact, it doesn’t even handle (case-sensitive) equality for all characters.
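
To make those two points concrete, here is a small sketch with invented names (`BeyondCodePoints`, `equalAfterNfc`). It shows that one rainbow flag is four code points, and that two spellings of "é" compare as unequal until normalized. Using java.text.Normalizer with NFC is one common way to approach the equality issue and is my assumption about what is meant here. It does not group code points into user-perceived characters, which would need grapheme cluster segmentation and is not shown.

```java
import java.text.Normalizer;

class BeyondCodePoints {
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Compares two strings after canonical composition (NFC).
    static boolean equalAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // white flag, variation selector-16, zero width joiner, rainbow
        String flag = "\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08";
        System.out.println(codePointCount(flag)); // 4 code points for one visible flag

        String composed = "\u00E9";    // e with acute as a single code point
        String decomposed = "e\u0301"; // e followed by combining acute accent
        System.out.println(composed.equals(decomposed));        // false
        System.out.println(equalAfterNfc(composed, decomposed)); // true
    }
}
```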
