If you're open to using a third-party library that works with Java 8 or above, Eclipse Collections (EC) can solve this problem using a primitive Bag to count characters. Use a CharBag if char values are required, or an IntBag if codePoints (int values) are required. A Bag is a simpler data structure for counting things, and a primitive Bag may be backed by a primitive HashMap so that the counts are not boxed as Integer or Long objects. A Bag also avoids the problem a Java HashMap has, where looking up a missing key returns null.
@Test
public void characterCountJava8()
{
    String word = "AAABBB";
    CharAdapter chars = Strings.asChars(word);
    CharBag charCounts = chars.toBag();
    Assertions.assertEquals(3, charCounts.occurrencesOf('A'));
    Assertions.assertEquals(3, charCounts.occurrencesOf('B'));
    Assertions.assertEquals(0, charCounts.occurrencesOf('C'));
    System.out.println(charCounts.toStringOfItemToCount());
}
Outputs:
{A=3, B=3}
CharAdapter and CharBag are primitive collection types available in EC. A CharBag is useful if you want to count char values. Notice that charCounts.occurrencesOf('C') returns 0, rather than the null that a HashMap lookup would return for a missing key.
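For comparison, here is a minimal sketch of the same count using only a JDK HashMap. The class and method names (HashMapCharCount, countChars) are made up for illustration. It shows the missing-key behavior mentioned above: get returns null for a character that never occurred, so the caller has to ask for a default explicitly.

```java
import java.util.HashMap;
import java.util.Map;

final class HashMapCharCount
{
    // Counts each char in the word using a plain JDK HashMap (counts are boxed Integers).
    static Map<Character, Integer> countChars(String word)
    {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : word.toCharArray())
        {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        Map<Character, Integer> counts = countChars("AAABBB");
        System.out.println(counts.get('A'));             // 3
        System.out.println(counts.get('C'));             // null, because the key is missing
        System.out.println(counts.getOrDefault('C', 0)); // 0, but only with an explicit default
    }
}
```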
The following example counts codePoints, using emojis to make the result visually appealing. Most of the code will work with Java 8, but the Character.toString(int) overload used in the final conversion to String wasn't added until Java 11.
@Test
public void codePointCountJava11()
{
    String emojis = "🍎🍎🍎🍌🍌";
    CodePointAdapter codePoints = Strings.asCodePoints(emojis);
    IntBag emojiCounts = codePoints.toBag();
    int appleInt = "🍎".codePointAt(0);
    int bananaInt = "🍌".codePointAt(0);
    int pearInt = "🍐".codePointAt(0);
    Assertions.assertEquals(3, emojiCounts.occurrencesOf(appleInt));
    Assertions.assertEquals(2, emojiCounts.occurrencesOf(bananaInt));
    Assertions.assertEquals(0, emojiCounts.occurrencesOf(pearInt));
    System.out.println(emojiCounts.toStringOfItemToCount());
    Bag<String> emojiStringCounts = emojiCounts.collect(Character::toString);
    System.out.println(emojiStringCounts.toStringOfItemToCount());
}
Outputs:
{127820=2, 127822=3} // IntBag.toStringOfItemToCount()
{🍌=2, 🍎=3} // Bag<String>.toStringOfItemToCount()
CodePointAdapter and IntBag are primitive collection types available in EC. An IntBag is useful if you want to count int values. Notice that emojiCounts.occurrencesOf(pearInt) returns 0, rather than the null that a HashMap lookup would return for a missing key.
I converted the IntBag to a Bag<String> to show the difference between printing int codePoints and printing Strings. You need to convert int codePoints back to String if you want the printed output to be readable.
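If you are stuck on Java 8, where Character.toString(int) is not available, the conversion back to String can be done with Character.toChars. This is a small JDK-only sketch. The class and method names (CodePointToString, codePointToString) are hypothetical, and the code point is written as a hex constant so the snippet does not depend on source file encoding.

```java
final class CodePointToString
{
    // Java 8 compatible: Character.toChars(int) returns one char for BMP code points
    // and a surrogate pair (two chars) for supplementary code points such as emojis.
    static String codePointToString(int codePoint)
    {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args)
    {
        int appleInt = 0x1F34E; // code point of the red apple emoji
        System.out.println(appleInt);                    // 127822
        System.out.println(codePointToString(appleInt)); // prints the apple emoji
    }
}
```

Assuming the IntBag from the example above, a lambda such as cp -> new String(Character.toChars(cp)) could be passed to collect in place of Character::toString.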
The comment Holger left on the accepted answer about grapheme clusters was insightful and helpful. Thank you! The codePoint solution here suffers from the same issue as all of the other codePoint solutions: a grapheme cluster made up of several codePoints is counted as several separate values.
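To make the grapheme cluster issue concrete, here is a rough JDK-only sketch that compares the codePoint count with the number of character boundaries reported by java.text.BreakIterator. The class and method names (GraphemeCount, countGraphemes) are made up for illustration. The example uses a combining accent. How emoji sequences are segmented depends on the JDK version, so treat this as an illustration of the problem rather than a complete solution.

```java
import java.text.BreakIterator;

final class GraphemeCount
{
    // Counts user-perceived characters as segmented by the JDK's character BreakIterator.
    static int countGraphemes(String text)
    {
        BreakIterator boundaries = BreakIterator.getCharacterInstance();
        boundaries.setText(text);
        int count = 0;
        // Each call to next() moves to the following boundary until DONE is returned
        while (boundaries.next() != BreakIterator.DONE)
        {
            count++;
        }
        return count;
    }

    public static void main(String[] args)
    {
        // "e" followed by U+0301 (combining acute accent) is displayed as one character
        String accented = "e\u0301";
        System.out.println(accented.codePointCount(0, accented.length())); // 2
        System.out.println(countGraphemes(accented));                      // 1
    }
}
```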
Eclipse Collections 11.1 was compiled and released with Java 8. I wouldn't recommend staying on Java 8 any more, but wanted to point out this is still possible.
Note: I am a committer for Eclipse Collections.