
I have the following two lists of strings:

{APPLE, ORANGE, BANANA} //call it keyList
{APPLE123, ORANGEXXX, 1APPLE, APPLEEEE} //call it valueList

The desired output is a HashMap<String, List<String>> like this:

<APPLE, {APPLE123, 1APPLE, APPLEEEE}>
<ORANGE, {ORANGEXXX}>
<BANANA, {}> // <BANANA, null> would also be acceptable

I have implemented this solution (it works):

HashMap<String, List<String>> myMap = new HashMap<>();
keyList.forEach(key -> {
    List<String> values = valueList.stream()
            .filter(value -> value.contains(key))
            .collect(Collectors.toList());
    myMap.put(key, values);
});
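For anyone who wants to run the question's approach directly, here is a minimal self-contained sketch. The class name ContainsGrouping and the method group are hypothetical wrappers; the loop body is the code above and the sample data is the question's. Arrays.asList is used instead of List.of to stay within Java 8.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ContainsGrouping {

    // The question's approach: for every key, scan the whole value list
    // and keep the values whose text contains that key.
    static Map<String, List<String>> group(List<String> keyList, List<String> valueList) {
        Map<String, List<String>> myMap = new HashMap<>();
        keyList.forEach(key -> {
            List<String> values = valueList.stream()
                    .filter(value -> value.contains(key))
                    .collect(Collectors.toList());
            myMap.put(key, values); // keys without any match get an empty list
        });
        return myMap;
    }

    public static void main(String[] args) {
        List<String> keyList = Arrays.asList("APPLE", "ORANGE", "BANANA");
        List<String> valueList = Arrays.asList("APPLE123", "ORANGEXXX", "1APPLE", "APPLEEEE");
        // HashMap iteration order is unspecified, so the lines may print in any order
        group(keyList, valueList).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Note that every key triggers a full pass over valueList, which is what the answer below tries to avoid.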

Given the assumption that a value is related to only one key (it's a constraint of my domain), is this the best solution in Java 8 in terms of performance and/or code cleanliness? Can it be tuned in some way?

Answer

If you can safely assume that each value is associated with a key, and only one key, you can go in the following direction:

Pattern p = Pattern.compile(String.join("|", keyList));
Map<String, List<String>> map = valueList.stream()
    .collect(Collectors.groupingBy(s -> {
        Matcher m = p.matcher(s);
        if(!m.find()) throw new AssertionError();
        return m.group();
    }));

map.forEach((k,v) -> System.out.println(k+": "+v));
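A minimal runnable sketch of the regex-based grouping above, using the question's sample data. The class RegexGrouping and its group method are hypothetical names; the only change to the answer's logic is a message on the AssertionError so a value without any key is easier to diagnose.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexGrouping {

    // Builds one alternation pattern (APPLE|ORANGE|BANANA) and classifies each
    // value by the first key the regex engine finds inside it.
    static Map<String, List<String>> group(List<String> keyList, List<String> valueList) {
        Pattern p = Pattern.compile(String.join("|", keyList));
        return valueList.stream()
            .collect(Collectors.groupingBy(s -> {
                Matcher m = p.matcher(s);
                // Relies on the domain constraint that every value contains a key
                if (!m.find()) throw new AssertionError("no key in " + s);
                return m.group(); // the key text that matched
            }));
    }

    public static void main(String[] args) {
        List<String> keyList = Arrays.asList("APPLE", "ORANGE", "BANANA");
        List<String> valueList = Arrays.asList("APPLE123", "ORANGEXXX", "1APPLE", "APPLEEEE");
        group(keyList, valueList).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

With this data the result contains only APPLE and ORANGE; BANANA is absent because no value mentions it.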

If the keys may contain special characters that could be misinterpreted as regex constructs, you can change the preparation code to:

Pattern p = Pattern.compile(
    keyList.stream().map(Pattern::quote).collect(Collectors.joining("|")));
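To see why quoting matters, here is a small sketch with a hypothetical key "A.B" (not from the question). Unquoted, the "." is a regex wildcard and matches any character; wrapped by Pattern::quote it only matches a literal dot. QuoteDemo and its methods are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuoteDemo {

    // Keys joined as-is: metacharacters keep their regex meaning
    static Pattern unquoted(List<String> keys) {
        return Pattern.compile(String.join("|", keys));
    }

    // Keys wrapped by Pattern.quote: every character is matched literally
    static Pattern quoted(List<String> keys) {
        return Pattern.compile(
            keys.stream().map(Pattern::quote).collect(Collectors.joining("|")));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A.B"); // hypothetical key with a '.'
        System.out.println(unquoted(keys).matcher("AXB").find()); // true: '.' matched 'X'
        System.out.println(quoted(keys).matcher("AXB").find());   // false: '.' is literal
        System.out.println(quoted(keys).matcher("1A.B2").find()); // true
    }
}
```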

The groupingBy collector only creates groups for keys that actually occur in some value. If you really need all keys to be present, you can use

Map<String, List<String>> map = valueList.stream()
    .collect(Collectors.groupingBy(s -> {
            Matcher m = p.matcher(s);
            if(!m.find()) throw new AssertionError();
            return m.group();
        },
        HashMap::new, // ensure mutable map
        Collectors.toList()
    ));
keyList.forEach(key -> map.putIfAbsent(key, Collections.emptyList()));
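A self-contained sketch of this variant, combined with the quoted pattern, so that every key ends up in the map. RegexGroupingAllKeys and group are hypothetical names. Keep in mind that Collections.emptyList() is immutable, so callers must not add to the lists of keys that had no match.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexGroupingAllKeys {

    static Map<String, List<String>> group(List<String> keyList, List<String> valueList) {
        // Quote each key so regex metacharacters are taken literally
        Pattern p = Pattern.compile(
            keyList.stream().map(Pattern::quote).collect(Collectors.joining("|")));
        Map<String, List<String>> map = valueList.stream()
            .collect(Collectors.groupingBy(s -> {
                    Matcher m = p.matcher(s);
                    if (!m.find()) throw new AssertionError("no key in " + s);
                    return m.group();
                },
                HashMap::new, // mutable map, so putIfAbsent below works
                Collectors.toList()));
        // Add keys that no value mentioned, with an (immutable) empty list
        keyList.forEach(key -> map.putIfAbsent(key, Collections.emptyList()));
        return map;
    }

    public static void main(String[] args) {
        List<String> keyList = Arrays.asList("APPLE", "ORANGE", "BANANA");
        List<String> valueList = Arrays.asList("APPLE123", "ORANGEXXX", "1APPLE", "APPLEEEE");
        group(keyList, valueList).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```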

or

Pattern p = Pattern.compile(
    keyList.stream().map(Pattern::quote)
           .collect(Collectors.joining("|", ".*(", ").*")));
Map<String, List<String>> map = valueList.stream()
    .map(p::matcher)
    .filter(Matcher::matches)
    .collect(Collectors.groupingBy(m -> m.group(1),
        HashMap::new, // ensure mutable map
        Collectors.mapping(Matcher::group, Collectors.toList())
    ));
keyList.forEach(key -> map.putIfAbsent(key, Collections.emptyList()));
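A runnable sketch of the matches()-based variant. The names RegexMatchesGrouping and group are hypothetical, and m -> m.group() is used in place of Matcher::group (both return the whole matched value here). One behavioral difference from the find() version is worth knowing: a value containing no key is silently dropped by filter(Matcher::matches) instead of raising an AssertionError. Also, because the leading .* is greedy, group(1) is the last key occurrence in a value, which only matters if the one-key constraint is violated.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexMatchesGrouping {

    static Map<String, List<String>> group(List<String> keyList, List<String> valueList) {
        // Builds .*(KEY1|KEY2|...).* with each key quoted; group 1 captures the key
        Pattern p = Pattern.compile(
            keyList.stream().map(Pattern::quote)
                   .collect(Collectors.joining("|", ".*(", ").*")));
        Map<String, List<String>> map = valueList.stream()
            .map(p::matcher)
            .filter(Matcher::matches) // drops values that contain no key
            .collect(Collectors.groupingBy(m -> m.group(1),
                HashMap::new, // mutable map, so putIfAbsent below works
                Collectors.mapping(m -> m.group(), Collectors.toList())));
        keyList.forEach(key -> map.putIfAbsent(key, Collections.emptyList()));
        return map;
    }

    public static void main(String[] args) {
        List<String> keyList = Arrays.asList("APPLE", "ORANGE", "BANANA");
        // "KIWI" is a hypothetical value with no key; it does not appear in the result
        List<String> valueList = Arrays.asList("APPLE123", "ORANGEXXX", "1APPLE", "APPLEEEE", "KIWI");
        group(keyList, valueList).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```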

Comments

Thank you very much for your detailed examples. I will try the last two options. As far as I understand, your solutions avoid looping over valueList once per key (maybe they still loop under the hood, but in an efficient way).
Keep in mind that String.contains has an internal loop too. So your original code basically does for each key(for each value(for each character(…))), where only the innermost loop can stop at a match. That means at least number of keys × number of values operations. My solution basically does for each value(for each character(for each key(…))), which can stop the “for each character” loop at the first match; that is more efficient when we know there will always be a match. Further, as you suspected, the innermost loop as done by the regex engine is more efficient too.
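For comparison, here is a hypothetical plain-loop variant (not the answer's regex approach) that also exploits the one-key constraint: it iterates values in the outer loop and stops trying keys as soon as one is contained, so a value is no longer tested against every key. It still calls String.contains per key, so it does not get the single-pass scanning the regex engine performs. LoopGrouping and group are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LoopGrouping {

    // for each value(for each key(String.contains ...)), breaking at the first hit
    static Map<String, List<String>> group(List<String> keyList, List<String> valueList) {
        Map<String, List<String>> map = new HashMap<>();
        for (String key : keyList) {
            map.put(key, new ArrayList<>()); // every key present, with a mutable list
        }
        for (String value : valueList) {
            for (String key : keyList) {
                if (value.contains(key)) {
                    map.get(key).add(value);
                    break; // a value belongs to only one key, so stop trying keys
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> keyList = Arrays.asList("APPLE", "ORANGE", "BANANA");
        List<String> valueList = Arrays.asList("APPLE123", "ORANGEXXX", "1APPLE", "APPLEEEE");
        group(keyList, valueList).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```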
