
Given a list in which each entry is an object that looks like this:

class Entry {
    public String id;
    public Object value;
}

Multiple entries could have the same id. I need a map where I can access all values that belong to a certain id:

Map<String, List<Object>> map;

My algorithm to achieve this:

for (Entry entry : listOfEntries) {
    List<Object> listOfValues;
    if (map.containsKey(entry.id)) {
        listOfValues = map.get(entry.id);
    } else {
        listOfValues = new ArrayList<>();
        map.put(entry.id, listOfValues);
    }
    listOfValues.add(entry.value);
}
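As a side note, on Java 8 and later the lookup-or-create step can be collapsed into a single map call with `computeIfAbsent`, which also avoids the double lookup (`containsKey` followed by `get`). A minimal runnable sketch; the `Entry` constructor, class name, and sample data are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupById {
    static class Entry {
        final String id;
        final Object value;
        Entry(String id, Object value) { this.id = id; this.value = value; }
    }

    static Map<String, List<Object>> group(List<Entry> listOfEntries) {
        Map<String, List<Object>> map = new HashMap<>();
        for (Entry entry : listOfEntries) {
            // One hash lookup per entry: the list is created only if the id is new.
            map.computeIfAbsent(entry.id, k -> new ArrayList<>()).add(entry.value);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", "foo"), new Entry("a", "bar"), new Entry("b", "foobar"));
        System.out.println(group(entries)); // {a=[foo, bar], b=[foobar]}
    }
}
```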

Simply: I transform a list that looks like

ID | VALUE
---+------------
a  | foo
a  | bar
b  | foobar

To a map that looks like

a--+- foo 
   '- bar
b---- foobar

As you can see, containsKey is called for each entry of the source list. That's why I wonder whether I could improve my algorithm by pre-sorting the source list and then doing this:

List<Object> listOfValues = new ArrayList<>();
String prevId = null;
for (Entry entry : listOfEntries) {
    if (prevId != null && !prevId.equals(entry.id)) {
        map.put(prevId, listOfValues);
        listOfValues = new ArrayList<>();
    }
    listOfValues.add(entry.value);
    prevId = entry.id;
}
if (prevId != null) map.put(prevId, listOfValues);

The second solution has the advantage that I don't need to call map.containsKey() for every entry, but the disadvantage that I have to sort first. Furthermore, the first algorithm is easier to implement and less error-prone, since the second needs extra code after the actual loop.

Therefore my question is: Which method has better performance?

The examples are written in Java pseudo code but the actual question applies to other programming languages as well.

2 Comments
  • Without actually answering your question: your data structure is called a multimap. You can get what you need with the help of Guava's TreeMultimap and/or MultimapBuilder. Commented Jul 27, 2016 at 8:37
  • @Sorin's answer is largely correct. On the performance part, I have met similar problems myself. In my case (integer ids; billions of entries; many duplicated ids), the second approach is significantly faster, because sorting is cache-efficient and comes with a tiny constant factor. In your case, however, sorting strings offsets the cache efficiency of the sort; large Objects may also reduce sorting performance a bit. If, in addition, you don't have many duplicated ids, the first approach may be faster. I can't say for sure, though. Commented Jul 27, 2016 at 22:23

3 Answers


If you have a hash map and a very large number of entries, then inserting items one by one will be faster than sorting and inserting them list by list (O(n) vs. O(n log n)). If you use a tree-based map, then the complexity is the same for both approaches.

However, I really doubt you have a large enough number of entries for the asymptotics to dominate; at realistic sizes, memory-access patterns and the speed of the compare and hash functions come into effect. You have two options: ignore the difference, since it is not going to be significant, or benchmark both variants and see which one works better on your system. If you don't have millions of entries, I would ignore the issue and go with whatever is easier to understand.
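To illustrate the "benchmark it" option, a crude timing harness along the following lines can already show whether the difference matters at all. The data set (one million synthetic string ids) and both grouping variants are made-up assumptions; for trustworthy numbers you would use a proper harness such as JMH, since a naive loop like this is skewed by JIT warm-up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class CrudeBenchmark {
    // Variant 1: single pass over an unsorted list, one hash lookup per entry.
    static Map<String, List<Object>> groupOnePass(List<String> ids) {
        Map<String, List<Object>> map = new HashMap<>();
        for (String id : ids) {
            map.computeIfAbsent(id, k -> new ArrayList<>()).add(Boolean.TRUE);
        }
        return map;
    }

    // Variant 2: sort a copy first, then emit each run of equal ids as one list.
    static Map<String, List<Object>> groupSorted(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        Map<String, List<Object>> map = new HashMap<>();
        List<Object> values = new ArrayList<>();
        String prev = null;
        for (String id : sorted) {
            if (prev != null && !prev.equals(id)) {
                map.put(prev, values);
                values = new ArrayList<>();
            }
            values.add(Boolean.TRUE);
            prev = id;
        }
        if (prev != null) map.put(prev, values);
        return map;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        Random rnd = new Random(42); // fixed seed so both variants see the same data
        for (int i = 0; i < 1_000_000; i++) ids.add("id" + rnd.nextInt(10_000));

        for (String label : new String[] {"one pass", "sort first"}) {
            long start = System.nanoTime();
            Map<String, List<Object>> map =
                    label.equals("one pass") ? groupOnePass(ids) : groupSorted(ids);
            System.out.printf("%s: %d groups in %.1f ms%n",
                    label, map.size(), (System.nanoTime() - start) / 1e6);
        }
    }
}
```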


Don't pre-sort. Even fast sorting algorithms like quicksort take, on average, O(n log n) for n items. Afterwards, you still need O(n) to walk the list. containsKey on a (hash) map takes constant time (check out this question), so don't worry about it. Walk the list in linear time and use containsKey.

4 Comments

  • "you still need O(n) to walk the list" Do you? If you presorted while adding, you could then use binary search, effectively reducing the linear probe to O(log n). In Java 8's HashMap, that's how Comparable values are stored when they fall into the same hash bucket.
  • @Slanec O(n log n) + O(log n) is still greater than O(n) + O(n).
  • @Slanec: I was referring to "Therefore my question is: Which method has better performance?" Since the OP is using a simple foreach loop in both cases, it is O(n). However, what do you mean by "[…] presorted while adding, you could then use binary search […]"? How would you use binary search when you have to look at each value?
  • @Vesper: It's O(n log n) + O(log n) = O(n log n) vs. O(n); you don't have to walk the list twice.

I'd like to offer another solution, using streams:

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

Map<String, List<Object>> map = listOfEntries.stream()
    .collect(groupingBy(entry -> entry.id, mapping(entry -> entry.value, toList())));

This code is more declarative: it only specifies that the list should be transformed into a map, and leaves it to the library to actually perform the transformation efficiently.

1 Comment

  • While I prefer declarative code as well, I disagree with you on "it is a library responsibility to actually perform the transformation efficiently". Sure, it's up to the library implementers that a given operation performs well. But in the case of Java's groupingBy, "There are no guarantees on the type, mutability, serializability, or thread-safety of the Map returned". So when it comes to performance, these are properties you want to control and, therefore, have to take care of yourself.
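For what it's worth, the map-type concern raised above can be addressed directly: groupingBy has a three-argument overload that takes a map factory, so you can pin down the concrete Map implementation. A sketch, where the `Entry` class, the `group` helper, and the sample data are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

public class GroupIntoTreeMap {
    static class Entry {
        final String id;
        final Object value;
        Entry(String id, Object value) { this.id = id; this.value = value; }
    }

    static Map<String, List<Object>> group(List<Entry> listOfEntries) {
        // TreeMap::new fixes the map type explicitly: sorted keys,
        // known performance profile, instead of whatever groupingBy picks.
        return listOfEntries.stream()
                .collect(groupingBy(e -> e.id, TreeMap::new,
                        mapping(e -> e.value, toList())));
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", "foo"), new Entry("a", "bar"), new Entry("b", "foobar"));
        System.out.println(group(entries)); // {a=[foo, bar], b=[foobar]}
    }
}
```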
