
Given a list in which each entry is an object that looks like this:

class Entry {
    public String id;
    public Object value;
}

Multiple entries could have the same id. I need a map where I can access all values that belong to a certain id:

Map<String, List<Object>> map;

My algorithm to achieve this:

for (Entry entry : listOfEntries) {
    List<Object> listOfValues;
    if (map.containsKey(entry.id)) {
        listOfValues = map.get(entry.id);
    } else {
        listOfValues = new ArrayList<>();
        map.put(entry.id, listOfValues);
    }
    listOfValues.add(entry.value);
}
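As a side note, on Java 8 and later the lookup-or-create step can be collapsed into a single map call with `computeIfAbsent`, which also avoids the double lookup (`containsKey` followed by `get`). A minimal runnable sketch; the `Entry` constructor, class name, and sample data are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupById {
    static class Entry {
        final String id;
        final Object value;
        Entry(String id, Object value) { this.id = id; this.value = value; }
    }

    static Map<String, List<Object>> group(List<Entry> listOfEntries) {
        Map<String, List<Object>> map = new HashMap<>();
        for (Entry entry : listOfEntries) {
            // One hash lookup per entry: the list is created only if the id is new.
            map.computeIfAbsent(entry.id, k -> new ArrayList<>()).add(entry.value);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", "foo"), new Entry("a", "bar"), new Entry("b", "foobar"));
        System.out.println(group(entries)); // {a=[foo, bar], b=[foobar]}
    }
}
```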

Simply: I transform a list that looks like

ID | VALUE
---+------------
a  | foo
a  | bar
b  | foobar

To a map that looks like

a--+- foo 
   '- bar
b---- foobar

As you can see, containsKey is called for each entry of the source list. That's why I wonder whether I could improve my algorithm by pre-sorting the source list and then doing this:

List<Object> listOfValues = new ArrayList<>();
String prevId = null;
for (Entry entry : listOfEntries) {
    if (prevId != null && !prevId.equals(entry.id)) {
        map.put(prevId, listOfValues);
        listOfValues = new ArrayList<>();
    }
    listOfValues.add(entry.value);
    prevId = entry.id;
}
if (prevId != null) map.put(prevId, listOfValues);

The second solution has the advantage that I don't need to call map.containsKey() for every entry, but the disadvantage that I have to sort first. Furthermore, the first algorithm is easier to implement and less error-prone, since the second needs extra code after the actual loop.

Therefore my question is: Which method has better performance?

The examples are written in Java pseudo code but the actual question applies to other programming languages as well.

2 Comments
  • Without actually answering your question: your data structure is called a multimap. You can get what you need with the help of Guava's TreeMultimap and/or MultimapBuilder. Commented Jul 27, 2016 at 8:37
  • @Sorin's answer is largely correct. On the performance part, I have met similar problems myself. In my case (integer ids; billions of entries; many duplicated ids), the second approach is significantly faster, because sorting is cache-efficient and comes with a tiny constant factor. In your case, however, sorting strings offsets the cache efficiency of the sort; large Objects may also reduce sorting performance a bit. If, in addition, you don't have many duplicated ids, the first approach may be faster. I can't say for sure, though. Commented Jul 27, 2016 at 22:23

3 Answers


If you have a hash map and a very large number of entries, then inserting items one by one will be faster than sorting and inserting them list by list (O(n) vs. O(n log n)). If you use a tree-based map, then the complexity is the same for both approaches.

However, I really doubt you have a large enough number of entries for the asymptotics to dominate; at realistic sizes, memory-access patterns and the speed of the compare and hash functions come into effect. You have two options: ignore the difference, since it is not going to be significant, or benchmark both variants and see which one works better on your system. If you don't have millions of entries, I would ignore the issue and go with whatever is easier to understand.
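To illustrate the "benchmark it" option, a crude timing harness along the following lines can already show whether the difference matters at all. The data set (one million synthetic string ids) and both grouping variants are made-up assumptions; for trustworthy numbers you would use a proper harness such as JMH, since a naive loop like this is skewed by JIT warm-up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class CrudeBenchmark {
    // Variant 1: single pass over an unsorted list, one hash lookup per entry.
    static Map<String, List<Object>> groupOnePass(List<String> ids) {
        Map<String, List<Object>> map = new HashMap<>();
        for (String id : ids) {
            map.computeIfAbsent(id, k -> new ArrayList<>()).add(Boolean.TRUE);
        }
        return map;
    }

    // Variant 2: sort a copy first, then emit each run of equal ids as one list.
    static Map<String, List<Object>> groupSorted(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        Map<String, List<Object>> map = new HashMap<>();
        List<Object> values = new ArrayList<>();
        String prev = null;
        for (String id : sorted) {
            if (prev != null && !prev.equals(id)) {
                map.put(prev, values);
                values = new ArrayList<>();
            }
            values.add(Boolean.TRUE);
            prev = id;
        }
        if (prev != null) map.put(prev, values);
        return map;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        Random rnd = new Random(42); // fixed seed so both variants see the same data
        for (int i = 0; i < 1_000_000; i++) ids.add("id" + rnd.nextInt(10_000));

        for (String label : new String[] {"one pass", "sort first"}) {
            long start = System.nanoTime();
            Map<String, List<Object>> map =
                    label.equals("one pass") ? groupOnePass(ids) : groupSorted(ids);
            System.out.printf("%s: %d groups in %.1f ms%n",
                    label, map.size(), (System.nanoTime() - start) / 1e6);
        }
    }
}
```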


Don't pre-sort. Even fast sorting algorithms like quicksort take, on average, O(n log n) for n items. Afterwards, you still need O(n) to walk the list. containsKey on a (hash) map takes constant time (check out this question), so don't worry about it. Walk the list in linear time and use containsKey.

4 Comments

  • "you still need O(n) to walk the list" Do you? If you presorted while adding, you could then use binary search, effectively reducing the linear probe to O(log n). In Java 8's HashMap, that's how Comparable values are stored when they fall into the same hash bucket.
  • @Slanec O(n log n) + O(log n) is still greater than O(n) + O(n).
  • @Slanec: I was referring to "Therefore my question is: Which method has better performance?" Since the OP is using a simple foreach loop in both cases, it is O(n). However, what do you mean by "[…] presorted while adding, you could then use binary search […]"? How would you use binary search when you have to look at each value?
  • @Vesper: It's O(n log n) + O(log n) = O(n log n) vs. O(n); you don't have to walk the list twice.

I'd like to offer another solution, using streams:

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

Map<String, List<Object>> map = listOfEntries.stream()
    .collect(groupingBy(entry -> entry.id, mapping(entry -> entry.value, toList())));

This code is more declarative: it only specifies that the list should be transformed into a map, and leaves it to the library to actually perform the transformation efficiently.

1 Comment

  • While I prefer declarative code as well, I disagree with you on "it is a library responsibility to actually perform the transformation efficiently". Sure, it's up to the library implementers that a given operation performs well. But in the case of Java's groupingBy, "There are no guarantees on the type, mutability, serializability, or thread-safety of the Map returned". So when it comes to performance, these are properties you want to control and, therefore, have to take care of yourself.
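For what it's worth, the map-type concern raised above can be addressed directly: groupingBy has a three-argument overload that takes a map factory, so you can pin down the concrete Map implementation. A sketch, where the `Entry` class, the `group` helper, and the sample data are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

public class GroupIntoTreeMap {
    static class Entry {
        final String id;
        final Object value;
        Entry(String id, Object value) { this.id = id; this.value = value; }
    }

    static Map<String, List<Object>> group(List<Entry> listOfEntries) {
        // TreeMap::new fixes the map type explicitly: sorted keys,
        // known performance profile, instead of whatever groupingBy picks.
        return listOfEntries.stream()
                .collect(groupingBy(e -> e.id, TreeMap::new,
                        mapping(e -> e.value, toList())));
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", "foo"), new Entry("a", "bar"), new Entry("b", "foobar"));
        System.out.println(group(entries)); // {a=[foo, bar], b=[foobar]}
    }
}
```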
