2

My goal is to attack Java's HashMap with several strings (around 10000 or more), that produce the same hash. I used the script available on this link, translated it to Python3 (so that I can produce the strings on my terminal itself) and ran on my machine to produce around 177147 strings. All these strings produce the same hash when String.hashCode() method is called. From this link, it is evident that the time complexity would be O(N*N) if random read and write is performed on the HashMap. It should take more time if N is huge (in this case, N is greater than 10000). But surprisingly, it is running in less than 2s. I want it to take more than 10s. The following are the scripts I am using.

# Python3 script to generate strings that produce
# same hash with Java's String.hashCode() method
def combs(n,arr,str,arr_rtn):
    if n==1:
        for j in range(3):
            arr_rtn[str + arr[j]] = 0
    else:
        for j in range(3):
            combs(n-1,arr,str+arr[j],arr_rtn)

arr_src = ['at','bU','c6']
str_tmp = ''
arr_rtn = dict()

t = combs(11,arr_src,str_tmp,arr_rtn)

keys = list(arr_rtn.keys())

print(len(keys))
print(*keys, sep = '\n')
// Java code to insert the generated
// strings into HashMap
import java.util.*;

class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt();
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String s = sc.next();
            map.put(s, s.hashCode());
            // Should take more time since all strings 
            // hash to same value
        }
        System.out.println(map.size());
        sc.close();
    }
}

My only goal is to attack the HashMap with strings that produce same hash, so that it takes more than 20s (at least) to execute.

3
  • Run on a slower computer? Disable the JIT? Use a special key class which sleeps in it's compareTo method (or must you use String?)? In other words, 20 seconds is aribitrary. Isn't it better to show that you have reached the point where insertion is slowing down due to the collisions, than to hit some random duration? Commented Nov 21, 2021 at 5:32
  • Note that 10,000 (ten thousand) elements may simply be too little. I just wrote a little test case where I create a LinkedList with n elements, and then search for a random element n times, doubling n every test, and it only started to take 20 seconds or more when n = 2^17. Commented Nov 21, 2021 at 7:03
  • Also, complexity only tells you the rate of growth of the execution time. The actual time it takes an algorithm to execute will depend on the computer. Instead of trying to make the algorithm take 20 seconds, consider if demonstrating the complexity is n^2 would be sufficient. You can do that by repeatedly measuring the algorithm with increasing problem sizes and analyzing the experimental time growth. Commented Nov 21, 2021 at 7:25

2 Answers 2

2

Actually, Java's HashMap treats collisions by placing multiple values mapping to the same bucket in a linked list. Hence, the worst case performance for search in a Java HashMap is O(N), where N is the number of elements in the map.

Actually, in more recent versions of Java, HashMap deals with collisions by placing all same bucket elements in a tree. This means that the worst case search performance becomes O(h), where h is the height of the tree.

Sign up to request clarification or add additional context in comments.

6 Comments

If a linked list is used, it should take more time for random read and write, am I wrong here?
@ChitturiSaiSuman No, most operations worst case should be O(N), where N is the number of collision elements. Note that finding the bucket itself is O(1), regardless of how many other elements/buckets are present in the map.
yeah, finding the bucket itself is O(1), but to check what value the given key maps to, the control has to iterate over the linked list from the beginning. So, it should take O(N) time for just one read, so O(N * N) for N random read/write. Am I wrong? You also mentioned that the latest versions use Trees, can you elaborate on it?
@ChitturiSaiSuman Looking up a key in a hashmap to find a value always takes constant time. Once that is out of the way, then what you said is correct, and we would have to iterate the linked list.
can you suggest a way to make the script take more than 20s to execute? (kinda increase the collisions)
|
1

As Tim says, from Java 8 onwards, HashMap will replace the simple linked list hashchain with a balanced binary tree. This happens:

  • when a chain's length exceeds a hard-wired threshold (8 in Java 8), and
  • when the array exceeds another hard-wired threshold (64 in Java 8).

If the key class implements Comparable<?> (as String does) then the compareTo method is used for the tree ordering.

That means that the worst-case for a HashMap<String> with N string keys that are engineered to collide will be O(logN) rather than O(N).

In short, a collision attack against HashMap won't be effective in Java 8 or later. (But you could try running your test attack on Java 7 or older.)


If you need more information on how HashMap's list vs tree stuff works, it is extensively commented in the source code. Find the source code that matches the version you are interested in ... and read it.


Note that the question you cited as your source of information ( java HashMap collision ) was asked and answered in 2013. Java 8 was released in March 2014.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.