
I wrote this function to generate unique IDs for my test cases:

func uuid(t *testing.T) string {
    uidCounterLock.Lock()
    defer uidCounterLock.Unlock()

    uidCounter++
    //return "[" + t.Name() + "|" + strconv.FormatInt(uidCounter, 10) + "]"
    return "[" + t.Name() + "|" + string(uidCounter) + "]"
}

var uidCounter int64 = 1
var uidCounterLock sync.Mutex

To test it, I generate a bunch of values in different goroutines and send them over a channel to the main goroutine, which counts them in a map[string]int with seen[v] = seen[v] + 1. There is no concurrent access to this map; it's private to the main goroutine.

var seen = make(map[string]int)
for v := range ch {
    seen[v] = seen[v] + 1
    if count := seen[v]; count > 1 {
        fmt.Printf("Generated the same uuid %d times: %#v\n", count, v)
    }
}

When I simply convert uidCounter to a string with string(uidCounter), I get a ton of collisions on a single key. When I use strconv.FormatInt, I get no collisions at all.

When I say a ton, I mean I just got 1115919 collisions for the value [TestUuidIsUnique|�] out of 2227980 generated values, i.e. 50% of the values collide on the same key. The values are not equal. I do always get the same number of collisions for the same source code, so at least it's somewhat deterministic, i.e. probably not related to race conditions.

I wouldn't be surprised if integer overflow in a rune were an issue, but I'm nowhere near 2^31, and that wouldn't explain why the map thinks 50% of the values have the same key. Also, I wouldn't expect a hash collision to affect correctness, only performance: I can iterate over the keys of a map, so the keys must be stored there somewhere.
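The roughly 50% figure can be checked with a little arithmetic. A valid rune is any value from 0 to 0x10FFFF (utf8.MaxRune) except the 2048 surrogates U+D800 to U+DFFF, so the limit that matters here is 0x10FFFF, not 2^31. The sketch below assumes the counter produces the values 2 through 2227981 (it starts at 1 and is incremented before use, as in uuid above); countInvalidRunes is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalidRunes reports how many integers in [lo, hi] are not valid
// Unicode code points. Converting any of them to a string yields "\uFFFD",
// so they all become the same map key.
func countInvalidRunes(lo, hi int64) int64 {
	var n int64
	for x := lo; x <= hi; x++ {
		// Check the upper bound first so rune(x) never truncates;
		// utf8.ValidRune then rejects the surrogates U+D800..U+DFFF.
		if x > utf8.MaxRune || !utf8.ValidRune(rune(x)) {
			n++
		}
	}
	return n
}

func main() {
	// Assumption: 2227980 calls starting from uidCounter = 1
	// produce the counter values 2..2227981.
	const generated = 2227980
	invalid := countInvalidRunes(2, generated+1)
	fmt.Println(invalid)                                   // 1115918
	fmt.Printf("%.1f%%\n", 100*float64(invalid)/generated) // 50.1%
}
```

That gives 2048 surrogates plus 1113870 values above 0x10FFFF, very close to the 1115919 collisions reported in the question; the exact figure depends on how collisions are counted.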

In the output, every character printed is the byte sequence 0xEFBFBD. It's the same number of bits as the highest valid Unicode code point, but that doesn't really match either.

Generated the same uuid 2 times: "[TestUuidIsUnique|�]"
Generated the same uuid 3 times: "[TestUuidIsUnique|�]"
Generated the same uuid 4 times: "[TestUuidIsUnique|�]"
Generated the same uuid 5 times: "[TestUuidIsUnique|�]"
...
Generated the same uuid 2047 times: "[TestUuidIsUnique|�]"
Generated the same uuid 2048 times: "[TestUuidIsUnique|�]"
Generated the same uuid 2049 times: "[TestUuidIsUnique|�]"
...
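The bytes 0xEF 0xBF 0xBD are not a code point themselves; they are the three-byte UTF-8 encoding of U+FFFD, the Unicode replacement character (utf8.RuneError in Go). A small sketch using only unicode/utf8 confirms this; replacementBytes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replacementBytes returns the UTF-8 encoding of utf8.RuneError (U+FFFD),
// the character every invalid rune is converted to.
func replacementBytes() []byte {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, utf8.RuneError)
	return buf[:n]
}

func main() {
	fmt.Printf("% X\n", replacementBytes()) // EF BF BD
	fmt.Printf("%U\n", utf8.RuneError)       // U+FFFD
}
```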

What's going on here? Did the Go authors assume that hash(a) == hash(b) implies a == b for strings? Or am I just missing something silly? go test -race isn't complaining either.

I'm on macOS 10.13.2, and go version go1.9.2 darwin/amd64.

  • The hash table compares the value directly within buckets; the problem is that you keep adding the same string: "[TestUuidIsUnique|�]" Commented Jan 23, 2018 at 16:36
  • @JimB sure, but why am I getting that same string? Why does it stop at that exact rune? It's even reproducible on the playground: play.golang.org/p/85KHbc4X9jV Commented Jan 23, 2018 at 16:44
  • @CeriseLimón Ah, that has to be it! Would you like to write that as an answer? Commented Jan 23, 2018 at 16:44
  • @FilipHaglund: converting an invalid rune to a string always returns "\ufffd" or "\xef\xbf\xbd" Commented Jan 23, 2018 at 16:45
  • I just assumed that the replacement character was because the font couldn't display those odd characters (that probably aren't even defined yet). That's the problem with in-band errors, I guess. Commented Jan 23, 2018 at 19:22

1 Answer


Converting an invalid rune to a string returns a string containing the Unicode replacement character, U+FFFD ("�").
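A minimal sketch of that behavior: converting a rune to a string yields its UTF-8 encoding, and any value that is not a valid code point (a surrogate, or anything above 0x10FFFF) becomes "\uFFFD". runeString is a hypothetical wrapper; it takes a rune rather than an int64 because the sample values fit in a rune, and since Go 1.15 go vet warns about string(x) when x is an integer type other than rune or byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replacement is the string every invalid code point converts to.
const replacement = "\uFFFD"

// runeString converts a code point to a string the way string(uidCounter)
// did: the integer is treated as a code point, not formatted as digits.
func runeString(r rune) string {
	return string(r)
}

func main() {
	// 'A' and 0x10FFFF are valid; the surrogate 0xD800 and the values
	// above 0x10FFFF all collapse to the same replacement string.
	for _, r := range []rune{'A', 0xD800, 0x10FFFF, 0x110000, 2227981} {
		s := runeString(r)
		fmt.Printf("%#x valid=%t replaced=%t\n", r, utf8.ValidRune(r), s == replacement)
	}
}
```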

Use the strconv package to convert an integer to text.
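A sketch of the fixed generator using strconv.FormatInt, as suggested above. To keep it runnable outside go test, the hypothetical name parameter stands in for t.Name(), and countDistinct is a hypothetical harness modeled on the channel-and-map check from the question.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

var (
	uidCounter     int64 = 1
	uidCounterLock sync.Mutex
)

// uuid formats the counter as decimal digits with strconv.FormatInt
// instead of converting it to a single rune with string(uidCounter).
// name is a hypothetical stand-in for t.Name().
func uuid(name string) string {
	uidCounterLock.Lock()
	defer uidCounterLock.Unlock()

	uidCounter++
	return "[" + name + "|" + strconv.FormatInt(uidCounter, 10) + "]"
}

// countDistinct generates IDs from several goroutines, sends them to the
// calling goroutine over a channel, and returns how many distinct keys
// the map ends up with.
func countDistinct(goroutines, perGoroutine int) int {
	ch := make(chan string)
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				ch <- uuid("TestUuidIsUnique")
			}
		}()
	}
	// Close the channel once every producer is done so the range ends.
	go func() {
		wg.Wait()
		close(ch)
	}()

	seen := make(map[string]int)
	for v := range ch {
		seen[v]++
	}
	return len(seen)
}

func main() {
	fmt.Println(countDistinct(8, 1000)) // 8000
}
```

With FormatInt every counter value maps to a distinct decimal string, so the map gets one key per generated ID. atomic.AddInt64 from sync/atomic would be an alternative to the mutex; the mutex is kept here to match the original code.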
