12

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0. However, I'm pretty sure this wouldn't work so well with Unicode. I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property, but I can't think of a specific optimization that would do this off the top of my head.

Thanks.

UPDATE: Sorry, I don't believe I was clear. I'm not looking for O(1), I'm looking for something that won't leak timing information. This would be used to compare hashed password values, and if the time it took to compare was different based on where the first mismatch occurred, that would be leaking information to an attacker.

6
  • 7
    The time it takes to compare 2 strings would always be a function of the length of the string. You have to do some sort of comparison one way or another - and the more characters you have, the longer this comparison would take. Commented Aug 25, 2011 at 13:25
  • How would the complexity of XORing each character not depend on the number of characters? Commented Aug 25, 2011 at 13:26
  • 1
    Am I correct in assuming you're asking for a String comparison function (returning true of false for equal or non-equal) that is entirely independent from String length? I don't actually think that's theoretically possible unless you have a number of processors/cores equal to the maximum String length. The best I can think of is optimizing strongly through comparing hash codes and length first, but even the hash code calculation depends on String length. Commented Aug 25, 2011 at 13:27
  • 1
    you might be able to use the hash field, it might help to determine 2 strings are not equal, if the hash was already computed, but I think that's about it. Commented Aug 25, 2011 at 13:28
  • Can you tell us why you want to do this? Given the answers telling you that this is either impossible or impractical, knowing what you are trying to achieve would help us give you a good answer. Commented Aug 25, 2011 at 13:43

6 Answers 6

5

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character

Do you see the contradiction?

update:

To the updated, and therefore different question:

Can you gain information once, how much time is spent for comparing 2 Strings, in terms of constant amount and time, depending on the length of the two strings?

a + b*s(1).length + c*s(2).length + d*f(s(1), s(2))? 

Is there an upper bound of characters for String 1 and 2?

If the time is, depending on a factor for the machine, for example for the longest strings you expect 0.01ms. You measure the time to encode the string, and stay idle until you reach that time, maybe + a factor of rand(10%) of the time.

If the length of the input is not limited, you could calculate the timing in a way, that will fit for 99%, 99.9% or 99.99% of typical input, depending on your security needs, and the speed of the machine. If the program is interacting with the user, a delay up to 0.2s is normally experienced as instant reaction, so it wouldn't annoy the user, if your code sleeps for 0.19994s, while doing real calculations for 0.00006s.

Sign up to request clarification or add additional context in comments.

3 Comments

The randomness doesn't necessarily help since the attacker can just take more samples to exclude it completely.
@MichaelMior That's surely true, but the comparison takes some nanoseconds only and adding a random milliseconds delay may make the number of needed samples damn high.
The question asked about not leaking info about "the number of characters that match". By excluding "that match" from your bold text, you have completely changed his meaning. This answer is wrong.
5

The following code is one common way to do a constant-time byte[] comparison in Java and avoid leaking password info via the time taken:

public static boolean isEqual(byte[] a, byte[] b) {
    if (a.length != b.length) {
        return false;
    }

    int result = 0;
    for (int i = 0; i < a.length; i++) {
      result |= a[i] ^ b[i];
    }
    return result == 0;
}

See http://codahale.com/a-lesson-in-timing-attacks/ for more discussion of this issue.

Current implementation in openjdk as an inspiration is available in isEqual method.

(This assumes that the length of the secret is not sensitive, for example if it is a hash. You should pad both sides to the same length if that is not true.)

This is essentially the same as your first suggestion:

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0.

You asked:

However, I'm pretty sure this wouldn't work so well with Unicode.

This is a valid concern, but you need to clarify what you will accept as "equal" for a solution to be proposed. Luckily, you also say "This would be used to compare hashed password values", so I don't think that any of the unicode concerns will be in play.

I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property,

Hopefully that's not true. I expect that the literature on how to avoid timing attacks in Java would address this if it were true, but I can't offer you any citations to back this up :-)

2 Comments

Rich, you probably need to look at the early exit on different password lengths. Even the length of the password is information that can be leaked.
Thanks, I've updated my answer to note this. I think in most cases the secret will be a hash, so the length is not sensitive.
3

I see two immediate possibilities for not leaking password-related information in timing:

1/ Pad both the password string and candidate string out to 1K, with a known, fixed character (like A). Then run the following (pseudo-code):

match = true
for i = 0 to 1023:
    if password[i] != candidate[i]:
        match = false

That way, you're always taking the same amount of loops to do the comparison regardless of where it matches.

There's no need to muck about with xor since you can still do a simple comparison, but without exiting the loop early.

Just set the match flag to false if a mismatch is found and keep going. Once the loop exits (taking the same time regardless of size or content of password and candidate), then check whether it matched.

2/ Just add a large (relative to the normal comparison time) but slightly random delay at the end of the comparison. For example, a random value between 0.9 and 1.1 seconds. The time taken for the comparison should be swamped by the delay and the randomness should fully mask any information leakage (unless your randomness algorithm leaks information, of course).

That also has the added advantage of preventing brute force attacks since a password check takes at least about a second.

8 Comments

My concern is that the strings might not be normalized, so the function might return false even if both values are español (to use a completely made up and perhaps not even valid example). Also, I'm aware that Unicode is not an encoding, but as I understand it, normalization is a problem no matter what encoding is in use.
@Hank, now your requirement makes a little more sense - I've actually had to do this sort of stuff before, hence my wonder answer at stackoverflow.com/questions/712339/… :-) I've updated this answer with a couple of possibilities for you.
The padding will have the nice benefit of not even revealing the length of the password, thanks. Do you have any thoughts on the normalization issue?
You may want to change the comparison in 1/ to match |= password[i] ^ candidate[i] (Note that this also inverts the value of match, so it really should be no_match and initialized to 0). The different branches of the if statement can stil leak timing information. (There's an extra jump if the characters match.)
I think your point (2) is still bad advice. Adding a random delay will not defeat a timing attack. An external attacker can still estimate X from (X + random(0.9,1.1)) by statistical methods. They'll be averaging out random delays already to take account of the network delays, so this doesn't even make it harder. See codahale.com/a-lesson-in-timing-attacks for some commentary. (I've removed my previous comment, as you've obsoleted it, thanks.)
|
3

This should take approximately the same time for any matching length Strings. It's constant-time with a big constant.

public static boolean areEqualConstantTime(String a, String b) {
    if ( a.length != b.length ) {
        return false;
    }

    boolean equal = true;
    for ( long i = 0; i < (Long)Integer.MAX_INT; i++ ) {
        if ( a.charAt((int)(i % aChars.length)) != b.charAt((int)(i % bChars.length))) {
            equal = false;
        }
    }
    return equal;
}

Edit

Wow, if you're just trying to avoid leaking timing information this facetious answer got pretty close to the mark! We can start with a naive approach like this:

public static boolean arePasswordsEqual(String a, String b) {
    boolean equal = true;
    if ( a.length != b.length ) {
       equal = false;
    }

    for ( int i = 0; i < MAX_PASSWORD_LENGTH; i++ ) {
        if ( a.charAt(i%a.length()) != b.charAt(i%b.length()) ) {
            equal = false;
        }
    }
    return equal;
 }

We need the MAX_PASSWORD_LENGTH constant because we can't simply use either the max or the min of the two input lengths as that would also leak timing information. An attacker could start with a very small guess and see how long the function takes. When the function time plateaus, he would know his password has the right length which eliminates much of the range of values he needs to try.

9 Comments

@Hank: No problem. Just as an aside, the popular cryptographically secure hashes have constant length outputs and due to the snowball principle leaking timing information probably wouldn't be as big of a deal. Are you comparing passwords in clear text?
No, I'm looking at the source to a Java impl of bcrypt. It's certainly not that big of a deal, but w/ security I always opt for safer than I think is remotely necessary. I'm thinking about patching it to use this comparison instead of the built-in Java comparison.
@Hank: Cool. And by "snowball principle" I meant "avalanche property"
Wouldn't it be simpler to use the length of the user-controlled string instead of MAX_PASSWORD_LENGTH? Since this length is already known, nothing would be leaked.
@Qerub: Yep, you could do that. As I said earlier, this is pretty irrelevant if you're hashing passwords to a fixed-length hash.
|
-1

You can do constant time string comparisons if you intern your strings. Then you can compare them with == operator resulting in a constant time equality check.

Read this for further detials

4 Comments

Interning is not an O(1) operation though!
Yes thats the tradeoff. It depends on whether you have more unique strings or duplicates in your data set.
The problem is making the comparison of a user-supplied string and a secret to a constant time operation. The secret can be interned in advance, but the time for the interning of the user-supplied string may be observed. It's unclear whether and how it depends on the secret string, but I'd call it risky.
operator == does answer question, if given strings are same instances... Intern could make deduplication and make == same result as .equals, but timing can leak from calling intern method...
-1

The usual solution is to set a timer, and not return the result until the timer has expired.

The time taken to compare strings or hashes is not important, just set the time-out value sufficiently large.

1 Comment

I don't think that's the "usual" solution, as it's very slow. Most languages and frameworks do something like "OR"ing every byte. See e.g. codahale.com/a-lesson-in-timing-attacks or php.net/manual/en/function.hash-equals.php

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.