Java - What is the best way to find first duplicate character in a string

Question

I have written the code below to detect the first duplicate character in a string.

public static int detectDuplicate(String source) {
    boolean found = false;
    int index = -1;
    final long start = System.currentTimeMillis();
    final int length = source.length();
    for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
        boolean shiftPointer = false;
        for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
            if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
                found = true;
                index = outerIndex;
            } else {
                shiftPointer = true;
            }
        }
    }
    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

I need help on two things:

What is the worst case complexity of this algorithm? - my understanding is O(n).
Is it the best way to do this? Can somebody provide a better solution (if any)?

Thanks, NN

Take out all the benchmarking stuff. Or better yet, write the algorithm in pseudocode.
– David Titarenco, Commented Sep 6, 2012 at 17:09
By "first duplicate character", do you mean the duplicate character whose first occurrence is earliest, or whose second occurrence is earliest? In other words, in "abba", is "a" or "b" the first duplicate character?
– Tom Anderson, Commented Sep 6, 2012 at 17:21

assylias · Accepted Answer · 2012-09-06 17:31:03Z

13

As mentioned by others, your algorithm is O(n^2). Here is an O(n) algorithm; it is linear because HashSet#add runs in constant time (assuming the hash function disperses the elements properly among the buckets). Note that I initially size the HashSet to the maximum size to avoid resizing/rehashing:

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1;
}

Note: this returns the index of the first duplicate (i.e. the index of the first character that is a duplicate of a previous character). To return the index of the first appearance of that character, you would need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1;
}
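To make the difference between the two variants above concrete, here is a minimal, self-contained sketch. The class and method names (DuplicateIndexDemo, indexOfRepeat, indexOfFirstOccurrence) are made up for illustration, and it uses Map#putIfAbsent (Java 8+) instead of Map#put, which is an assumption about the Java version rather than part of the original answer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; names are illustrative only.
final class DuplicateIndexDemo {

    // Set-based variant: returns the index at which a character repeats.
    static int indexOfRepeat(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            // add() returns false when the character is already in the set.
            if (!seen.add(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    // Map-based variant: returns the index of the first appearance
    // of the character that is the first to repeat.
    static int indexOfFirstOccurrence(String s) {
        Map<Character, Integer> firstSeenAt = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            // putIfAbsent() returns the stored index if the key already exists.
            Integer previous = firstSeenAt.putIfAbsent(s.charAt(i), i);
            if (previous != null) {
                return previous;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfRepeat("abba"));          // 2 (the second 'b')
        System.out.println(indexOfFirstOccurrence("abba")); // 1 (the first 'b')
    }
}
```

For "abba" both variants pick "b", because its second occurrence comes before the second "a"; they differ only in which of the two indices they report.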

edited Sep 6, 2012 at 17:31

answered Sep 6, 2012 at 17:14

assylias


13 Comments

Tom Anderson Over a year ago

The original procedure returns the index of the first occurrence of the duplicate character, not the character itself. But that's a simple modification.

Tom Anderson Over a year ago

@Qnan: HashSet "offers constant time performance for the basic operations (add, remove, contains and size)". So yes, it does.

Qnan Over a year ago

@TomAnderson it says "assuming the hash function disperses the elements properly among the buckets", which I take as an indication that the worst-case complexity is actually not O(1). This is, of course, irrelevant, as a hash table can be implemented so as to ensure O(1) lookup/insertion/deletion.

matt b Over a year ago

@Qnan but the Character class will disperse the elements evenly, Character.hashCode() just returns the integer value of the char value.

Qnan Over a year ago

@mattb you're correct, but if we confine ourselves to an alphabet of a fixed size, then the algorithm would, technically, take constant time, as amoebe pointed out in his answer.


Qnan · Answer · 2012-09-06 17:12:12Z

1

The complexity is roughly O(M^2), where M is the minimum of the length of the string and the size K of the set of possible characters.

You can get it down to O(M) time with O(K) memory by simply recording the position at which you first encounter each character.

answered Sep 6, 2012 at 17:12

Qnan


Community · Answer · 2017-05-23 12:14:19Z

This is O(n^2), not O(n). Consider the case abcdefghijklmnopqrstuvwxyzz. outerIndex will range from 0 to 25 before the procedure terminates, and each time it increments, innerIndex will have ranged from outerIndex + 1 to 26.
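As a rough illustration of that growth, the sketch below counts comparisons for this input. It assumes a plain nested-loop scan that compares each character with every later one; the class QuadraticScanDemo and its method are hypothetical helpers, not the code from the question.

```java
// Hypothetical helper, not the question's code: a plain nested-loop scan that
// compares each character with every later one and counts the comparisons.
final class QuadraticScanDemo {

    // Number of character comparisons made by the most recent call.
    static long comparisons = 0;

    // Returns the lowest index of a character that occurs again later, or -1.
    static int firstDuplicateBruteForce(String s) {
        comparisons = 0;
        for (int outer = 0; outer < s.length(); outer++) {
            for (int inner = outer + 1; inner < s.length(); inner++) {
                comparisons++;
                if (s.charAt(outer) == s.charAt(inner)) {
                    return outer;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int index = firstDuplicateBruteForce("abcdefghijklmnopqrstuvwxyzz");
        // 27 characters: 26 + 25 + ... + 2 comparisons for 'a'..'y', plus 1 for 'z'.
        System.out.println("index " + index + " after " + comparisons + " comparisons");
    }
}
```

For this 27-character input the scan makes 351 comparisons, i.e. 26 * 27 / 2, which is the quadratic behaviour described above.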

To get to O(n), you need to make a single pass over the string and do O(1) work at each position. Since the job at each position is to check whether the character has been seen before (and if so, where), you need an O(1) map implementation. A hash table gives you that; so does an array indexed by the character code.

assylias shows how to do it with hashing, so here's how to do it with an array (just for laughs, really):

public static int detectDuplicate(String source) {
    int[] firstOccurrence = new int[1 << Character.SIZE];
    Arrays.fill(firstOccurrence, -1);
    for (int i = 0; i < source.length(); i++) {
        char ch = source.charAt(i);
        if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
        else firstOccurrence[ch] = i;
    }
    return -1;
}

Niranjan · Answer · 2012-09-08 04:01:29Z

0

Okay, I found the logic below, which reduces O(N^2) to O(N).

public static int detectDuplicate(String source) {
    int index = -1;
    boolean found = false;
    final long start = System.currentTimeMillis();

    for(int i = 1; i < source.length() && !found; i++) {
        if(source.charAt(i) == source.charAt(i-1)) {
            index = (i - 1);
            found = true;
        }
    }

    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

This also shows a performance improvement over my previous algorithm, which has two nested loops. It takes 130 ms to detect the first duplicate character in a string of 63 million characters where the duplicate is at the end.

I am not confident that this is the best solution. If anyone finds a better one, please share it.

Thanks,

NN

answered Sep 8, 2012 at 4:01

Niranjan


2 Comments

jontejj Over a year ago

Your solution only finds duplicate characters that are close to each other, try finding the first duplicate in the string: "abca". Your algorithm won't find any.

Niranjan Over a year ago

Hi, my intention was that. My apologies again for confusion, but I wanted to find the first duplicate character which appears side by side. In your string there is no such occurrence.

Tyler Durden · Answer · 2012-10-25 17:33:55Z

0

I can substantially improve your algorithm. It should be done like this:

StringBuffer source ...
char charLast = source.charAt( source.length()-1 );
int xLastChar = source.length()-1;
source.setCharAt( xLastChar, source.charAt( xLastChar - 1 ) );
int i = 1;
while( true ){
    if( source.charAt(i) == source.charAt(i-1) ) break;
    i += 1;
}
source.setCharAt( xLastChar, charLast );
if( i == xLastChar && source.charAt( xLastChar-1 ) != charLast ) return -1;
return i;

For a large string this algorithm is probably twice as fast as yours.

answered Oct 25, 2012 at 17:33

Tyler Durden


1 Comment

greybeard Over a year ago

This gives the index of the first character identical to the one immediately before. The procedure from the question returns the lowest index of a character occurring for a second time anywhere in the string. Using a separate StringBuffer, one might save special casing by appending the last character.
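The variant suggested in this comment could look roughly like the sketch below. It is only an illustration under a few assumptions: it works on a separate copy (a StringBuilder here, rather than the StringBuffer mentioned), the name firstAdjacentDuplicate is made up, and it returns the index of the first character of the adjacent pair, or -1 if there is none.

```java
// Hypothetical sketch of the sentinel idea: append a copy of the last
// character so the scan always stops without a bounds check in the loop.
final class SentinelScanDemo {

    // Returns the index of the first character that equals its successor, or -1.
    static int firstAdjacentDuplicate(String source) {
        if (source.isEmpty()) {
            return -1;
        }
        int n = source.length();
        // The padded copy ends with two equal characters (the sentinel pair).
        StringBuilder padded = new StringBuilder(n + 1)
                .append(source)
                .append(source.charAt(n - 1));
        int i = 1;
        while (padded.charAt(i) != padded.charAt(i - 1)) {
            i++;
        }
        // Stopping at index n means only the sentinel pair matched.
        return i == n ? -1 : i - 1;
    }

    public static void main(String[] args) {
        System.out.println(firstAdjacentDuplicate("abccd")); // 2
        System.out.println(firstAdjacentDuplicate("abca"));  // -1
    }
}
```

Because the sentinel is appended to a copy, the original string is never modified and no restore step is needed afterwards.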

Ant4res · Answer · 2017-12-24 00:34:51Z

0

You could try with:

public static char firstRecurringChar(String s) {
    char x = ' ';
    System.out.println("STRING : " + s);
    for (int i = 0; i < s.length(); i++) {
        System.out.println("CHAR AT " + i + " = " + s.charAt(i));
        System.out.println("Last index of CHAR AT " + i + " = " + s.lastIndexOf(s.charAt(i)));
        if (s.lastIndexOf(s.charAt(i)) > i) {
            x = s.charAt(i);
            break;
        }
    }
    return x;
}

edited Dec 24, 2017 at 0:34

Ant4res


answered Dec 23, 2017 at 20:25

Ankita Walia


amoebe · Answer · 2012-09-06 17:35:51Z

-1

O(1) Algorithm

Your solution is O(n^2) because of the two nested loops.

The fastest algorithm to do this is O(1) (constant time):

public static int detectDuplicate(String source) {
    boolean[] foundChars = new boolean[Character.MAX_VALUE+1];
    for(int i = 0; i < source.length(); i++) {
        // No explicit bound is needed: by the pigeonhole principle a duplicate
        // must appear within the first Character.MAX_VALUE + 2 characters.
        char currentChar = source.charAt(i);
        if(foundChars[currentChar]) return i;
        foundChars[currentChar] = true;
    }
    return -1;
}

However, this is only fast in terms of big O.

edited Sep 6, 2012 at 17:35

answered Sep 6, 2012 at 17:11

amoebe


13 Comments

Tom Anderson Over a year ago

I would be very, very interested to see an O(1) algorithm for solving this problem. Could you describe one?

Qnan Over a year ago

It's only O(1) because you assume that there's a fixed number of possible characters. It's still linear in the size of the alphabet.

Tom Anderson Over a year ago

That's O(n). It's also incorrect; if the string is every possible character in order, followed by a repetition of one of those characters, then you will return -1, rather than the index of that character.

amoebe Over a year ago

Well, Java's char has a fixed number of characters; we're not talking about pseudocode here. I fixed the case Tom mentioned. And it's really O(1).

Qnan Over a year ago

that's what happens when one starts talking about the complexity of java code, rather than just algorithms :)
