6

I have written the code below to detect the first duplicate character in a string.

public static int detectDuplicate(String source) {
    boolean found = false;
    int index = -1;
    final long start = System.currentTimeMillis();
    final int length = source.length();
    for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
        boolean shiftPointer = false;
        for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
            if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
                found = true;
                index = outerIndex;
            } else {
                shiftPointer = true;
            }
        }
    }
    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

I need help on two things:

  • What is the worst-case complexity of this algorithm? My understanding is that it is O(n).
  • Is this the best way to do it? Can somebody provide a better solution (if any)?

Thanks, NN

3
  • 2
    Take out all the benchmarking stuff. Or better yet, write the algorithm in pseudocode. Commented Sep 6, 2012 at 17:09
  • 2
    By "first duplicate character", do you mean the duplicate character whose first occurrence is earliest, or whose second occurrence is earliest? In other words, in "abba", is "a" or "b" the first duplicate character? Commented Sep 6, 2012 at 17:21
  • I am sorry I forgot to add an example to show my motive. Commented Sep 6, 2012 at 18:05

7 Answers 7

13

As mentioned by others, your algorithm is O(n^2). Here is an O(n) algorithm: HashSet#add runs in constant time (assuming the hash function disperses the elements properly among the buckets). Note that I initially size the HashSet to the maximum size to avoid resizing/rehashing:

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1;
}

Note: this returns the index of the first duplicate (i.e. the index of the first character that is a duplicate of a previous character). To return the index of the first appearance of that character, you would need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1;
}
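
To make the difference between the two variants concrete, here is a minimal, self-contained sketch that runs both on "abba". The class name `DuplicateIndexDemo` and the method names `indexOfRepeat` and `indexOfFirstAppearance` are made up for illustration so the two versions can sit side by side; the logic is the same as in the two methods above, minus the pre-sizing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; the method names are invented to keep the two variants apart.
final class DuplicateIndexDemo {

    // Set-based variant: index of the first character that repeats an earlier one.
    static int indexOfRepeat(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            // add() returns false when the character is already in the set.
            if (!seen.add(s.charAt(i))) return i;
        }
        return -1;
    }

    // Map-based variant: index of the first appearance of that repeated character.
    static int indexOfFirstAppearance(String s) {
        Map<Character, Integer> firstIndex = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            // put() returns the previously stored index, or null if the key is new.
            Integer previous = firstIndex.put(s.charAt(i), i);
            if (previous != null) return previous;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfRepeat("abba"));          // prints 2 (the second 'b')
        System.out.println(indexOfFirstAppearance("abba")); // prints 1 (the first 'b')
        System.out.println(indexOfRepeat("abc"));           // prints -1 (no duplicate)
    }
}
```

Both variants stop at the same character (the second 'b'); they only differ in which index they report.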

13 Comments

The original procedure returns the index of the first occurrence duplicate character, not the character itself. But that's a simple modification.
@TomAnderson it says "assuming the hash function disperses the elements properly among the buckets", which I take as an indication of the fact that the worst-case complexity is actually not O(1). This is, of course, irrelevant, as hash table can be implemented so as to ensure O(1) lookup/insertion/deletion.
@Qnan but the Character class will disperse the elements evenly, Character.hashCode() just returns the integer value of the char value.
@mattb you're correct, but if we confine ourselves to an alphabet of a fixed size, then the algorithm would, technically, take constant time, as amoebe pointed out in his answer.
1

The complexity is roughly O(M^2), where M is the minimum of the length of the string and the size K of the set of possible characters.

You can get it down to O(M) with O(K) memory by simply remembering the position where you first encounter every unique character.
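
A minimal sketch of that idea, assuming the input is restricted to 7-bit ASCII so that K = 128 (this restriction, the class name `AsciiFirstOccurrence`, and the constant `ALPHABET_SIZE` are assumptions made for the example, not part of the question):

```java
import java.util.Arrays;

// Hypothetical example class; assumes a 7-bit ASCII alphabet (K = 128).
final class AsciiFirstOccurrence {

    private static final int ALPHABET_SIZE = 128; // K, an assumption for this sketch

    // Returns the index of the first appearance of the first character seen twice, or -1.
    static int detectDuplicate(String source) {
        int[] firstSeenAt = new int[ALPHABET_SIZE]; // O(K) memory
        Arrays.fill(firstSeenAt, -1);               // -1 means "not seen yet"
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (ch >= ALPHABET_SIZE) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            if (firstSeenAt[ch] != -1) return firstSeenAt[ch];
            firstSeenAt[ch] = i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(detectDuplicate("abca")); // prints 0
        System.out.println(detectDuplicate("abcb")); // prints 1
        System.out.println(detectDuplicate("abc"));  // prints -1
    }
}
```

Since there are only K distinct values, the loop returns after at most K + 1 iterations no matter how long the string is, which is where the min(length, K) bound comes from.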

Comments

1

This is O(n**2), not O(n). Consider the case abcdefghijklmnopqrstuvwxyzz. outerIndex will range from 0 to 25 before the procedure terminates, and each time it increments, innerIndex will have ranged from outerIndex to 26.

To get to O(n), you need to make a single pass over the list, and to do O(1) work at each position. Since the job to do at each position is to check if the character has been seen before (and if so, where), that means you need an O(1) map implementation. A hashtable gives you that; so does an array, indexed by the character code.

assylias shows how to do it with hashing, so here's how to do it with an array (just for laughs, really):

public static int detectDuplicate(String source) {
    int[] firstOccurrence = new int[1 << Character.SIZE];
    Arrays.fill(firstOccurrence, -1);
    for (int i = 0; i < source.length(); i++) {
        char ch = source.charAt(i);
        if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
        else firstOccurrence[ch] = i;
    }
    return -1;
}

Comments

0

Okay, I found the logic below, which reduces O(N^2) to O(N).

public static int detectDuplicate(String source) {
    int index = -1;
    boolean found = false;
    final long start = System.currentTimeMillis();

    for(int i = 1; i < source.length() && !found; i++) {
        if(source.charAt(i) == source.charAt(i-1)) {
            index = (i - 1);
            found = true;
        }
    }

    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

This also shows a performance improvement over my previous algorithm, which has two nested loops. It takes 130 ms to detect the first duplicate character in a string of 63 million characters where the duplicate is at the end.

I am not confident that this is the best solution. If anyone finds a better one, please share it.

Thanks,

NN

2 Comments

Your solution only finds duplicate characters that are adjacent to each other. Try finding the first duplicate in the string "abca": your algorithm won't find any.
Hi, that was my intention. My apologies again for the confusion, but I wanted to find the first duplicate character that appears side by side. In your string there is no such occurrence.
0

I can substantially improve your algorithm. It should be done like this:

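StringBuffer source ...
// Sentinel trick (needs source.length() >= 2): temporarily overwrite the last
// character with a copy of its neighbour so the scan below always terminates.
int xLastChar = source.length() - 1;
char charLast = source.charAt( xLastChar );
source.setCharAt( xLastChar, source.charAt( xLastChar - 1 ) );
int i = 1;
while( true ){
    if( source.charAt(i) == source.charAt(i-1) ) break;
    i += 1;
}
source.setCharAt( xLastChar, charLast ); // restore the original last character
// If the scan only stopped at the sentinel and the real last pair differs, nothing was found.
if( i == xLastChar && source.charAt( xLastChar-1 ) != charLast ) return -1;
return i;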

For a large string this algorithm is probably twice as fast as yours.

1 Comment

This gives the index of the first character identical to the one immediately before. The procedure from the question returns the lowest index of a character occurring for a second time anywhere in the string. Using a separate StringBuffer, one might save special casing by appending the last character.
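
A rough sketch of what that suggestion might look like: copy the input into a separate buffer and append a duplicate of the last character, so the scan always stops and no character has to be overwritten and restored. The class name `SentinelScan` and the method name `firstAdjacentRepeat` are hypothetical, and `StringBuilder` is used here in place of `StringBuffer` since no synchronization is needed.

```java
// Hypothetical example class illustrating the appended-sentinel variant.
final class SentinelScan {

    // Returns the index of the first character equal to the one immediately before it, or -1.
    static int firstAdjacentRepeat(String source) {
        int n = source.length();
        if (n == 0) return -1;
        // Separate buffer: the original text plus a copy of its last character as a sentinel.
        StringBuilder buf = new StringBuilder(n + 1).append(source);
        buf.append(source.charAt(n - 1));
        int i = 1;
        // No bounds check needed: the sentinel guarantees a match at index n at the latest.
        while (buf.charAt(i) != buf.charAt(i - 1)) {
            i++;
        }
        // A match at index n involves only the sentinel, so the input had no adjacent repeat.
        return i == n ? -1 : i;
    }

    public static void main(String[] args) {
        System.out.println(firstAdjacentRepeat("abbc")); // prints 2
        System.out.println(firstAdjacentRepeat("abca")); // prints -1
    }
}
```

The copy costs O(n) extra time and memory, so this trades the special case at the end for an allocation.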
0

You could try with:

public static char firstRecurringChar(String s) {
    char x = ' ';
    System.out.println("STRING : " + s);
    for (int i = 0; i < s.length(); i++) {
        System.out.println("CHAR AT " + i + " = " + s.charAt(i));
        System.out.println("Last index of CHAR AT " + i + " = " + s.lastIndexOf(s.charAt(i)));
        if (s.lastIndexOf(s.charAt(i)) > i) {
            x = s.charAt(i);
            break;
        }
    }
    return x;
}

Comments

-1

O(1) Algorithm

Your solution is O(n^2) because of the two nested loops.

The fastest algorithm to do this is O(1) (constant time):

public static int detectDuplicate(String source) {
    boolean[] foundChars = new boolean[Character.MAX_VALUE+1];
    for(int i = 0; i < source.length(); i++) {
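        // No explicit bound is needed: with only Character.MAX_VALUE + 1 distinct char values, a duplicate must turn up at or before index Character.MAX_VALUE + 1.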
        char currentChar = source.charAt(i);
        if(foundChars[currentChar]) return i;
        foundChars[currentChar] = true;
    }
    return -1;
}

However, this is only fast in terms of big oh.

13 Comments

I would be very, very interested to see an O(1) algorithm for solving this problem. Could you describe one?
It's only O(1) because you assume that there's a fixed number of possible characters. It's still linear in the size of the alphabet.
That's O(n). It's also incorrect; if the string is every possible character in order, followed by a repetition of one of those characters, then you will return -1, rather than the index of that character.
Well, Java's char has a fixed number of characters; we're not talking about pseudocode here. I fixed the case Tom mentioned. And it's really O(1).
that's what happens when one starts talking about the complexity of java code, rather than just algorithms :)