1

I have a list/array of IP address as string. I need to identify if there are any duplicates in this array and log an error. The array is about 20 elements big. What is an efficient way to identify a duplicate ?

5
  • Use a data structure with overloaded == and < operators instead, and store the IPs in a set (preferably hashed). Commented Dec 7, 2014 at 13:15
  • 1
    Sort them then check adjacent pairs. Commented Dec 7, 2014 at 13:17
  • Converting from string form first is probably a win. Commented Dec 7, 2014 at 13:17
  • 4
    Checking every element against every other element is only 200 comparisons. This is a pretty small problem; why does it need to be efficient? Commented Dec 7, 2014 at 13:19
  • @AlanStokes - you are right. I am trying to write a generic code so that if tomorrow the number of elements to be compared increases I would not have to rewrite the code. Commented Dec 7, 2014 at 13:57

3 Answers 3

2
  1. sort original array
  2. iterate over sorted array, and count different values
  3. create new array with size of (2)
  4. copy values from original to new array, skipping duplicates

pseudo in bash:

[user@linux ~]$ cat 1.txt
1
2
3
66
1
1
66
3
7
7
7
7
26

[user@linux ~]$ cat 1.txt | sort | uniq

1
2
26
3
66
7
[user@linux ~]$ cat 1.txt | sort | uniq | wc -l
       7
Sign up to request clarification or add additional context in comments.

6 Comments

No, just too lazy to write C++ code, and instead giving the guy a general idea... :)
I'm afraid that won't cut it. One might as well hand over a C++ book as an answer, huh?
On 3rd tought you are probably right. The complexity of my solution is very hard ~ (O(n log n) * 2N), and I think that the hash map solution given above is much better suited. Leaving this for the whole internet to know how lame this answer is.
Perhaps you're better off removing it before someone downvotes it. ;)
I am committed to my stupidity. No. Also - this is a good example of what no to do.
|
2

You can use a map<string, int> to mark used addresses and where an address appeared first:

void check_dups(const std::vector<std::string>& addresses) {
    std::map<std::string, int> seen;
    for (int i=0,n=addresses.size(); i<n; i++) {
        std::map<std::string, int>::iterator it = seen.find(addreses[i]);
        if (it == seen.end()) {
            // Never used before, mark the position
            seen[addresses[i]] = i;
        } else {
            // Duplicated value, emit a warning
            std::cout << "Duplicate address at index " << i <<
                         " (present already at index " << it->second << ")\n";
        }
    }
}

6 Comments

Nice, but std::map is implemented using red-black trees, so inserting into it has complexity O(log n). This means that overall we would get O(n logn), no better than simply sorting the array and looking for adjacent equal elements.
@Krystian If that is your main concern, just use std::unordered_map.
E_net4: but you need C++11 support to use std::undordered_map
@Krystian C++11 is a mature standard right now. It is only very likely that it is supported in OP's compiler.
@Krystian: as an error message are you proposing "There are some duplicates somewhere, good luck finding them..."?
|
0

here are 3 reasonably efficient ways, from the top of my head:

#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <set>

// returns a sorted, de-duplicated copy
std::vector<std::string> de_duplicated(std::vector<std::string> vec)
{
    std::set<std::string> interim { vec.begin(), vec.end() };
    vec.assign(interim.begin(), interim.end());
    return vec;
}

// sorts and de-duplicates in place
void de_duplicate(std::vector<std::string>& vec)
{
    std::sort(std::begin(vec), std::end(vec));

    auto current = std::begin(vec);

    do {
        auto last = std::end(vec);
        current = std::adjacent_find(current, last);
        if (current != last) {
            auto last_same = std::find_if_not(std::next(current),
                                              last,
                                              [&current](const std::string& s) {
                                                  return s == *current;
                                              });
            current = vec.erase(std::next(current), last_same);
        }
    } while(current != std::end(vec));

}

// returns a de-duplicated copy, preserving order
std::vector<std::string> de_duplicated_stable(const std::vector<std::string>& vec)
{
    std::set<std::string> index;
    std::vector<std::string> result;
    for (const auto& s : vec) {
        if (index.insert(s).second) {
            result.push_back(s);
        }
    }

    return result;
}





using namespace std;


int main() {

    std::vector<std::string> addresses { "d", "a", "c", "d", "c", "a", "c", "d" };

    cout << "before" << endl;
    std::copy(begin(addresses), end(addresses), ostream_iterator<string>(cout, ", "));
    cout << endl;

    auto deduplicated = de_duplicated(addresses);
    cout << endl << "sorted, de-duplicated copy" << endl;
    std::copy(begin(deduplicated), end(deduplicated), ostream_iterator<string>(cout, ", "));
    cout << endl;

    deduplicated = de_duplicated_stable(addresses);
    cout << endl << "sorted, stable copy" << endl;
    std::copy(begin(deduplicated), end(deduplicated), ostream_iterator<string>(cout, ", "));
    cout << endl;

    de_duplicate(addresses);
    cout << endl << "sorted, de-duplicated in-place" << endl;
    std::copy(begin(addresses), end(addresses), ostream_iterator<string>(cout, ", "));
    cout << endl;

    return 0;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.