I have a list/array of IP address as string. I need to identify if there are any duplicates in this array and log an error. The array is about 20 elements big. What is an efficient way to identify a duplicate ?
3 Answers
- sort original array
- iterate over sorted array, and count different values
- create new array with size of (2)
- copy values from original to new array, skipping duplicates
pseudo in bash:
[user@linux ~]$ cat 1.txt
1
2
3
66
1
1
66
3
7
7
7
7
26
[user@linux ~]$ cat 1.txt | sort | uniq
1
2
26
3
66
7
[user@linux ~]$ cat 1.txt | sort | uniq | wc -l
7
6 Comments
elcuco
No, just too lazy to write C++ code, and instead giving the guy a general idea... :)
E_net4
I'm afraid that won't cut it. One might as well hand over a C++ book as an answer, huh?
elcuco
On 3rd tought you are probably right. The complexity of my solution is very hard ~ (O(n log n) * 2N), and I think that the hash map solution given above is much better suited. Leaving this for the whole internet to know how lame this answer is.
E_net4
Perhaps you're better off removing it before someone downvotes it. ;)
elcuco
I am committed to my stupidity. No. Also - this is a good example of what no to do.
|
You can use a map<string, int> to mark used addresses and where an address appeared first:
void check_dups(const std::vector<std::string>& addresses) {
std::map<std::string, int> seen;
for (int i=0,n=addresses.size(); i<n; i++) {
std::map<std::string, int>::iterator it = seen.find(addreses[i]);
if (it == seen.end()) {
// Never used before, mark the position
seen[addresses[i]] = i;
} else {
// Duplicated value, emit a warning
std::cout << "Duplicate address at index " << i <<
" (present already at index " << it->second << ")\n";
}
}
}
6 Comments
Kris
Nice, but std::map is implemented using red-black trees, so inserting into it has complexity O(log n). This means that overall we would get O(n logn), no better than simply sorting the array and looking for adjacent equal elements.
E_net4
@Krystian If that is your main concern, just use
std::unordered_map.Kris
E_net4: but you need C++11 support to use std::undordered_map
E_net4
@Krystian C++11 is a mature standard right now. It is only very likely that it is supported in OP's compiler.
6502
@Krystian: as an error message are you proposing "There are some duplicates somewhere, good luck finding them..."?
|
here are 3 reasonably efficient ways, from the top of my head:
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <set>
// returns a sorted, de-duplicated copy
std::vector<std::string> de_duplicated(std::vector<std::string> vec)
{
std::set<std::string> interim { vec.begin(), vec.end() };
vec.assign(interim.begin(), interim.end());
return vec;
}
// sorts and de-duplicates in place
void de_duplicate(std::vector<std::string>& vec)
{
std::sort(std::begin(vec), std::end(vec));
auto current = std::begin(vec);
do {
auto last = std::end(vec);
current = std::adjacent_find(current, last);
if (current != last) {
auto last_same = std::find_if_not(std::next(current),
last,
[¤t](const std::string& s) {
return s == *current;
});
current = vec.erase(std::next(current), last_same);
}
} while(current != std::end(vec));
}
// returns a de-duplicated copy, preserving order
std::vector<std::string> de_duplicated_stable(const std::vector<std::string>& vec)
{
std::set<std::string> index;
std::vector<std::string> result;
for (const auto& s : vec) {
if (index.insert(s).second) {
result.push_back(s);
}
}
return result;
}
using namespace std;
int main() {
std::vector<std::string> addresses { "d", "a", "c", "d", "c", "a", "c", "d" };
cout << "before" << endl;
std::copy(begin(addresses), end(addresses), ostream_iterator<string>(cout, ", "));
cout << endl;
auto deduplicated = de_duplicated(addresses);
cout << endl << "sorted, de-duplicated copy" << endl;
std::copy(begin(deduplicated), end(deduplicated), ostream_iterator<string>(cout, ", "));
cout << endl;
deduplicated = de_duplicated_stable(addresses);
cout << endl << "sorted, stable copy" << endl;
std::copy(begin(deduplicated), end(deduplicated), ostream_iterator<string>(cout, ", "));
cout << endl;
de_duplicate(addresses);
cout << endl << "sorted, de-duplicated in-place" << endl;
std::copy(begin(addresses), end(addresses), ostream_iterator<string>(cout, ", "));
cout << endl;
return 0;
}
==and<operators instead, and store the IPs in a set (preferably hashed).