C++ Remove string as duplicate in vector based on partial match of substring

Question

Is there an efficient way to remove a string as a duplicate based on a match of a substring?

Consider a vector (combinedvect) containing a number of CSV strings of the same form:

'QWERTY, PROTMD, PEDKF, 12345.15454'
'ASDFGH, LDMEMA, PWEMDE, 23456.938984'
'ZXCVBN, NMDFN, MNMWNS, 34567.903'
'ASDFGH, LDMEMA, PWEMDE, 23444.624'
'QWERTY, PROTMD, PEDKF, 15654.15454'

A reverse sort using:

std::sort (combinedvect.begin(), combinedvect.end(), std::greater<>());

results in:

 'ZXCVBN, NMDFN, MNMWNS, 34567.903'
 'QWERTY, PROTMD, PEDKF, 15654.15454'
 'QWERTY, PROTMD, PEDKF, 12345.15454'
 'ASDFGH, LDMEMA, PWEMDE, 23456.938984'
 'ASDFGH, LDMEMA, PWEMDE, 23444.624'

The reverse sort is used to list duplicates in order from highest numerical substring to lowest, as in:

 'QWERTY, PROTMD, PEDKF, 15654.15454'
 'QWERTY, PROTMD, PEDKF, 12345.15454'

The goal is to remove any string whose first three fields match those of the preceding string, then sort the remaining strings in ascending order of the numerical field.

The output should look like this:

 'QWERTY, PROTMD, PEDKF, 15654.15454'
 'ASDFGH, LDMEMA, PWEMDE, 23456.938984'
 'ZXCVBN, NMDFN, MNMWNS, 34567.903'

I tried split and comparison functions like these:

void splitsort(const std::string &s, double &selectedLPLSQLDate, std::string &mycombinedtext) {
    size_t idx = s.find(',');
    mycombinedtext = s.substr(0, idx+2);
    selectedLPLSQLDate = std::stod(s.substr(idx+3));
}

bool mapFunc(const std::string &a, const std::string &b) {
    double selectedLPLSQLDate1, selectedLPLSQLDate2;
    std::string mycombinedtext1, mycombinedtext2;
    splitsort(a, selectedLPLSQLDate1, mycombinedtext1);
    splitsort(b, selectedLPLSQLDate2, mycombinedtext2);
    return selectedLPLSQLDate1 < selectedLPLSQLDate2;
}

but they did not work with:

std::sort (combinedvect.begin(), combinedvect.end(), mapFunc);

Alberto · Accepted Answer · 2020-08-10 00:20:17Z


This removes the duplicates, keeping the largest value for each key:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> v{
            "QWERTY, PROTMD, PEDKF, 12345.15454",
            "ASDFGH, LDMEMA, PWEMDE, 23456.938984",
            "ZXCVBN, NMDFN, MNMWNS, 34567.903",
            "ASDFGH, LDMEMA, PWEMDE, 23444.624",
            "QWERTY, PROTMD, PEDKF, 15654.15454"
    };
    std::map<std::string, double> map;
    for (const auto& el : v) {
        auto pos = el.find_last_of(',');             // position of the last ","
        auto key = el.substr(0, pos);                // everything before it is the key
        auto value = std::stod(el.substr(pos + 1));  // the trailing number
        auto found = map.find(key);
        if (found == map.end())          // key not seen yet: insert it
            map[key] = value;
        else if (found->second < value)  // key seen with a smaller value: replace it
            found->second = value;
    }
    // print the surviving entries (in key order, not numeric order)
    for (const auto& [key, value] : map)
        std::cout << key << " : " << value << "\n";
}

You can then use std::transform to build a vector of std::string by joining each map key with its value. The overall complexity is O(n log n), since each insertion into the map is O(log n) and is performed at most n times.
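The std::transform step mentioned above could look something like the sketch below. The helper name joinEntries is made up for illustration, and printing with 15 significant digits is an assumption that suits the sample values; it is not something the code above prescribes. The result comes out in key order, because that is how std::map iterates.

```cpp
#include <algorithm>
#include <iomanip>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: rebuilds "key, value" strings from the de-duplicated map.
std::vector<std::string> joinEntries(const std::map<std::string, double>& entries) {
    std::vector<std::string> out;
    out.reserve(entries.size());
    std::transform(entries.begin(), entries.end(), std::back_inserter(out),
                   [](const std::pair<const std::string, double>& entry) {
                       std::ostringstream os;
                       // Assumption: 15 significant digits are enough for the inputs.
                       os << entry.first << ", "
                          << std::setprecision(15) << entry.second;
                       return os.str();
                   });
    return out;  // ordered by key, not by numeric value
}
```

Calling joinEntries(map) after the loop gives back one string per key. Values with more than 15 significant digits would not be reproduced exactly this way.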


1 Comment

user3447273 Over a year ago

This correctly removes duplicates. The double must be changed to std::string if you need to preserve all decimal values. This operation does not sort the resulting output in ascending order of the numerical substring.
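One possible way to cover both points in this comment is sketched below: keep the original line next to its parsed value so that no decimals are lost, then sort the survivors by that value. The function name dedupeKeepLargest and the choice to skip lines without a comma are assumptions made for illustration; std::stod will still throw if the text after the last comma is not a number.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: for each key (the text before the last comma) keeps the
// line with the largest trailing number, then sorts ascending by that number.
std::vector<std::string> dedupeKeepLargest(const std::vector<std::string>& lines) {
    // key -> (parsed value, original line kept verbatim)
    std::map<std::string, std::pair<double, std::string>> best;
    for (const auto& line : lines) {
        const auto pos = line.find_last_of(',');
        if (pos == std::string::npos) continue;  // assumption: ignore lines with no comma
        const std::string key = line.substr(0, pos);
        const double value = std::stod(line.substr(pos + 1));
        auto it = best.find(key);
        if (it == best.end())
            best.emplace(key, std::make_pair(value, line));
        else if (it->second.first < value)
            it->second = std::make_pair(value, line);
    }

    // Collect the survivors and order them by the numeric value.
    std::vector<std::pair<double, std::string>> kept;
    kept.reserve(best.size());
    for (const auto& entry : best) kept.push_back(entry.second);
    std::sort(kept.begin(), kept.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });

    std::vector<std::string> result;
    result.reserve(kept.size());
    for (auto& entry : kept) result.push_back(std::move(entry.second));
    return result;
}
```

With the five sample strings from the question, this should return the QWERTY line ending in 15654.15454, then the ASDFGH line ending in 23456.938984, then the ZXCVBN line, each exactly as it was written in the input.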
