Is there an efficient way to remove a string as a duplicate based on a match of a substring?

Consider a vector (combinedvect) containing a number of CSV strings of the same form:

'QWERTY, PROTMD, PEDKF, 12345.15454'
'ASDFGH, LDMEMA, PWEMDE, 23456.938984'
'ZXCVBN, NMDFN, MNMWNS, 34567.903'
'ASDFGH, LDMEMA, PWEMDE, 23444.624'
'QWERTY, PROTMD, PEDKF, 15654.15454'

A reverse sort using:

std::sort (combinedvect.begin(), combinedvect.end(), std::greater<>());

results in:

 'ZXCVBN, NMDFN, MNMWNS, 34567.903'
 'QWERTY, PROTMD, PEDKF, 15654.15454'
 'QWERTY, PROTMD, PEDKF, 12345.15454'
 'ASDFGH, LDMEMA, PWEMDE, 23456.938984'
 'ASDFGH, LDMEMA, PWEMDE, 23444.624'

The reverse sort is used to list duplicates in order from highest numerical substring to lowest, as in:

 'QWERTY, PROTMD, PEDKF, 15654.15454'
 'QWERTY, PROTMD, PEDKF, 12345.15454'

The goal is to remove every string whose first three fields match those of the preceding string (keeping only the one with the highest numeric field), then sort the remaining strings in ascending order of the numeric field.

The output should look like this:

 'QWERTY, PROTMD, PEDKF, 15654.15454'
 'ASDFGH, LDMEMA, PWEMDE, 23456.938984'
 'ZXCVBN, NMDFN, MNMWNS, 34567.903'

I tried split and comparison functions like these:

void splitsort(const std::string &s, double &selectedLPLSQLDate, std::string &mycombinedtext) {
    size_t idx = s.find(',');
    mycombinedtext= s.substr(0, idx+2);
    selectedLPLSQLDate = std::stod(s.substr(idx+3));
}

bool mapFunc(const std::string &a, const std::string &b) {
    double selectedLPLSQLDate1, selectedLPLSQLDate2;
    std::string mycombinedtext1, mycombinedtext2;
    splitsort(a, selectedLPLSQLDate1, mycombinedtext1);
    splitsort(b, selectedLPLSQLDate2, mycombinedtext2);
    return selectedLPLSQLDate1 < selectedLPLSQLDate2;
}

These did not work with:

std::sort (combinedvect.begin(), combinedvect.end(), mapFunc);

1 Answer

This removes the duplicates, keeping the largest value for each key:

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> v{
            "QWERTY, PROTMD, PEDKF, 12345.15454",
            "ASDFGH, LDMEMA, PWEMDE, 23456.938984",
            "ZXCVBN, NMDFN, MNMWNS, 34567.903",
            "ASDFGH, LDMEMA, PWEMDE, 23444.624",
            "QWERTY, PROTMD, PEDKF, 15654.15454"
    };
    std::map<std::string, double> best;  // key -> largest value seen so far
    for (const auto& el : v) {
        auto pos = el.find_last_of(',');             // position of the last ","
        auto key = el.substr(0, pos);                // everything before it is the key
        auto value = std::stod(el.substr(pos + 1));  // the trailing number
        auto it = best.find(key);                    // single lookup instead of three
        if (it == best.end() || it->second < value)  // new key, or a larger value for an existing key
            best[key] = value;                       // store the value
    }
    for (const auto& [key, value] : best)            // names no longer shadow the vector v
        std::cout << key << " : " << value << "\n";
}

You can then use std::transform to build a vector of std::string by joining each map key with its value. The overall complexity should be O(n log n): each insertion into the map is O(log n), and you do at most n of them.
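The std::transform step could look roughly like this. It is only a sketch: joinEntries is a hypothetical helper name, and it assumes the map holds std::string keys and double values as in the code above. std::to_string chooses its own number formatting, so the digits of the original text are not guaranteed to come back unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: rebuild "key, value" strings from the de-duplicated map.
std::vector<std::string> joinEntries(const std::map<std::string, double>& best) {
    std::vector<std::string> out;
    out.reserve(best.size());
    std::transform(best.begin(), best.end(), std::back_inserter(out),
                   [](const std::pair<const std::string, double>& kv) {
                       // std::to_string decides the formatting of the number,
                       // so the original digits may not round-trip exactly.
                       return kv.first + ", " + std::to_string(kv.second);
                   });
    // Sanity check: exactly one output string per map entry.
    assert(out.size() == best.size());
    return out;
}
```

Calling joinEntries on the map built in the loop above yields one string per unique key, in the map's key order (alphabetical), not in order of the numeric value.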

1 Comment

This correctly removes duplicates. The double must be changed to std::string if you need to preserve all decimal values. This operation does not sort the resulting output in ascending order of the numerical substring.
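One possible way to cover both points from this comment, as a sketch rather than a definitive fix: keep the original line next to the parsed number, so the text is never re-formatted, and sort the survivors by that number at the end. dedupeKeepMax is a hypothetical name, C++17 is assumed (for try_emplace), and lines without a comma are simply skipped, which is an assumption about how malformed input should be handled.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: for every key (the text before the last comma) keep the
// original line with the largest numeric field, then return the kept lines in
// ascending order of that number.
std::vector<std::string> dedupeKeepMax(const std::vector<std::string>& lines) {
    // key -> (parsed number used for comparisons, untouched original line)
    std::map<std::string, std::pair<double, std::string>> best;
    for (const auto& line : lines) {
        const auto pos = line.find_last_of(',');
        if (pos == std::string::npos) continue;  // assumption: skip malformed lines
        const double value = std::stod(line.substr(pos + 1));
        auto [it, inserted] = best.try_emplace(line.substr(0, pos), value, line);
        if (!inserted && it->second.first < value)
            it->second = {value, line};          // larger value wins
    }

    std::vector<std::pair<double, std::string>> kept;
    kept.reserve(best.size());
    for (const auto& entry : best) kept.push_back(entry.second);
    std::sort(kept.begin(), kept.end());         // pairs compare by the number first

    std::vector<std::string> out;
    out.reserve(kept.size());
    for (auto& kv : kept) out.push_back(std::move(kv.second));
    assert(out.size() == best.size());           // one line per unique key
    return out;
}
```

The double is used only for comparisons here; the strings that come back are the original input lines, so all decimal digits are preserved. The cost stays O(n log n) for both the map insertions and the final sort.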
