2

Given a string S and a set of n substrings. Remove every instance of those n substrings from S so that S is of the minimum length and output this minimum length.

Example 1

S = ccdaabcdbb
n = 2
substrings = ab, cd

Output

2

Explanation:

ccdaabcdbb -> ccdacdbb -> cabb -> cb (length=2)

Example 2

S = abcd
n = 2
substrings = ab,bcd

Output

 1

How do I solve this problem ?

2
  • 1
    What's the scale of the problem (length of string, the number n of substrings)? BFS will solve it, but might not be efficient enough for large strings Commented Jun 17, 2014 at 19:11
  • @amit I guess BFS would take O(|S|^2 * |SET|) Commented Jun 17, 2014 at 19:14

4 Answers 4

3

A simple Brute-force search algorithm is:

  • For each substring, try all possible ways to remove it from the string, then recurse.

In Pseudocode:

def min_final_length (input, substrings):
    best = len(input)
    for substr in substrings:
        beg = 0
        // find all occurrences of substr in input and recurse
        while (found = find_substring(input, substr, from=beg)):
            input_without_substr = input[0:found]+input[found+len(substr):len(input)]
            best = min(best, min_final_length(input_without_substr,substrings))
            beg = found+1
    return best

Let complexity be F(S,n,l) where S is the length of the input string, n is the cardinality of the set substrings and l is the "characteristic length" of substrings. Then

F(S,n,l) ~ n * ( S * l + F(S-l,n,l) )

Looks like it is at most O(S^2*n*l).

Sign up to request clarification or add additional context in comments.

4 Comments

Hi,c an you elaborate on the algorithm a little ?
should be beg += 1. For example, consider case where input = "ababacc" and substrings = ["aba", "abc"]. the original solution would yield ababacc -> bacc but the correct solution should be ababacc -> abcc -> c.
What will be the worst case time complexity of this algorithm ?
@UdbhavKalra: Looks like it is at most O(S^2*n*l).
1

The following solution would have an complexity of O(m * n) where m = len(S) and n is the number of substring

def foo(S, sub):
    i = 0
    while i < len(S):
        for e in sub:
            if S[i:].startswith(e):
                S = S[:i] + S[i+len(e):]
                i -= 1
                break
        else: i += 1
    return S, i

Comments

0

If you are for raw performance and your string is very large, you can do better than brute force. Use a suffix trie (E.g, Ukkonnen trie) to store your string. Then find each substring (which us done in O(m) time, m being substring length), and store the offsets to the substrings and length in an array. Then use the offsets and length info to actually remove the substrings by filling these areas with \0 (in C) or another placeholder character. By counting all non-Null characters you will get the minimal length of the string.

This will als handle overlapping substring, e.g. say your string is "abcd", and you have two substrings "ab" and "abcd".

Comments

0

I solved it using trie+dp.
First insert your substrings in a trie. Then define the state of the dp is some string, walk through that string and consider each i (for i =0 .. s.length()) as the start of some substring. let j=i and increment j as long as you have a suffix in the trie (which will definitely land you to at least one substring and may be more if you have common suffix between some substring, for example "abce" and "abdd"), whenever you encounter an end of some substring, go solve the new sub-problem and find the minimum between all substring reductions.

Here is my code for it. Don't worry about the length of the code. Just read the solve function and forget about the path, I included it to print the string formed.

struct node{
    node* c[26];
    bool str_end;
    node(){
        for(int i= 0;i<26;i++){
            c[i]=NULL;
        }
        str_end= false;
    }
};
class Trie{
public:
    node* root;
    Trie(){
        root = new node();
    }
    ~Trie(){
        delete root;
    }
};
class Solution{
public:
    typedef pair<int,int>ii;
    string get_str(string& s,map<string,ii>&path){
        if(!path.count(s)){
            return s;
        }
        int i= path[s].first;
        int j= path[s].second;
        string new_str =(s.substr(0,i)+s.substr(j+1));
        return get_str(new_str,path);
    }
    int solve(string& s,Trie* &t, map<string,int>&dp,map<string,ii>&path){
        if(dp.count(s)){
            return dp[s];
        }
        int mn= (int)s.length();
        for(int i =0;i<s.length();i++){
            string left = s.substr(0,i);
            node* cur = t->root->c[s[i]-97];
            int j=i;
            while(j<s.length()&&cur!=NULL){
                if(cur->str_end){
                    string new_str =left+s.substr(j+1);
                    int ret= solve(new_str,t,dp,path);
                    if(ret<mn){
                        path[s]={i,j};
                    }
                }
                cur = cur->c[s[++j]-97];
            }
        }
        return dp[s]=mn;
    }
    string removeSubstrings(vector<string>& substrs, string s){
        map<string,ii>path;
        map<string,int>dp;
        Trie*t = new Trie();
        for(int i =0;i<substrs.size();i++){
            node* cur = t->root;
            for(int j=0;j<substrs[i].length();j++){
                if(cur->c[substrs[i][j]-97]==NULL){
                    cur->c[substrs[i][j]-97]= new node();
                }
                cur = cur->c[substrs[i][j]-97];
                if(j==substrs[i].length()-1){
                    cur->str_end= true;
                }
            }
        }
        solve(s,t,dp,path);
        return get_str(s, path);
    }
};

int main(){
    vector<string>substrs;
    substrs.push_back("ab");
    substrs.push_back("cd");
    Solution s;
    cout << s.removeSubstrings(substrs,"ccdaabcdbb")<<endl;
    return 0;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.