4

I have two strings, "hello" an "eo" for instance, and I wish to find duplicate characaters between the two strings, that is: 'e' and 'o' in this example.

My algorithm would go this way

 void find_duplicate(char* str_1, char* str_2, int len1, int len2)
 {
     char c ;

     if(len1 < len2)
     {
        int* idx_1 = new int[len1]; // record elements in little string
        // that are matched in big string
        for(int k = 0 ; k < len1 ; k++)
              idx_1[k] = 0;

        int* idx_2 = new int[len2]; // record if element in str_2 has been 
        // matched already or not
        for(int k = 0 ; k < len2 ; k++)
              idx_2[k] = 0;

        for(int i = 0 ; i < len2 ; i++)
        {     
            c = str_1[i];

            for(int j = 0 ; j < len1 ; j++)
            {
                 if(str_2[j] == c)
                 {
                    if(idx_2[j] == 0) // this element in str_2 has not been matched yet
                    {
                         idx_1[i] = j + 1; // mark ith element in idx as matched in string 2 at pos j
                         idx_2[j] = 1;
                    }
                 }
             }
         }

         // now idx_1 and idx_2 contain matches info, let's remove matches.
         char* str_1_new = new char[len1];
         char* str_2_new = new char[len2];
         int kn = 0;
         for(int k = 0 ; k < len1 ; k++)
         {
            if(idx_1[k] > 0)
            {
                 str_1_new[kn] = str_1[k];
                 kn++;
            }
         }

         kn = 0;
         for(int k = 0 ; k < len2 ; k++)
         {
             if(idx_2[k] > 0)
             {
                 str_2_new[kn] = str_2[k];
                 kn++;
             }
         }
      }
      else
      {
            // same here, switching roles (do it yourself)
       }
  }

i feel my solution is awkward: - symetry of both cases in first if/else and code duplication - time complexity: 2*len1*len2 operations for finding duplicates, then len1 + len2 operations for removal - space complexity: two len1 and two len2 char*.

What if len1 and len2 are not given (with and without resort to STL vector) ?

could you provide your implementation of this algo ?

thanks

1
  • I would sort the characters in the strings and the compare them in one sweep. Commented Oct 6, 2012 at 9:37

2 Answers 2

3

First of all, it isn't substring matching problem - it is problem of finding common characters between two strings.

Your solution works in O(n*m), where n=len1 and m=len2 in your code. You could easily solve the same problem in O(n+m+c) time by counting characters in each of strings (where c is equal to size of character set). This algorithm is called counting sort.

Sample code implementing this in your case:

#include <iostream>
#include <cstring> // for strlen and memset

const int CHARLEN = 256; //number of possible chars

using namespace std;

// returns table of char duplicates
char* find_duplicates(const char* str_1, const char* str_2, const int len1, const int len2)
{
  int *count_1 = new int[CHARLEN];
  int *count_2 = new int[CHARLEN];
  char *duplicates = new char[CHARLEN+1]; // we hold duplicate chars here
  int dupl_len = 0; // length of duplicates table, we insert '\0' at the end
  memset(count_1,0,sizeof(int)*CHARLEN);
  memset(count_2,0,sizeof(int)*CHARLEN);
  for (int i=0; i<len1; ++i)
  {
    ++count_1[str_1[i]];
  }
  for (int i=0; i<len2; ++i)
  {
    ++count_2[str_2[i]];
  }

  for (int i=0; i<CHARLEN; ++i)
  {
    if (count_1[i] > 0 && count_2[i] > 0)
    {
      duplicates[dupl_len] = i;
      ++dupl_len;
    }
  }
  duplicates[dupl_len]='\0';
  delete count_1;
  delete count_2;
  return duplicates;
}

int main()
{
  const char* str_1 = "foobar";
  const char* str_2 = "xro";
  char* dup =   find_duplicates(str_1, str_2, strlen(str_1), strlen(str_2));
  cout << "str_1: \"" << str_1 << "\" str_2: \"" << str_2 << "\"\n";
  cout << "duplicates: \"" << dup << "\"\n";
  delete dup;
  return 0;
}

Please note that I am also sorting the output here. If you do not want to do that, you can skip the counting of characters in second string and just start comparing duplicates on the go.

If you, however, intend to be able to detect multiple duplicates of the same letter (e.g. if "banana" and "arena" should output "aan" instead of "an"), then you can just substract the number of counts in current solution and adjust the output accordingly.

Sign up to request clarification or add additional context in comments.

10 Comments

O(n+m+c), where c is the size of the charset
@Zaroth Note that new int[CHARLEN](); would initialize your array with 0 elements. Also you could simplify a lot your code using std::vector (that are 0-initialized and don't need to be explicitly deleted)
@log0 the () trick is a bit voodooish; since this is an educational answer, I tried to make my code as readable as possible. Also, please note that OP requested a solution without using std::vector.
@KarolyHorvath/Zaroth I think you can just use one count vector and simply add the character any time you notice the count is different to 0. In this case the complexity is indeed O(len1+len2) with memory footprint ~256
@Zaroth I don't think the OP is opposed to the used of st::vector he was wondering if it was changing the complexity.
|
0
std::vector<char> duplicates;
for (auto c1: std::string(str_1))
  for (auto c2: std::string(str_2))
    if (c1 == c2)
      duplicates.push_back(c1);

or if you don't have a c++11 compliant compiler.

std::vector<char> duplicates;
std::string s1(str_1);
std::string s2(str_2);
for (std::size_t i = 0; i < s1.size(); i++)
  for (std::size_t j = 0; j < s2.size(); j++)
    if (s1[i] == s2[j])
      duplicates.push_back(s1[i]);

-- based on Zaroth answer

std::vector<int> count(256,0);
for (auto c : std::string(str_1))
 count[c] += 1;
for (auto c : std::string(str_2))
 if (count[c] > 0)
   duplicates.push_back(c);

9 Comments

for (auto c1: std::string(str_1)), this is C++11, right? what if iam with C++ for vs2010
does this mean that at runtime std::size_t is "known to be" size of a character in string?
@fonjibe vs2010 support the auto keyword blogs.msdn.com/b/vcblog/archive/2010/04/06/…
You have i++ instead of j++ in the inner loop.
@fonjibe for (auto c1: std::string(str_1)) means: for every element c1 in a temporary std::string constructed with str_1.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.