1

Before marking this as duplicate, please read the details here.

Example 1:

String A: The seven habits of highly effective people.

String B: "This is a sample text. There is only one product in it. It is a book. The book is The seven habits of highly effective people."

Example 2:

String A: The seven habits of highly effective people.

String B: "This is a sample text. There is only one product in it. It is a book. The book is The seven habits of highly effective peopl."

Checking the above examples with code like
B.Contains(A)
gives the correct result for Example 1. However, the same code returns "false" for Example 2.

How do I resolve this problem?

I am aware that an "e" is missing in Example 2; that is exactly the problem. How do I compare one string with another when string A is nearly identical to a part of string B?

11
  • 4
    Example 2 doesn't contain A. Commented Sep 11, 2013 at 8:24
  • 1
    @user1039119 - The same code returns "false" for Example 2 because the complete string is not there. What do you want to achieve? Commented Sep 11, 2013 at 8:24
  • 1
    At the end of string B in ex.2, you have peopl. not people. Commented Sep 11, 2013 at 8:25
  • 1
    The strings in Example 2 are obviously different. If you want to get matches for "nearly identical" strings, it gets difficult very fast, simply because defining "nearly identical" is fun. Commented Sep 11, 2013 at 8:25
  • 2
    What you're looking for is measuring how similar two strings are, then setting some threshold for how similar "similar enough" is. This question looks like a good place to start Commented Sep 11, 2013 at 8:29

4 Answers

2

As stated in my comment, the Levenshtein distance algorithm (and similar ones) computes the difference between two strings and returns a numerical result (wiki: http://en.m.wikipedia.org/wiki/Levenshtein_distance).

However, I would definitely benchmark these algorithms and apply caching strategies. They perform decently on small inputs, but the standard dynamic-programming computation takes time proportional to the product of the two string lengths, and when I implemented it I had to make sure I cached results and lookups. Your large example will not be "fast", depending on what "fast" means for your use case.
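For reference, here is a minimal sketch of the distance computation itself. It is not taken from any particular library; the class name `EditDistance` and the method name `Levenshtein` are made up for illustration. It keeps only two rows of the dynamic-programming table to limit memory use.

```csharp
using System;

// Hypothetical helper class, named for illustration only.
public static class EditDistance
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, and substitutions that turn s into t.
    public static int Levenshtein(string s, string t)
    {
        if (s.Length == 0) return t.Length;
        if (t.Length == 0) return s.Length;

        // previous[j] holds the distance between the first i-1 chars of s
        // and the first j chars of t; current[j] is the row being filled.
        var previous = new int[t.Length + 1];
        var current = new int[t.Length + 1];

        for (int j = 0; j <= t.Length; j++) previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,  // insertion
                             previous[j] + 1),    // deletion
                    previous[j - 1] + cost);      // substitution or match
            }
            // Reuse the two arrays instead of allocating a new row.
            (previous, current) = (current, previous);
        }
        return previous[t.Length];
    }
}
```

For example, `EditDistance.Levenshtein("people", "peopl")` is 1. Note that comparing String A against the whole of String B gives a distance dominated by the length difference, so on its own this does not answer the "part of string B" case.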

2 Comments

Hi Simon, thanks for your answer. I have gone through the details of the algorithm you mentioned. However, my problem is not comparing two nearly identical strings; it is comparing one string with a part of another string that is nearly identical to it. As the example shows, only a part of String B matches String A, so the distance algorithm may not give accurate results.
@user1039119 You can calculate it, for example: bool contains = Math.Abs(Math.Abs(strB.Length - strA.Length) - levenshteinDistance) < 3;
1

You can use string.Compare. Below are a few examples that may help you.

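using System;

string a = "a";
string b = "b";
int c;

// Culture-sensitive comparison: negative because "a" sorts before "b".
c = string.Compare(a, b);
Console.WriteLine(c);

// Ordinal (character-code) comparison: positive because 'b' > 'a'.
c = string.CompareOrdinal(b, a);
Console.WriteLine(c);

// Instance form of the culture-sensitive comparison: negative.
c = a.CompareTo(b);
Console.WriteLine(c);

// Same comparison with the operands reversed: positive.
c = b.CompareTo(a);
Console.WriteLine(c);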

0

What you are looking for sounds like a search engine with relevance scoring.

I have used the Levenshtein distance method to search for and compare strings that look the same but are not.

There is an example at the following link:

http://www.dotnetperls.com/levenshtein

0

I am answering my own question.

I was looking for a solution to compare one string with another where string A is nearly identical with a "part of string B".

This is how I resolved the issue.

  1. I applied the "Longest Common Substring" algorithm and found the longest common substring between the two strings.

  2. Then I used the "Levenshtein Distance" algorithm to compare my String A with the longest common substring found in step 1.

  3. If the distance from step 2 is within a certain threshold (that is, the two are similar enough), String A is considered to exist in String B.

  4. Problem Solved.

I worked on the problem for a day, and this approach gave decent results.
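The steps above could be sketched roughly as follows. This is not my original code, only an illustration: the class name `FuzzyContains`, the method names, and the `maxDistance` parameter are hypothetical, and the threshold is expressed as a maximum allowed edit distance.

```csharp
using System;

// Hypothetical helper class; all names here are illustrative assumptions.
public static class FuzzyContains
{
    // Step 1: longest run of consecutive characters shared by a and b.
    public static string LongestCommonSubstring(string a, string b)
    {
        // current[j] = length of the common run ending at a[i-1] and b[j-1].
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        int bestLength = 0, bestEndInA = 0;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                if (a[i - 1] == b[j - 1])
                {
                    current[j] = previous[j - 1] + 1;
                    if (current[j] > bestLength)
                    {
                        bestLength = current[j];
                        bestEndInA = i;
                    }
                }
                else
                {
                    current[j] = 0;
                }
            }
            (previous, current) = (current, previous);
        }
        return a.Substring(bestEndInA - bestLength, bestLength);
    }

    // Step 2: Levenshtein distance (two-row dynamic programming).
    public static int Levenshtein(string s, string t)
    {
        var previous = new int[t.Length + 1];
        var current = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            (previous, current) = (current, previous);
        }
        return previous[t.Length];
    }

    // Step 3: treat a as "contained" in b when the longest common
    // substring is within maxDistance edits of a.
    public static bool IsApproximatelyContained(string a, string b, int maxDistance)
    {
        string common = LongestCommonSubstring(a, b);
        return Levenshtein(a, common) <= maxDistance;
    }
}
```

With the strings from Example 2, the longest common substring is "The seven habits of highly effective peopl", which is 2 edits away from String A, so a `maxDistance` of 3 reports a match. One caveat of this approach: if the typo sits in the middle of the phrase, the longest common substring covers only the part on one side of the typo, so the distance grows and the threshold has to be chosen with that in mind.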
