Is there a simple way in C# to compare two strings and find the percentage of similarity between them? Say you have the strings "I like Bing" and "I like Google": it would compare the words "I", "like", "Bing" with the words "I", "like", "Google", determine that 2/3 of them are the same, and return .66.
- Do you want to do string alignment, or just compare one by one? – Can Gencer, Mar 20, 2011 at 22:09
- What's the definition of the similarity you are looking for? – Jonas Elfström, Mar 20, 2011 at 22:10
- What kind of similarity? Are you looking for character-to-character or patterns like "my name is marlon" and "my brother is marlon"? Both will yield different results. – Marlon, Mar 20, 2011 at 22:11
- Your description of the problem is still a bit vague. What about case sensitivity? Punctuation? What if a word appears twice in one and once in the other? – Can Gencer, Mar 20, 2011 at 22:22
2 Answers
The Damerau–Levenshtein distance is probably the most common approach I've seen. It should be simple enough to implement in C# given the samples on the Wikipedia page.
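To make that concrete, here is a minimal sketch of the restricted variant of Damerau–Levenshtein (often called optimal string alignment), working character by character. The class and method names (`StringSimilarity`, `OsaDistance`, `Similarity`) are made up for illustration, and dividing the distance by the longer string's length is just one assumed way to turn it into a percentage; the question does not pin down a definition.

```csharp
using System;

public static class StringSimilarity
{
    // Optimal string alignment distance: insertions, deletions,
    // substitutions, and swaps of two adjacent characters each cost 1.
    // This is the restricted form of Damerau-Levenshtein.
    public static int OsaDistance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];

        // Turning a prefix into the empty string takes one deletion per character.
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;

                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // substitution or match

                // Adjacent transposition, e.g. "ab" -> "ba".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
                }
            }
        }

        return d[a.Length, b.Length];
    }

    // Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
    public static double Similarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0; // two empty strings count as identical
        return 1.0 - (double)OsaDistance(a, b) / maxLen;
    }
}
```

Calling `StringSimilarity.Similarity("I like Bing", "I like Google")` gives a character-level score, which will not match the word-level 2/3 from the question. The comparison here is also case-sensitive, so normalize the inputs first if that matters.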
5 Comments
Can Gencer
Here is some code for it as well: blogs.msdn.com/b/toub/archive/2006/05/05/590814.aspx
David
And it's Jonas FTW! Awesome link :)
Can Gencer
en.wikipedia.org/wiki/Needleman-Wunsch_algorithm might be more relevant if one wants to do sequence alignment, though.
David
I'm going to have to favorite this question just for the excellent assortment of links being provided here. These may easily prove useful in some of my employer's projects.
A couple of approaches you might check out are the Levenshtein distance and a Soundex algorithm.
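If the goal is the word-by-word figure from the question, one option is to run plain Levenshtein over the word arrays instead of over characters. The sketch below assumes words are separated by spaces, ignores case, and divides by the longer word count; `WordSimilarity.Compare` is a hypothetical name, and punctuation and repeated words are not given any special treatment.

```csharp
using System;

public static class WordSimilarity
{
    // Levenshtein distance computed over words rather than characters,
    // then normalized to a 0..1 similarity score.
    public static double Compare(string left, string right)
    {
        // Assumption: words are separated by spaces only.
        string[] a = left.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] b = right.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0; // nothing to compare on either side

        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                // Assumption: "Like" and "like" count as the same word.
                bool same = string.Equals(a[i - 1], b[j - 1], StringComparison.OrdinalIgnoreCase);
                int cost = same ? 0 : 1;

                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // drop a word from the left string
                             d[i, j - 1] + 1),     // add a word from the right string
                    d[i - 1, j - 1] + cost);       // replace or keep a word
            }
        }

        return 1.0 - (double)d[a.Length, b.Length] / maxLen;
    }
}
```

For "I like Bing" and "I like Google" the word distance is 1 out of 3 words, so `WordSimilarity.Compare` returns 2/3, about 0.667. Soundex would answer a different question (whether two words sound alike), so it is not sketched here.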