
This was my first attempt at a SQL function. I originally wrote it in VB, where it works like a charm, but after I translated it to SQL Server it does not return what I expect. The function is intended to return a percentage match between two strings.

How I expected it to function is this:

  1. Accept two strings, compare them, and, based on my rating values, return the match as a percentage: matchscore / max possiblescore.
  2. The length of the longer string is multiplied by 3. This is the max possiblescore.
  3. Go character by character through the first string and look for that character in the second string.
  4. If the character is found in the same position in the second string, add three to the matchscore and move on to the next character.
  5. If the character is found in the second string, but not in the same position, add one to the matchscore and move on to the next character.
  6. If the character is not found in the second string, add nothing and move on to the next character.
  7. Divide the matchscore by the max possiblescore. This returns a decimal value. I read that RETURN only returns an integer, so I multiplied the division result by 100.

For example, if I compare "CAT" to "CART", I expect 7/12, or 0.58. Instead, I get 0. If I compare "CAT" to "CAT", I expect 9/9, or 1.00. Instead, I get 2.
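The expected numbers can be checked by hand. The snippet below is only a sketch of that arithmetic (the variable names are made up for illustration), and it also shows why a bare DECIMAL loses the fraction: with no precision and scale given, SQL Server treats it as DECIMAL(18, 0). A scalar function can declare something like RETURNS DECIMAL(5, 2) instead, so multiplying by 100 is a choice rather than a requirement.

```sql
-- Expected score for 'CAT' vs 'CART' under the rules listed above:
--   C: position 1 in both strings -> 3
--   A: position 2 in both strings -> 3
--   T: position 3 vs position 4   -> 1
DECLARE @matchScore INT = 3 + 3 + 1;   -- 7
DECLARE @maxScore   INT = 3 * 4;       -- 12, since LEN('CART') = 4

-- Casting one operand to DECIMAL keeps the fraction (approximately 0.5833).
SELECT CAST(@matchScore AS DECIMAL(9, 4)) / @maxScore AS Ratio;

-- DECIMAL with no arguments is DECIMAL(18, 0), so the fraction is rounded away.
SELECT CAST(0.58 AS DECIMAL)       AS DefaultScale,   -- 1
       CAST(0.58 AS DECIMAL(5, 2)) AS TwoDecimals;    -- 0.58
```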

(Note from 9/17/2014: I appreciate your input. I used what you suggested and made one more major change: I got rid of the second WHILE loop. That change doesn't affect what I asked about, other than being needed to get the correct final answer. Instead of looping, I search for @strLetter in @strWord2. If it is found, I check whether it is in the same position in @strWord2 as in @strWord1. If it is, I add 3; if not, I add 1. This sped up the function and made the count accurate.)

Here is the code:

CREATE FUNCTION [dbo].[CompareWords]
    (@strWord1 VARCHAR(2000), @strWord2 VARCHAR(2000))
RETURNS DECIMAL
AS
BEGIN
   SET @strWord1 = UPPER(@strWord1)
   SET @strWord2 = UPPER(@strWord2)

   DECLARE @intLength INT

   IF LEN(@strWord1) >= LEN(@strWord2)
   BEGIN
        SET @intLength = LEN(@strWord1)
   END
   ELSE
   BEGIN
        SET @intLength = LEN(@strWord2)
   END

   DECLARE @iWordLoop1 INT
   DECLARE @iWordLoop2 INT
   DECLARE @intWordLoop2 INT
   DECLARE @intWordScore INT
   DECLARE @intLetterScore INT

   SET @intWordScore = 0
   SET @intWordLoop2 = Len(@strWord2)

   DECLARE @strLetter VARCHAR(1000)

   DECLARE @count1 INT 
   SET @count1 = 0 

   SET @iWordLoop1 = Len(@strWord1)

   WHILE (@count1 < @iWordLoop1) 
   BEGIN 
       SET @strLetter = SUBSTRING(@strWord1, @count1+1, 1)
       SET @intLetterScore = 0

       DECLARE @count2 INT 
       SET @count2 = 0 

       SET @iWordLoop2 = Len(@strWord2)

       WHILE (@count2 < @iWordLoop2) 
       BEGIN
           If @strLetter = SUBSTRING(@strWord2, @count2+1, 1) 
           BEGIN
                If @iWordLoop1 = @iWordLoop2 
                BEGIN
                        SET @intLetterScore = 3
                        SET @iWordLoop2 = Len(@strWord2)
                END
                ELSE
                BEGIN
                        SET @intLetterScore = 1
                END
           END

           SET @intWordScore = @intWordScore + @intLetterScore
           SET @count2 = (@count2 + 1) 
        END

       SET @count1 = (@count1 + 1) 
   END 

   DECLARE @sinScore DEC
   SET @sinScore = (@intWordScore / (3 * @intLength)) * 100

   RETURN @sinSCore
END;
  • Why are you trying to do this in sql? Commented Sep 15, 2014 at 23:04
  • Integer division means 7/12 evaluates to 0. Not sure about your other issue without running it. Commented Sep 15, 2014 at 23:05
  • I am doing this in SQL because we are comparing addresses from two tables to rate which ones are matches. We won't get an exact match, because the field structures differ: one table has just the street address, while the other includes the city, state, etc. So I take the address with the greatest match level. Commented Sep 15, 2014 at 23:09
  • Martin, that resolved a major part. Thank you for that insight. Now I multiply @intWordScore by 100 before dividing by (3*@intLength). CAT to CART returns 66, and CAT to CAT returns 250. I suspect it is due to not getting out of the loop when the character is found in the right position in string 2, so it then adds one for finding the character again: 3 for the first find and 1 for the second find. Commented Sep 15, 2014 at 23:16
  • Did you know: you can debug MS SQL user defined functions? If you knew you wouldn't ask this question Commented Sep 15, 2014 at 23:47

1 Answer

The most significant changes I made were to:

  1. Reset intLetterScore to 0 after it has been added to intWordScore. Without the reset, the same value was added again on every later pass through the inner loop, even when the character was not matched.
  2. Move the multiplication by 100 inside the brackets in the calculation of sinScore.
    As noted in an earlier comment, because this is integer division, the decimal portion of the result is truncated. Multiplying wordScore by 100 first makes the numerator much more likely to be larger than the divisor, so the result is non-zero.

Multiplying outside the brackets multiplies the already-truncated integer result of the division by 100. If that result is zero, the product is also zero.
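A quick way to see the effect is to run the arithmetic on its own. This is just an illustrative query using the 7/12 example from the question; the column aliases are arbitrary.

```sql
-- Both operands are INT, so the division truncates before anything else happens.
SELECT 7 / 12         AS DivideOnly,       -- 0
       (7 / 12) * 100 AS MultiplyAfter,    -- 0, because 0 * 100 is still 0
       7 * 100 / 12   AS MultiplyBefore,   -- 58, because 700 / 12 truncates to 58
       7 * 100.0 / 12 AS DecimalOperand;   -- roughly 58.33, no truncation to an integer
```

The last column shows an alternative: making any operand a decimal avoids integer division altogether.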

Other changes I made are commented in the code: the variable intWordLoop2 has no effect on the calculation and can be removed; strLetter can be declared as a Char(1) instead of VarChar(1000).

CREATE FUNCTION [dbo].[CompareWords]
    (@strWord1 VARCHAR(2000), @strWord2 VARCHAR(2000))
RETURNS DECIMAL
AS
BEGIN
   SET @strWord1 = UPPER(@strWord1)
   SET @strWord2 = UPPER(@strWord2)

   --Set @intLength (maxLength as len of word1 or word2)
   DECLARE @intLength INT --maxLength
   IF LEN(@strWord1) >= LEN(@strWord2)
   BEGIN
        SET @intLength = LEN(@strWord1)
   END
   ELSE
   BEGIN
        SET @intLength = LEN(@strWord2)
   END

   DECLARE @iWordLoop1 INT, @iWordLoop2 INT--, @intWordLoop2 INT --This variable doesn't impact the calculation 
   DECLARE @intWordScore INT
   DECLARE @intLetterScore INT

   SET @intWordScore = 0
   --SET @intWordLoop2 = Len(@strWord2)--this value is not used anywhere else, so removing makes no difference.

   --DECLARE @strLetter VARCHAR(1000)
   DECLARE @strLetter CHAR(1)--there is no need for 1000 characters since we're only ever assigning a single character to this

   DECLARE @count1 INT 
   SET @count1 = 0 
   SET @iWordLoop1 = Len(@strWord1)

   WHILE (@count1 < @iWordLoop1) 
   BEGIN 
       SET @strLetter = SUBSTRING(@strWord1, @count1+1, 1)
       SET @intLetterScore = 0
       DECLARE @count2 INT 
       SET @count2 = 0 
       SET @iWordLoop2 = Len(@strWord2)

       WHILE (@count2 < @iWordLoop2) 
       BEGIN
           If @strLetter = SUBSTRING(@strWord2, @count2+1, 1) 
           BEGIN
                If @count1 = @count2 --same position in both words
                BEGIN
                        SET @intLetterScore = 3
                END
                ELSE
                BEGIN
                        SET @intLetterScore = 1
                END
           END

           SET @intWordScore = @intWordScore + @intLetterScore
           SET @intLetterScore = 0
           SET @count2 = (@count2 + 1) 
        END
       SET @count1 = (@count1 + 1) 
   END 

   DECLARE @sinScore DEC
   SET @sinScore = (@intWordScore*100 / (3 * @intLength))

   RETURN @sinScore
END;


select dbo.comparewords ('Cat','cart')
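
For comparison, the single-loop approach described in the asker's note from 9/17/2014 might look roughly like the sketch below. This is not the asker's actual code, which was not posted: the function name CompareWordsSketch, the variable names, the DECIMAL(5, 2) return type and the empty-input guard are all assumptions. It checks the same position first and then falls back to CHARINDEX, so each character of the first string scores at most once.

```sql
CREATE FUNCTION dbo.CompareWordsSketch   -- hypothetical name, not from the original post
    (@strWord1 VARCHAR(2000), @strWord2 VARCHAR(2000))
RETURNS DECIMAL(5, 2)                    -- assumed: keep two decimal places
AS
BEGIN
    SET @strWord1 = UPPER(@strWord1);
    SET @strWord2 = UPPER(@strWord2);

    DECLARE @len1 INT = LEN(@strWord1);
    DECLARE @len2 INT = LEN(@strWord2);
    DECLARE @maxLength INT = CASE WHEN @len1 >= @len2 THEN @len1 ELSE @len2 END;

    -- Assumed guard: avoid dividing by zero (or NULL) for empty input.
    IF @maxLength IS NULL OR @maxLength = 0 RETURN 0;

    DECLARE @score INT = 0;
    DECLARE @pos INT = 1;
    DECLARE @letter CHAR(1);

    WHILE @pos <= @len1
    BEGIN
        SET @letter = SUBSTRING(@strWord1, @pos, 1);

        -- The @pos <= @len2 check stops a space from "matching" the empty
        -- string that SUBSTRING returns past the end of the second word.
        IF @pos <= @len2 AND SUBSTRING(@strWord2, @pos, 1) = @letter
            SET @score = @score + 3;     -- same letter in the same position
        ELSE IF CHARINDEX(@letter, @strWord2) > 0
            SET @score = @score + 1;     -- letter occurs elsewhere in the second word

        SET @pos = @pos + 1;
    END;

    -- Cast before dividing so the result is not truncated by integer division.
    RETURN CAST(@score AS DECIMAL(9, 2)) * 100 / (3 * @maxLength);
END;
GO

SELECT dbo.CompareWordsSketch('Cat', 'cart');   -- 58.33
SELECT dbo.CompareWordsSketch('Cat', 'cat');    -- 100.00
```

GO is the batch separator used by SSMS and sqlcmd; CREATE FUNCTION has to be the first statement in its batch. Replacing the inner loop with CHARINDEX also removes the double counting that happens when a letter appears more than once in the second string.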