
With the help of Zohar's answer, I got an SQL function that generates random strings, but I am facing a problem with duplicates.

Query

Create FUNCTION [dbo].[MaskGenerator]
(    
    @Prefix nvarchar(4000), -- use null or an empty string for no prefix    
    @Suffix nvarchar(4000), -- use null or an empty string for no suffix
    @MinLength int, -- the minimum length of the random part    
    @MaxLength int, -- the maximum length of the random part    
    @Count int, -- the maximum number of rows to return. Note: up to 1,000,000 rows           
    @CharType tinyint -- 1, 2 and 4 stand for lower-case, upper-case and digits.
                      -- a bitwise combination of these values can be used to generate all possible combinations:
                      -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
)
RETURNS TABLE
AS 
RETURN 

-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000 

SELECT TOP(@Count)  N As Number, 
        CONCAT(@Prefix, (
        SELECT  TOP (Length) 
                -- choose what char combination to use for the random part
                CASE @CharType 
                    WHEN 1 THEN LOWER
                    WHEN 2 THEN UPPER
                    WHEN 3 THEN IIF(Rnd % 2 = 0, LOWER, UPPER)
                    WHEN 4 THEN Digit
                    WHEN 5 THEN IIF(Rnd % 2 = 0, LOWER, Digit)
                    WHEN 6 THEN IIF(Rnd % 2 = 0, UPPER, Digit)
                    WHEN 7 THEN 
                        CASE Rnd % 3
                            WHEN 0 THEN LOWER
                            WHEN 1 THEN UPPER
                            ELSE Digit
                        END
                END
        FROM Tally As T0  
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT ABS(CHECKSUM(NewGuid)) As Rnd FROM GuidGenerator) AS RAND
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT  CHAR(97 + Rnd % 26) As LOWER, -- Random lower case letter
                    CHAR(65 + Rnd % 26) As UPPER,-- Random upper case letter
                    CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) AS Chars
        WHERE  T0.N <> -T1.N -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('') 
        ), @Suffix) As RandomString
FROM Tally As T1 
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 N As Length
    FROM Tally As T2
    CROSS JOIN GuidGenerator 
    WHERE T2.N >= @MinLength
    AND T2.N <= @MaxLength
    AND T2.N <> T1.N
    ORDER BY NewGuid
) As Lengths;

The function above returns random strings based on its parameters. For example, the query below generates 100 random strings with the prefix Test_Product_. The result set contains duplicate values, which need to be eliminated. I tried applying ROW_NUMBER, but it slows down the query, and the requested number of rows is not returned.

SELECT * FROM dbo.MaskGenerator('Test_Product_',null,1,4,100,4) ORDER BY 2
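To see which values repeat in a given call, the output can simply be grouped. This is a small diagnostic sketch, assuming dbo.MaskGenerator and the GuidGenerator view it reads from already exist; the list changes on every run because the strings are random.

```sql
-- Diagnostic: list every generated string that occurs more than once in one call.
-- The output differs from run to run because the strings are random.
SELECT RandomString, COUNT(*) AS Occurrences
FROM dbo.MaskGenerator('Test_Product_', NULL, 1, 4, 100, 4)
GROUP BY RandomString
HAVING COUNT(*) > 1
ORDER BY Occurrences DESC;
```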

I have made a fiddle demo here: SQL Fiddle, and my attempt is also here.

4 Comments
  • Why not just use newid()? Commented Dec 11, 2019 at 11:31
  • @GordonLinoff because the OP wants to control the format of the strings generated - length, allowed chars, prefix and suffix. Commented Dec 11, 2019 at 12:07
  • If you use newid() in such a function you get an error about the use of a side-effecting operator. I guess MS expects such functions to be deterministic. The view with the newid is basically a hack to get around that. Commented Dec 11, 2019 at 15:48
  • @LukStorms I don't think it's about determinism of the function, but rather about the side effects of the NewId() function - but yes - the view is simply a workaround. Commented Dec 12, 2019 at 6:47
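The function reads from a GuidGenerator view that the question does not show. Going by the comments above (the view is a workaround for calling NEWID() inside a function) and by the NewGuid column the function selects, a minimal sketch of it could look like this. The exact original definition is an assumption; create it before creating the function.

```sql
-- Assumed definition: the question does not show this view.
-- Exposes NEWID() through a view so the function can use it indirectly;
-- calling NEWID() directly inside a function raises a side-effecting operator error.
CREATE VIEW dbo.GuidGenerator
AS
SELECT NEWID() AS NewGuid;
```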

1 Answer


Basically, this is an effect of the birthday problem.
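A rough estimate shows why collisions appear so quickly for the example call (digits only, random part of 1 to 4 characters). Assume, as a simplification rather than a measurement, that the length is uniform on 1 to 4 and every digit is uniform and independent, so a specific random part of length ℓ is drawn with probability 10^(-ℓ)/4. The expected number of distinct values among n generated strings is then:

$$
\mathbb{E}[D_n] = \sum_{\ell=1}^{4} 10^{\ell}\left[1 - \left(1 - \frac{10^{-\ell}}{4}\right)^{n}\right]
$$

For n = 100 the four terms are about 9.2 + 22.1 + 24.7 + 25.0 ≈ 81, so roughly 19 of the 100 rows are expected to be repeats, mostly among the 1- and 2-digit strings, which have only 10 and 100 possible values. For n = 200 the terms are about 9.9 + 39.4 + 48.8 + 49.9 ≈ 148 distinct values. Only 10 + 100 + 1,000 + 10,000 = 11,110 distinct strings exist for these parameters, so no amount of oversampling can return more than that.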
The best solution I can offer as of now is to generate twice as many random strings as you need, then select the top 100 distinct values from them:

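SELECT TOP 100 RandomString, ROW_NUMBER() OVER(ORDER BY @@SPID) As Number
FROM
(
  -- generate twice as many candidates as needed, then drop the duplicates
  SELECT DISTINCT RandomString
  FROM dbo.MaskGenerator('Test_Product_',null,1,4,200,4)
) As Rnd
-- TOP 100 keeps the first 100 distinct values in this sort order (fewer if fewer were generated)
ORDER BY RandomString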

This might seem like a waste, since you're generating twice as many random strings as you need; however:

  1. I'm not sure that's actually the case. The query optimizer might just stop execution once you have 100 distinct values.

  2. Performance tests I've done for this function (on a relatively strong SQL Server 2016) show it is lightning-fast, at least with a small number of strings:

    • Generating 200 strings took around 23 milliseconds on average.
    • Generating 2,000 strings took around 55 milliseconds on average.
    • Generating 100,000 strings took around 2.8 seconds on average.

Generating 1 million strings, however, took around 30 seconds on average.


2 Comments

  • I have one more pattern, (\d){1,5}\.(\d){2}, which should produce values such as 123.76, 45.69 and 9563.50. Can you please help me with this case?
  • In this case, it's perhaps better to just revert to the new and improved random string generator function and handle the non-random part of the string outside the function. That makes the code slightly more cumbersome, but it gives you a much more flexible solution.
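A sketch of that idea using the function from the question, since the improved version mentioned in the reply is not shown here: request digits only with a random part of 3 to 7 characters, then insert the decimal point before the last two digits with STUFF. This assumes dbo.MaskGenerator and its GuidGenerator view exist.

```sql
-- Digits only (@CharType = 4), random part 3 to 7 characters long, no prefix or suffix.
-- STUFF inserts '.' before the last two digits, giving 1 to 5 digits,
-- a decimal point and exactly 2 digits, e.g. '12376' becomes '123.76'.
SELECT Number,
       STUFF(RandomString, LEN(RandomString) - 1, 0, '.') AS RandomDecimal
FROM dbo.MaskGenerator(NULL, NULL, 3, 7, 100, 4);
```

Leading zeros (for example 05.12) can occur, and duplicates are still possible, so the DISTINCT-and-oversample approach from the answer applies here as well.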
