3

I am trying to 'obfuscate' data in a SQL database for testing purposes. I have a single field in a single table whose values I want to replace with random strings. However, the same input string always has to map to the same replacement. So for example:

Cat
Dog
Cat
Fish
Monkey

Would have to be replaced with

YuW -- same
JiK
YuW -- same
IPoQ
KYiLwY

I don't want this to be reversible (so no ROT13, etc.)

EDIT: I also need each value to keep its original length. This database will be used for performance testing, and I want realistic string sizes.

2
  • What are you using to generate the random strings? Commented Jul 15, 2015 at 14:45
  • base64 is reversible, so nope :( Commented Jul 15, 2015 at 14:54

5 Answers

3

You should use hashing:

SELECT HashBytes('MD5', yourcolumnname)

This will give you a non-reversible 'obfuscation' for which the same input value will return the same value.

Edit: if you don't want MD5, HashBytes can handle MD2, MD4, MD5, SHA, SHA1, SHA2_256, or SHA2_512.

Edit 2: to keep the same length (at least up to the length of the hash value) do:

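-- Convert the binary hash to hex text (style 2 = no '0x' prefix), then cut it to the original length
SELECT LEFT(CONVERT(varchar(32), HASHBYTES('MD5', [yourcolumnname]), 2), LEN([yourcolumnname]))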
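
For values longer than an MD5 hex string, and to make dictionary guessing harder, the same idea could be sketched with a longer hash and a secret salt. This is only an illustration under assumptions: `dbo.Animals`, `Name`, and `@Salt` are hypothetical names that do not come from the question, and the output consists of the characters 0-9 and A-F only.

```sql
-- Hypothetical names: dbo.Animals(Name varchar(50)) stands in for the real table and column.
-- @Salt is an illustrative secret; discard it after the update so values cannot be re-derived.
DECLARE @Salt varchar(64) = 'replace-with-a-random-secret';

-- SHA2_512 yields 64 bytes, i.e. 128 hex characters, so lengths up to 128 are preserved.
UPDATE dbo.Animals
SET    Name = LEFT(CONVERT(varchar(128), HASHBYTES('SHA2_512', @Salt + Name), 2), LEN(Name))
WHERE  Name IS NOT NULL;
```

Equal inputs still produce equal outputs, because the salt is the same for every row. Note that very short values are cut down to very few hex characters, so distinct short inputs can end up with the same replacement.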

2 Comments

don't use MD5 if you want the string to remain secret ^^
Sorry, updated my question. I already tried this, but then all string sizes are the same, and the data size becomes all the same, which affects performance tests.
2

If it's just for testing purposes, I'd do it like this:

  1. Put distinct records into temporary table and add a new column, let's name it [Randomized]
  2. Generate desired random text and make sure it has same LEN() as actual text (Use LEFT(), RIGHT(), SUBSTRING() or any other function to do that)
  3. Query your actual table and join them on your predicate.
  4. Update your actual table with randomized columns

Not sure if it fits your needs or not.
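
The four steps above could be sketched roughly as follows. This is a minimal sketch, not a tested implementation: `dbo.Animals`, `Name`, and `#NameMap` are hypothetical names, and the random text is taken from `NEWID()`, so it consists of hex characters and covers at most 32 characters per value.

```sql
-- Hypothetical names: dbo.Animals(Name varchar(50)) stands in for the real table and column.
-- Steps 1 and 2: one row per distinct value, with a random replacement of the same length.
SELECT d.Name,
       LEFT(REPLACE(CONVERT(varchar(36), NEWID()), '-', ''), LEN(d.Name)) AS Randomized
INTO   #NameMap
FROM  (SELECT DISTINCT Name FROM dbo.Animals WHERE Name IS NOT NULL) AS d;

-- Steps 3 and 4: join the real table to the mapping and overwrite the column.
UPDATE a
SET    a.Name = m.Randomized
FROM   dbo.Animals AS a
JOIN   #NameMap    AS m ON m.Name = a.Name;

-- Drop the mapping so the substitution cannot be looked up afterwards.
DROP TABLE #NameMap;
```

Because each distinct value gets exactly one row in the mapping, equal inputs receive equal replacements. Two different short values can still draw the same random string by chance, so a uniqueness check on `Randomized` may be worth adding if that matters for the test data.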

1 Comment

Thanks. For now I went with the top comment, it works for my needs. However, I am going to work to generalize this for all of our test databases and I think your mechanism is going to be the best. I just need to code it up and test it.
1

Here is one method: Do a checksum() on the string and choose the first characters.

select left(cast(checksum(name) as varchar(255)), 10)

The result will contain only digits (plus a leading minus sign when the checksum is negative), but that seems to meet your requirement.

2 Comments

This is really close to what I want, and so far looks like the best choice. I really wish I could keep letters as letters though.
@esac you could always replace the digits with letters in a second step. That makes it even less reversible, because now you don't know whether a B is a B or a 0.
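
A rough sketch of that second step, assuming SQL Server 2017 or later (for `TRANSLATE`) and the hypothetical table `dbo.Animals` with column `Name`. The digit-to-letter mapping is arbitrary, and the result is cut to the original length but can never be longer than 10 characters, since that is the most digits a 32-bit checksum has.

```sql
-- Hypothetical names: dbo.Animals(Name) stands in for the real table and column.
-- The cast to bigint avoids an overflow in ABS() for the smallest possible int value.
SELECT Name,
       TRANSLATE(
           LEFT(CAST(ABS(CAST(CHECKSUM(Name) AS bigint)) AS varchar(20)), LEN(Name)),
           '0123456789',   -- digits produced by the checksum
           'QWERTYUIOP'    -- arbitrary letters, one per digit
       ) AS Obfuscated
FROM   dbo.Animals;
```

Taking the absolute value also removes the minus sign that a negative checksum would otherwise leave in the output.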
1

You can use a hashing function to replace the values. A hash cannot be reversed, so keep a mapping table of original to hashed values if you ever need to look the originals up again. That being said, it sounds like you simply want to obfuscate. You can do this using any number of hash functions in most SQL dialects. You could consider MD5, SHA-1, SHA-2, or some other.

In SQL Server, HASHBYTES supports MD2, MD4, MD5, SHA, SHA1, SHA2_256, and SHA2_512.

You can obfuscate your data like this:

select HASHBYTES('MD5', 'Sample String to hash ') from x;
select HASHBYTES('SHA1', 'Sample String to hash ') from x;

These algorithms are designed to make collisions unlikely. MD5 is much less secure than SHA-2.

When your data isn't sensitive, I would recommend a checksum instead. MySQL has CRC32(), which returns the cyclic redundancy check value of a given string as a 32-bit unsigned value. SQL Server has no built-in CRC32, but CHECKSUM() returns a 32-bit signed integer and can be used the same way. It can serve as a hash function but, again, is not secure. It does give a much shorter value, so it is more efficient when security is not a concern.

select CHECKSUM('string') from x;

0

I was curious... What about this:

EDIT: You could use this to find a "new value" for each distinct existing value...

CREATE VIEW Get_NewID
AS
SELECT NEWID() AS MyNewID
GO

CREATE FUNCTION dbo.RandomLetters(@Length INT)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @rslt VARCHAR(MAX);
WITH TwentySixNumbers AS
(
    --a tally table
    SELECT TOP 26 ROW_NUMBER() OVER(ORDER BY object_id) AS nmbr
                 ,(SELECT MyNewID FROM Get_NewID) AS sort
    FROM sys.objects
)
,TwentySixLetters AS
(
    SELECT nmbr,sort,CHAR(nmbr+64) AS letter
    FROM TwentySixNumbers
)
SELECT @rslt=
(
    SELECT TOP (@Length) letter
    FROM TwentySixLetters
    ORDER BY sort
    FOR XML PATH(''),TYPE
).value('.','varchar(max)');
RETURN @rslt;
END
GO

--Here you create 10 different strings of seven letters
--Pass the length of your text as @Length (at most 26, since each letter is used only once)
WITH TenNumbers AS
(
    --a tally table
    SELECT TOP 10 ROW_NUMBER() OVER(ORDER BY object_id) AS nmbr
    FROM sys.objects
)
SELECT dbo.RandomLetters(7)
FROM TenNumbers;
