Random record from a database table (T-SQL)

Question

Is there a succinct way to retrieve a random record from a SQL Server table?

I would like to randomize my unit test data, so am looking for a simple way to select a random id from a table. In English, the select would be "Select one id from the table where the id is a random number between the lowest id in the table and the highest id in the table."

I can't figure out a way to do it without having to run the query, test for a null value, then re-run if null.

Ideas?

There are a couple of methods here: brettb.com/SQL_Help_Random_Numbers.asp
– Mesh, Commented Oct 10, 2008 at 13:49
Are you sure you want to take this approach? Unit test data should not be random - in fact, you should be guaranteed to get the same results no matter how many times you execute the unit test. Having random data might violate this fundamental principle of unit testing.
– rein, Commented May 8, 2009 at 10:27

Sklivvz · Accepted Answer · 2012-04-12 11:11:25Z

183

Is there a succinct way to retrieve a random record from a SQL Server table?

Yes

SELECT TOP 1 * FROM table ORDER BY NEWID()

Explanation

A NEWID() is generated for each row and the table is then sorted by it. The first record is returned (i.e. the record with the "lowest" GUID).
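As a rough illustration for the unit-test use case in the question, the same query can capture a random key into a variable. This is only a sketch: the temp table #Demo, its columns, and its rows are hypothetical, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical demo table; #Demo and its columns are illustrative only.
CREATE TABLE #Demo (Id int PRIMARY KEY, Name varchar(20));

INSERT INTO #Demo (Id, Name)
VALUES (1, 'alpha'), (2, 'beta'), (5, 'gamma'), (9, 'delta');

-- NEWID() is evaluated once per row, so every row gets its own sort key.
DECLARE @RandomId int;

SELECT TOP 1 @RandomId = Id
FROM #Demo
ORDER BY NEWID();

-- Use the captured key; gaps in the Id sequence do not matter here.
SELECT * FROM #Demo WHERE Id = @RandomId;

DROP TABLE #Demo;
```

Because the random key is drawn from rows that actually exist, there is no need to test for a null result and re-run.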

Notes

- GUIDs are generated as pseudo-random numbers since version four:

  The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers.

  The algorithm is as follows:
  - Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.
  - Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.
  - Set all the other bits to randomly (or pseudo-randomly) chosen values.

  (A Universally Unique IDentifier (UUID) URN Namespace, RFC 4122)
- The alternative SELECT TOP 1 * FROM table ORDER BY RAND() will not work as one would think. RAND() returns one single value per query, thus all rows will share the same value.
- While GUID values are pseudo-random, you will need a better PRNG for the more demanding applications.
- Typical performance is less than 10 seconds for around 1,000,000 rows, of course depending on the system. Note that it's impossible to hit an index, thus performance will be relatively limited.
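The difference between RAND() and NEWID() noted above can be checked directly. The sketch below uses the sys.objects catalog view only because it exists in every database; the column aliases are made up for illustration. Seeding RAND() from CHECKSUM(NEWID()) is a common workaround for getting a per-row value, not something this answer prescribes.

```sql
-- RAND() without a seed is evaluated once per statement: same value on every row.
-- NEWID() is evaluated per row: a different GUID on every row.
SELECT TOP 5
    name,
    RAND()                  AS SameForEveryRow,
    NEWID()                 AS DifferentPerRow,
    RAND(CHECKSUM(NEWID())) AS SeededPerRow   -- RAND() re-seeded from a fresh GUID per row
FROM sys.objects;
```

Running it should show one repeated value in the first computed column and distinct values in the other two, which is why ORDER BY RAND() does not shuffle the rows.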

edited Apr 12, 2012 at 11:11

answered Oct 10, 2008 at 13:46

Sklivvz


12 Comments

Jeremy Over a year ago

Exactly what I was looking for. I had a feeling it was simpler than I was making it.

Skizz Over a year ago

You are assuming that NEWID produces pseudorandom values. There is a good chance it will produce sequential values. NEWID just produces unique values. RAND, however, produces pseudo-random values.

Tom Ritter Over a year ago

I'm running it on a heavily indexed table with 1,671,145 rows, and it takes 7 seconds to return. The table is pretty optimal too - it's virtually the heart of our database so it's taken care of.

Sklivvz Over a year ago

@ÂviewAnew. 1.6 million rows and 7 secs on a select that doesn't (and can't) hit an index is not bad.

Sklivvz Over a year ago

@Skizz, rand does not work like that. A SINGLE random value is generated before the SELECT. So if you try "SELECT TOP 10 RAND()... " you always get the same value


Martin Smith · Answer · 2012-08-26 10:24:27Z

30

On larger tables you can also use TABLESAMPLE for this to avoid scanning the whole table.

SELECT  TOP 1 *
FROM YourTable
TABLESAMPLE (1000 ROWS)
ORDER BY NEWID()

The ORDER BY NEWID is still required to avoid just returning rows that appear first on the data page.

The number to use needs to be chosen carefully for the size and definition of table and you might consider retry logic if no row is returned. The maths behind this and why the technique is not suited to small tables is discussed here
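The retry logic mentioned above might look something like the following sketch. YourTable is the placeholder from the query above; the Id key column, the variable names, and the limit of five attempts are assumptions made for illustration. The syntax assumes SQL Server 2008 or later.

```sql
-- YourTable is a placeholder name; Id is an assumed integer key column.
DECLARE @Id int = NULL;
DECLARE @Attempts int = 0;

-- TABLESAMPLE selects whole pages, so a sample can come back empty.
-- When no row is returned, @Id keeps its previous value (NULL) and we retry.
WHILE @Id IS NULL AND @Attempts < 5
BEGIN
    SELECT TOP 1 @Id = Id
    FROM YourTable TABLESAMPLE (1000 ROWS)
    ORDER BY NEWID();

    SET @Attempts += 1;
END;

-- Fall back to the full-scan version if every sample was empty.
IF @Id IS NULL
    SELECT TOP 1 @Id = Id FROM YourTable ORDER BY NEWID();

SELECT * FROM YourTable WHERE Id = @Id;
```

The fallback keeps the query correct on small tables, where an empty sample is most likely.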

answered Aug 26, 2012 at 10:24

Martin Smith


3 Comments

Mark Entingh Over a year ago

I found this on Microsoft's website: You can use TABLESAMPLE to quickly return a sample from a large table when either of the following conditions is true: The sample does not have to be a truly random sample at the level of individual rows. Rows on individual pages of the table are not correlated with other rows on the same page.

Martin Smith Over a year ago

@MarkEntingh - In the case of TOP 1 it doesn't matter if rows on the same page are correlated or not. You're only picking one of them.

PerPlexSystem Over a year ago

Could this be used to choose say the TOP @X (50 or set before) and where TABLESAMPLE (@Rows ROWS) based on a Count from a table?

Neil N · Answer · 2011-01-20 05:31:41Z

10

Also try your method to get a random Id between MIN(Id) and MAX(Id) and then

SELECT TOP 1 * FROM table WHERE Id >= @yourrandomid ORDER BY Id

It will always get you one row.
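The approach described above can be sketched end to end as follows. YourTable and Id are placeholder names and the variables are hypothetical. The ORDER BY Id is there so that TOP 1 reliably returns the nearest existing Id at or above the random value rather than an arbitrary matching row.

```sql
-- YourTable and Id are placeholder names.
DECLARE @MinId int, @MaxId int, @RandomId int;

SELECT @MinId = MIN(Id), @MaxId = MAX(Id)
FROM YourTable;

-- RAND() returns a float from 0 up to (but not including) 1;
-- scale it to the closed integer range [@MinId, @MaxId].
SET @RandomId = @MinId + FLOOR(RAND() * (@MaxId - @MinId + 1));

-- If @RandomId falls in a gap, the next existing Id is returned instead.
SELECT TOP 1 *
FROM YourTable
WHERE Id >= @RandomId
ORDER BY Id;
```

Rows that follow a gap in the Id sequence are picked more often than others, so the distribution is only uniform when the Ids are contiguous.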

edited Jan 20, 2011 at 5:31

Neil N


answered Oct 10, 2008 at 14:13

Sklivvz


3 Comments

Neil N Over a year ago

-1. This would only work when there are no missing IDs between min and max. If a row is deleted and the random function then generates that same ID, you will get zero records back.

Sklivvz Over a year ago

@Neil, not really - it will get you the first row with an Id greater than the random number if there are missing Ids. The problem here is that the probability of each row coming out is not constant. But then again this suffices in most cases.

TomTom Over a year ago

+1. For unit testing that should hit different values, that is good enough. If you require real randomness, then this is something else, but in the OP's context it should be good enough.

displayName · Answer · 2015-10-10 20:31:38Z

7

If you want to select a random sample from a large table, the best way that I know is:

SELECT * FROM Table1
WHERE (ABS(CAST(
    (BINARY_CHECKSUM
    (keycol1, NEWID())) as int))
    % 100) < 10

Source: MSDN

edited Oct 10, 2015 at 20:31

displayName


answered Dec 16, 2013 at 8:33

hmfarimani


3 Comments

QMaster Over a year ago

I'm not sure, but I think using RAND() rather than NEWID() to generate truly random numbers may be better, because of the disadvantages of using NEWID() in the select process.

QMaster Over a year ago

I tried using this method to get an exact number of records rather than a percentage, by widening the selection range and limiting the result with TOP n. Are there any suggestions?

QMaster Over a year ago

I found another problem with this scenario: if you use GROUP BY, you will always get the same order of randomly selected rows, so it seems that for small tables the @skilvvz approach is the most appropriate.

likeitlikeit · Answer · 2013-09-17 19:17:37Z

I was looking to improve on the methods I had tried and came across this post. I realize it's old but this method is not listed. I am creating and applying test data; this shows the method for "address" in a SP called with @st (two char state)

Create Table ##TmpAddress (id Int Identity(1,1), street VarChar(50), city VarChar(50), st VarChar(2), zip VarChar(5))
Insert Into ##TmpAddress(street, city, st, zip)
Select street, city, st, zip 
From tbl_Address (NOLOCK)
Where st = @st


-- unseeded RAND() will return the same number when called in rapid succession so
-- here, I seed it with a guaranteed different number each time. @@ROWCOUNT is the count from the most recent table operation.

Set @csr = Ceiling(RAND(convert(varbinary, newid())) * @@ROWCOUNT)

Select street, city, st, Right(('00000' + ltrim(zip)),5) As zip
From ##tmpAddress (NOLOCK)
Where id = @csr

XpiritO · Answer · 2019-10-21 14:03:04Z

"If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:

SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
/ CAST (0x7fffffff AS int)

The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int) evaluates to a random float value between 0 and 1."

Source: http://technet.microsoft.com/en-us/library/ms189108(v=sql.105).aspx

This is further explained below:

How does this work? Let's split out the WHERE clause and explain it.

The CHECKSUM function is calculating a checksum over the items in the list. It is arguable over whether SalesOrderID is even required, since NEWID() is a function that returns a new random GUID, so multiplying a random figure by a constant should result in a random in any case. Indeed, excluding SalesOrderID seems to make no difference. If you are a keen statistician and can justify the inclusion of this, please use the comments section below and let me know why I'm wrong!

The CHECKSUM function returns an int. Performing a bitwise AND with 0x7fffffff, which is 31 one-bits in binary (0111...1), clears the sign bit and yields a non-negative integer. Dividing by 0x7fffffff normalizes this value to a figure between 0 and 1. Then, to decide whether each row merits inclusion in the final result set, a threshold of x/100 is used (in this case, 0.01), where x is the percentage of the data to retrieve as a sample.

Source: https://www.mssqltips.com/sqlservertip/3157/different-ways-to-get-random-data-for-sql-server-data-sampling
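To make the steps of the WHERE clause concrete, the intermediate values can be selected as columns. This is only an illustrative sketch: sys.objects and object_id stand in for Sales.SalesOrderDetail and SalesOrderID so that it runs in any database, and the column aliases are made up.

```sql
-- sys.objects / object_id stand in for Sales.SalesOrderDetail / SalesOrderID.
-- Each NEWID() call draws its own GUID, so the three columns are computed
-- from different random values; they illustrate ranges, not one calculation.
SELECT TOP 5
    object_id,
    CHECKSUM(NEWID(), object_id)              AS RawChecksum,  -- signed int
    CHECKSUM(NEWID(), object_id) & 0x7fffffff AS NonNegative,  -- sign bit cleared
    CAST(CHECKSUM(NEWID(), object_id) & 0x7fffffff AS float)
        / CAST(0x7fffffff AS int)             AS Normalized    -- float between 0 and 1
FROM sys.objects;
```

Comparing the Normalized value against 0.01 keeps roughly one row in a hundred, which is the one percent sample described above.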
