MySQL: return random rows with possible duplicates

Question

I'm trying to select a given number of random rows. Say there are only 4 rows in the table and I need 6 random rows: I want the result to be able to contain duplicate rows. I want that possibility even if the table has more than 6 rows.

Is this easily achieved in MySQL?

My current query is like this:

SELECT * FROM winners ORDER BY RAND() LIMIT 6

The idea is that a user can win more than once. :)

Hope you can help!

Bill Karwin · Accepted Answer · 2013-11-04 19:25:53Z

3

Any solution involving ORDER BY RAND() is frowned upon, because it can't use an index and it basically sorts the whole table (which may grow very large) just to pick one row.

The better solutions involve generating a random number between MIN(id) and MAX(id) and looking up the row with that id through the primary key index. As your table gets larger, this becomes a bigger and bigger advantage over sorting the whole table.

It's so much more efficient to pick a random ID that I'd recommend just picking six random IDs one at a time, and then looking up those rows one at a time. Because each pick is independent, you have a chance of picking a given row more than once.

If you aren't guaranteed that all your IDs are consecutive, you can pick the first ID that is greater than or equal to the random pick. So in pseudocode:

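$MIN, $MAX = SELECT MIN(id), MAX(id) FROM winners
FOR LOOP FROM 1 to 6
    // random integer in [$MIN, $MAX], both ends inclusive
    $R = $MIN + RANDOM_INT(0, $MAX - $MIN)
    // ORDER BY id makes sure this is the first id at or after $R
    $WINNER[] = SELECT * FROM winners WHERE id >= $R ORDER BY id LIMIT 1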
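
For anyone who wants something runnable, here is a minimal Python sketch of the same loop. It is only an illustration under assumptions: it uses an in-memory SQLite table as a stand-in for MySQL, and the `name` column, the sample rows, and the `pick_winners` helper are made up for the demo. The two queries themselves are plain SQL that MySQL also accepts (a MySQL driver would typically use `%s` placeholders instead of `?`).

```python
import random
import sqlite3


def pick_winners(conn, draws=6, rng=random):
    """Pick `draws` rows from winners by random id, allowing repeats."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM winners").fetchone()
    if lo is None:
        return []  # empty table: nothing to draw from
    picked = []
    for _ in range(draws):
        r = rng.randint(lo, hi)  # inclusive on both ends
        # First row whose id is at or after r; ORDER BY id makes that explicit.
        row = conn.execute(
            "SELECT id, name FROM winners WHERE id >= ? ORDER BY id LIMIT 1",
            (r,),
        ).fetchone()
        picked.append(row)
    return picked


# Demo data (hypothetical): four entrants, with gaps in the ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO winners VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (4, "cy"), (7, "dee")],
)

winners = pick_winners(conn)
print(winners)  # six rows; with only four entrants some must repeat
```

Each draw is a separate indexed lookup, so the cost stays roughly constant per draw no matter how large the table gets.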

5 Comments

mpen Over a year ago

If we have IDs [1,2,3,4,5,1000] then 1000 has a much greater chance of being picked.
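
To put a number on that, here is a small Python sketch that enumerates every possible random pick for those IDs and counts which row the "first id >= pick" rule would return. It assumes the pick is uniform over MIN(id) to MAX(id) inclusive; `bisect_left` stands in for the `WHERE id >= r ORDER BY id LIMIT 1` lookup.

```python
from bisect import bisect_left
from collections import Counter

ids = [1, 2, 3, 4, 5, 1000]  # sorted ids from the comment above

# For each possible pick r in [MIN, MAX], find the first id >= r,
# which is the row the lookup query would return.
hits = Counter(ids[bisect_left(ids, r)] for r in range(ids[0], ids[-1] + 1))

total = ids[-1] - ids[0] + 1  # number of equally likely picks
chances = {i: hits[i] / total for i in ids}
print(chances)
```

Ids 1 to 5 are each returned for exactly one pick, while id 1000 is returned for the other 995 of the 1000 possible picks. The bias grows with the size of the gap in front of a row.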

MickLH Over a year ago

If perfect stats are absolutely necessary, then the internal rowid of the SQL server could be used.

Bill Karwin Over a year ago

@MickLH, MySQL doesn't have an internal rowid, at least not one you can query. One could use another table and fill it with consecutive integers, mapped to ID values in the original table.
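
A rough Python sketch of that mapping-table idea, again with in-memory SQLite standing in for MySQL. The table name `winner_seq`, its columns, the `name` column, and the `draw` helper are all hypothetical names chosen for the illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO winners VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee"), (5, "eve"), (1000, "flo")],
)

# Hypothetical lookup table: seq is gap-free (1..N), winner_id is the real id.
conn.execute(
    "CREATE TABLE winner_seq (seq INTEGER PRIMARY KEY, winner_id INTEGER NOT NULL)"
)
ids = [row[0] for row in conn.execute("SELECT id FROM winners ORDER BY id")]
conn.executemany("INSERT INTO winner_seq VALUES (?, ?)", list(enumerate(ids, start=1)))


def draw(conn, draws=6, rng=random):
    """Pick rows uniformly, with repeats allowed, via the gap-free seq column."""
    (n,) = conn.execute("SELECT MAX(seq) FROM winner_seq").fetchone()
    out = []
    for _ in range(draws):
        pos = rng.randint(1, n)  # every seq value exists, so no row is favoured
        out.append(
            conn.execute(
                "SELECT w.id, w.name FROM winner_seq s "
                "JOIN winners w ON w.id = s.winner_id WHERE s.seq = ?",
                (pos,),
            ).fetchone()
        )
    return out


print(draw(conn))
```

The trade-off is maintenance: the mapping has to be rebuilt or patched whenever rows are added to or removed from the original table, otherwise gaps or stale entries creep back in.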

Sir Over a year ago

So there's no way to say, for example, "pick the 5th row" rather than the row with id 5?

Bill Karwin Over a year ago

You could use LIMIT 1 OFFSET 4 to pick the 5th row by position (OFFSET is the number of rows to skip), but you'll find that choosing a row by value (aided by an index) is far better for performance.

Barmar · Accepted Answer · 2013-11-04 19:13:13Z

2

-- Parentheses are required so each ORDER BY ... LIMIT applies to its own SELECT
(SELECT * FROM winners ORDER BY RAND() LIMIT 1)
UNION ALL
(SELECT * FROM winners ORDER BY RAND() LIMIT 1)
UNION ALL
(SELECT * FROM winners ORDER BY RAND() LIMIT 1)
UNION ALL
(SELECT * FROM winners ORDER BY RAND() LIMIT 1)
UNION ALL
(SELECT * FROM winners ORDER BY RAND() LIMIT 1)
UNION ALL
(SELECT * FROM winners ORDER BY RAND() LIMIT 1)

edited Nov 4, 2013 at 19:13

answered Nov 4, 2013 at 17:51

4 Comments

Gordon Linoff Over a year ago

. . the downvotes appear to be random maliciousness.

Sir Over a year ago

I blocked one user who got rather irritated with me in chat. I suspect he retaliated that way, but I can't be sure.

mpen Over a year ago

@Dave: You can be sure. Go to his profile, and then go to his Votes tab.

MickLH Over a year ago

I down voted because the question started in chat with more specific requirements that this answer does not fit. I changed my vote to an up-vote when I noticed the question here lacks some details mentioned in chat.

Gordon Linoff · Accepted Answer · 2013-11-04 17:56:43Z

1

Assuming you have at least one row, you can multiply the number of rows and then return randomly from that enlarged set:

SELECT w.*
FROM winners w CROSS JOIN
     (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
      SELECT 5 UNION ALL SELECT 6
     ) nums
ORDER BY RAND()
LIMIT 6;
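
To see the effect of the cross join, here is a small Python sketch that runs the same shape of query against an in-memory SQLite table (a stand-in for MySQL). The `name` column and the four sample rows are invented for the demo, and SQLite spells the random function `RANDOM()` where MySQL uses `RAND()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO winners VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee")],
)

# The six-row derived table: SELECT 1 AS n UNION ALL SELECT 2 AS n ...
NUMS = " UNION ALL ".join(f"SELECT {n} AS n" for n in range(1, 7))

# Without ORDER BY / LIMIT, the cross join yields every winners row six times.
expanded = conn.execute(
    f"SELECT w.id FROM winners w CROSS JOIN ({NUMS}) nums"
).fetchall()
print(len(expanded))  # 4 rows x 6 copies

# Shuffle the enlarged set and keep six rows (RANDOM() here, RAND() in MySQL).
picked = conn.execute(
    f"SELECT w.id, w.name FROM winners w CROSS JOIN ({NUMS}) nums "
    "ORDER BY RANDOM() LIMIT 6"
).fetchall()
print(picked)
```

One thing to keep in mind: this draws without replacement from six copies of each row, so a row can appear at most six times. That is close to, but not exactly the same as, six fully independent draws.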

4 Comments

Sir Over a year ago

Can you explain what this is doing? Is it temporarily doubling all the rows in the table and selecting from that instead?

Gordon Linoff Over a year ago

@Dave . . . The cross join is multiplying the number of rows, by six in this case (because there are six rows in the nums table). It then randomly selects from those enlarged results.

Barmar Over a year ago

Assuming ORDER BY RAND() is O(n log n), this solution is O(6n log 6n), while my solution is O(6(n log n)), right?

Gordon Linoff Over a year ago

@Barmar . . . It is actually hard to compare in terms of performance. This version reads the original table once, whereas yours reads it once per subquery (which could be significant if winners were a complex query). This is sorting six times the original data, whereas yours is sorting the original data six times (these have the same complexity because constant factors are ignored in complexity calculations).

Community · Accepted Answer · 2017-03-20 10:29:31Z

0

This question sounds like it could be the XY Problem. It sounds like you might be asking about your attempted solution rather than the underlying problem. See: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem

I think it might be better to turn four rows into six in your application rather than selecting duplicate rows.

edited Mar 20, 2017 at 10:29

CommunityBot

answered Nov 4, 2013 at 17:54

Anthony

13 Comments

Sir Over a year ago

If only 4 people enter but there have to be 6 wins, then how can I turn four rows into six? That's data duplication in the database, which I am trying to avoid.

Anthony Over a year ago

I would think it would be better to solve this problem in your application rather than selecting duplicate rows from the database. No? Create duplicates in your application, instead of modifying the results in the select query.

Sir Over a year ago

Creating duplicates in the database is the complete opposite of an efficient database structure ;)

mpen Over a year ago

@ajb32x: Your suggested solution of duplicating app-side if there are fewer than 6 results won't work. OP said he would like to allow the possibility of dupes even when there are 6 or more records in the DB. I think OP is correct that this is a DB-level problem.

Sir Over a year ago

@ajb32x If it only uses the selected 6 for the 6 draws, then any other rows in the database would not have an equal chance of being selected in the other 5 draws, right?
