Select random values from each group, SQL

Question

I have a project in which I'm creating a game powered by a database.

The database has data entered like this:

(ID, Name): (1, PhotoID), (1, PhotoID), (1, PhotoID), (2, PhotoID), (2, PhotoID), and so on. There are thousands of entries.

This is my current SQL statement:

$sql = "SELECT TOP 8 * FROM Image WHERE Hidden = '0' ORDER BY NEWID()";

But this can return several rows with the same ID, whereas I need every result to have a unique ID (that is, I need one result from each group).

How can I change my query to grab one result from each group?

Thanks!

Test schema: sqlfiddle.com/#!3/657ad

– mellamokb, Commented Jul 31, 2012 at 22:40
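To make the problem concrete, here is a minimal sketch that runs the question's query against SQLite through Python's built-in sqlite3 module. Assumptions: `LIMIT` stands in for `TOP`, `random()` for `NEWID()`, `Hidden` is stored as an integer, and the sample rows are made up. With only two IDs in the table, `LIMIT 8` returns every visible row, so IDs necessarily repeat:

```python
import sqlite3

# Hypothetical sample rows (ID, Name, Hidden): several photos share each ID.
ROWS = [
    (1, "photo_a", 0), (1, "photo_b", 0), (1, "photo_c", 0),
    (2, "photo_d", 0), (2, "photo_e", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (ID INTEGER, Name TEXT, Hidden INTEGER)")
conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", ROWS)

# The question's query in SQLite syntax: LIMIT replaces TOP, random() replaces NEWID().
rows = conn.execute(
    "SELECT ID, Name FROM Image WHERE Hidden = 0 ORDER BY random() LIMIT 8"
).fetchall()

ids = [row[0] for row in rows]
print(len(ids), len(set(ids)))  # prints "5 2": five rows, only two distinct IDs
```

Random ordering alone only shuffles rows; it does nothing to collapse rows that share an ID.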

Nikola Markovinović · Accepted Answer · 2012-07-31 22:48:38Z

4

Since ORDER BY NEWID() will result in a table scan anyway, you might use row_number() to isolate the first row in each group:

; with randomizer as (
  select id,
         name,
         row_number() over (partition by id
                            order by newid()) rn
    from Image
   where hidden = 0
)
select top 8
       id,
       name
  from randomizer
 where rn = 1
-- Added by mellamokb's suggestion to allow groups to be randomized
order by newid()

Sql Fiddle playground thanks to mellamokb.
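For a quick local experiment without SQL Server, here is a minimal sketch of the same ROW_NUMBER() approach translated to SQLite via Python's sqlite3 module. Assumptions: `random()` replaces `NEWID()`, `LIMIT ?` replaces `TOP 8`, `Hidden` is an integer column, the sample rows are hypothetical, and the SQLite library must be 3.25 or newer for window functions:

```python
import sqlite3

# Hypothetical sample rows (ID, Name, Hidden); ID 4 has only a hidden photo.
ROWS = [
    (1, "a1", 0), (1, "a2", 0), (1, "a3", 1),
    (2, "b1", 0), (2, "b2", 0),
    (3, "c1", 0),
    (4, "d1", 1),
]

# ROW_NUMBER() numbers the visible rows of each ID in random order; keeping
# rn = 1 leaves one random photo per ID, and the outer ORDER BY random()
# shuffles the groups before LIMIT takes the first n.
QUERY = """
WITH randomizer AS (
    SELECT ID, Name,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY random()) AS rn
    FROM Image
    WHERE Hidden = 0
)
SELECT ID, Name
FROM randomizer
WHERE rn = 1
ORDER BY random()
LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (ID INTEGER, Name TEXT, Hidden INTEGER)")
conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", ROWS)

picked = conn.execute(QUERY, (8,)).fetchall()  # at most 8 groups
two = conn.execute(QUERY, (2,)).fetchall()     # a smaller game board
print(picked)
```

Only IDs 1, 2 and 3 have visible photos here, so `LIMIT 8` returns three rows, one per ID; a limit of 2 returns two distinct IDs chosen at random.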

edited Jul 31, 2012 at 22:48

answered Jul 31, 2012 at 22:42

Nikola Markovinović



2 Comments

mellamokb Over a year ago

Hmm. This doesn't seem to randomize the overall groups. I wonder if adding another order by newid() to the final query would fix it? EDIT: Yes, that seems to do it: sqlfiddle.com/#!3/657ad/14

Nikola Markovinović Over a year ago

@mellamokb I was rather under the impression that the number 8 in TOP corresponds to the number of groups, because it is a game and I would expect a slot for each image group. I know that this is a figment of my imagination, though ;-)

mellamokb · 2012-07-31 22:39:43Z

2

Looks like this may work, but I can't vouch for performance:

SELECT TOP 8 ID,
  (select top 1 name from image i2
   where i2.id = i1.id and i2.hidden = '0'
   order by newid()) AS name
FROM Image i1
WHERE hidden = '0'
group by ID
ORDER BY NEWID();

Demo: http://www.sqlfiddle.com/#!3/657ad/6
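A minimal SQLite sketch of this correlated-subquery approach, run through Python's sqlite3 module, may help when no SQL Server instance is at hand. Assumptions: `LIMIT` replaces `TOP`, `random()` replaces `NEWID()`, and the sample rows are hypothetical. The inner query also filters on `Hidden`, so a hidden photo can never be drawn for a visible ID:

```python
import sqlite3

# Hypothetical sample rows (ID, Name, Hidden); "a3" is hidden.
ROWS = [
    (1, "a1", 0), (1, "a2", 0), (1, "a3", 1),
    (2, "b1", 0), (2, "b2", 0),
    (3, "c1", 0),
]

# GROUP BY yields one row per visible ID; the correlated subquery draws a
# random visible Name for that ID, and the outer ORDER BY random() shuffles IDs.
QUERY = """
SELECT i1.ID,
       (SELECT i2.Name
          FROM Image i2
         WHERE i2.ID = i1.ID AND i2.Hidden = 0
         ORDER BY random()
         LIMIT 1) AS Name
FROM Image i1
WHERE i1.Hidden = 0
GROUP BY i1.ID
ORDER BY random()
LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (ID INTEGER, Name TEXT, Hidden INTEGER)")
conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", ROWS)

# Repeat the draw to check that the hidden photo is never chosen.
draws = [conn.execute(QUERY, (8,)).fetchall() for _ in range(50)]
print(draws[0])
```

The subquery runs once per group, which is why its cost grows with the number of distinct IDs.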

answered Jul 31, 2012 at 22:39

mellamokb



Aaronaught · 2012-07-31 23:13:39Z

2

If you have an index on the ID column and want to take advantage of the index and avoid a full table scan, do your randomization on the key values first:

WITH IDs AS
(
  SELECT DISTINCT ID
  FROM Image
  WHERE Hidden = '0'
),
SequencedIDs AS
(
  SELECT ID, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Seq
  FROM IDs
),
ImageGroups AS
(
  SELECT i.*, ROW_NUMBER() OVER (PARTITION BY i.ID ORDER BY NEWID()) Seq
  FROM SequencedIDs s
  INNER JOIN Image i
    ON i.ID = s.ID
  WHERE s.Seq <= 8
  AND i.Hidden = '0'
)
SELECT *
FROM ImageGroups
WHERE Seq = 1

This should drastically reduce the cost compared with the table scan approach, although I don't have a schema big enough to test with, so check the execution statistics in SSMS and make sure ID is actually indexed for this to be effective.
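Here is a minimal SQLite sketch of the keys-first idea, run through Python's sqlite3 module. Assumptions: `random()` replaces `NEWID()`, the number of groups is a bound parameter, and the table, index name `ix_image_id` and rows are hypothetical. The filter is written as `Seq <= ?` because `ROW_NUMBER()` starts at 1, so a strict `< n` would keep only n - 1 groups:

```python
import sqlite3

# Hypothetical sample rows (ID, Name, Hidden).
ROWS = [
    (1, "a1", 0), (1, "a2", 0),
    (2, "b1", 0), (2, "b2", 1),
    (3, "c1", 0),
    (4, "d1", 0), (4, "d2", 0),
]

# 1) shuffle the distinct visible IDs and number them,
# 2) keep the first n IDs and join back to Image for their visible rows,
# 3) number each group's rows randomly and keep one row per group.
QUERY = """
WITH IDs AS (
    SELECT DISTINCT ID FROM Image WHERE Hidden = 0
),
SequencedIDs AS (
    SELECT ID, ROW_NUMBER() OVER (ORDER BY random()) AS Seq FROM IDs
),
ImageGroups AS (
    SELECT i.ID, i.Name,
           ROW_NUMBER() OVER (PARTITION BY i.ID ORDER BY random()) AS Seq
    FROM SequencedIDs s
    JOIN Image i ON i.ID = s.ID
    WHERE s.Seq <= ? AND i.Hidden = 0
)
SELECT ID, Name FROM ImageGroups WHERE Seq = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (ID INTEGER, Name TEXT, Hidden INTEGER)")
conn.execute("CREATE INDEX ix_image_id ON Image (ID)")
conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", ROWS)

three = conn.execute(QUERY, (3,)).fetchall()       # exactly three random groups
everything = conn.execute(QUERY, (8,)).fetchall()  # only four IDs exist
print(three)
```

Whether this actually beats the single-pass ROW_NUMBER() query depends on the engine's plan and data size; the sketch only shows that the logic returns one row for each of the first n shuffled IDs.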

answered Jul 31, 2012 at 23:13

Aaronaught


1 Comment

Aaronaught Over a year ago

Note - in the sqlfiddle sandbox this is significantly cheaper than mellamokb's answer and only slightly higher than Nikola's - however, the sample size is extremely small, and I believe this would perform better on a very large number of rows because it does not need to scan them all - only 1 row per initial group and all rows for the much smaller top N random groups.

Miro Hudak · 2012-07-31 22:39:33Z

1

select * from (select * from photos order by rand()) as _SUB group by _SUB.id;

edited Jul 31, 2012 at 22:39

answered Jul 31, 2012 at 22:32

Miro Hudak


2 Comments

mellamokb Over a year ago

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions: sqlfiddle.com/#!3/657ad/9

Miro Hudak Over a year ago

@mellamokb Everything is allowed in the world of MySQL, but I just noticed that sql-server is the tag... my apologies.

iruvar · 2012-07-31 22:39:38Z

0

select ID, Name
  from (select ID, Name,
               row_number() over (partition by ID order by newid()) as ranker
          from Image
         where Hidden = 0) Z
 where ranker = 1
 order by newID()
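The partition key is what makes this pattern return one row per group. A minimal SQLite sketch (hypothetical data; `random()` in place of `NEWID()`; run through Python's sqlite3 module) compares ranking over `(ID, Name)` with ranking over `ID` alone. When names differ, every `(ID, Name)` pair forms its own partition, so every row gets rank 1 and IDs repeat:

```python
import sqlite3

# Hypothetical sample rows (ID, Name, Hidden): ID 1 has two different photos.
ROWS = [(1, "a1", 0), (1, "a2", 0), (2, "b1", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (ID INTEGER, Name TEXT, Hidden INTEGER)")
conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", ROWS)

def rank_one_ids(partition_by):
    """Sorted IDs of the rows ranked 1 when ROW_NUMBER() uses `partition_by`.

    `partition_by` is interpolated into the SQL, so pass only fixed column lists.
    """
    sql = f"""
        SELECT ID FROM (
            SELECT ID, Name,
                   ROW_NUMBER() OVER (PARTITION BY {partition_by} ORDER BY random()) AS ranker
            FROM Image
            WHERE Hidden = 0
        ) AS Z
        WHERE ranker = 1
    """
    return sorted(row[0] for row in conn.execute(sql))

by_id_and_name = rank_one_ids("ID, Name")  # every (ID, Name) pair is its own partition
by_id = rank_one_ids("ID")                 # one partition per ID
print(by_id_and_name, by_id)  # [1, 1, 2] [1, 2]
```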

answered Jul 31, 2012 at 22:39

iruvar

