
I've come across some similar questions, but I still don't quite understand.

I have a MySQL database with a table in which I store people's data and unique codes. The base unit for me is the e-mail address. I want to select random records, but restrict them so that each email address can be selected only once. This is my table structure (I'm leaving out some columns that are not relevant to this question).

+-----+-------------------+---------+----------+----------+
| ID  | email             | name    | lastname | code     |
+-----+-------------------+---------+----------+----------+
|  1  | [email protected]    | Simon   | Hardy    | 123ABC   |
|  2  | [email protected]    | John    | Doe      | EEEEEE   |
|  3  | [email protected]    | John    | Doe      | AEAEAE   |
|  4  | [email protected]      | Bill    | Liebe    | 5D78AC   |
|  5  | [email protected]   | Ellen   | Petete   | 99AQE5   |
|  6  | [email protected]    | John    | Doe      | 000CVV   |
|  7  | [email protected]   | Peter   | Lorem    | 54ACSS   |
|  8  | [email protected]    | Emma    | Stone    | 98WW7Q   |
+-----+-------------------+---------+----------+----------+

If I limit my selection to 3 rows and one of the rows with email = [email protected] gets selected, I need the other two rows with this email to be ignored/skipped. This is my query now:

SELECT * FROM people ORDER BY RAND() LIMIT 3

PS: I know ORDER BY RAND() is slow; I just haven't focused on that part yet.

I was thinking about GROUP BY, but as far as I understand, that way I would only get that one column, and I need to fetch all of them.

Is there a straightforward solution for this in MySQL?

  • Why are email, name and lastname duplicated? Can't you use a users table and a code table containing a field with a foreign key to the user table? Commented Jan 14, 2019 at 13:02
  • That was my first thought. But users tend to enter their personal data differently (capital letters, spaces, special characters, etc.). Would it still be a better way? Commented Jan 14, 2019 at 13:07
  • DRY! Otherwise, it's WET. Commented Jan 14, 2019 at 13:10

2 Answers


In MySQL 8+, you can use:

SELECT p.*
FROM people p
ORDER BY ROW_NUMBER() OVER (PARTITION BY email ORDER BY RAND()),
         RAND()  -- randomize the order among rows with the same row number
LIMIT 3;
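The following is a rough way to try the ranking idea without a MySQL 8 server. The sketch runs the same window-function query in SQLite through Python's built-in `sqlite3` module, and it assumes a SQLite build of 3.25 or later, which is when window functions were added. SQLite spells `RAND()` as `random()`. The email addresses are hypothetical placeholders, because the ones in the question are redacted. The row number is computed in a derived table, and `random()` is used as a second sort key. As a result, every email's first row comes before any repeated email, and rows that share the same row number are shuffled.

```python
import sqlite3

# Hypothetical sample data mirroring the question's table; the real
# addresses are redacted, so placeholder emails are used here.
ROWS = [
    (1, "simon@example.com", "Simon", "Hardy", "123ABC"),
    (2, "john@example.com", "John", "Doe", "EEEEEE"),
    (3, "john@example.com", "John", "Doe", "AEAEAE"),
    (4, "bill@example.com", "Bill", "Liebe", "5D78AC"),
    (5, "ellen@example.com", "Ellen", "Petete", "99AQE5"),
    (6, "john@example.com", "John", "Doe", "000CVV"),
    (7, "peter@example.com", "Peter", "Lorem", "54ACSS"),
    (8, "emma@example.com", "Emma", "Stone", "98WW7Q"),
]

# SQLite uses random() where MySQL uses RAND().
QUERY = """
SELECT id, email, name, lastname, code
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY random()) AS rn
    FROM people p
) t
-- rn = 1 rows (one per email) come first; random() shuffles ties
ORDER BY rn, random()
LIMIT ?
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory copy of the (hypothetical) people table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, email TEXT, "
        "name TEXT, lastname TEXT, code TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def pick(conn: sqlite3.Connection, k: int):
    """Return k rows, preferring rows whose email has not been returned yet."""
    return conn.execute(QUERY, (k,)).fetchall()


if __name__ == "__main__":
    for row in pick(make_db(), 3):
        print(row)
```

The sample has 6 distinct emails. When `k` is larger than that, the extra rows are the second and third rows for the repeated address, which matches the behavior discussed in the comments below.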

If you want to solve both the performance problem and the duplication problem at the same time . . . that is challenging. My recommendation is to select a smaller number of rows and just "hope" that there are enough different emails.

For instance, for 3 emails you might want to get about 100 rows with something like this:

select p.*,
       (@rn := if(@e = email, @rn + 1,
                  if(@e := email, 1, 1)
                 )
       ) as rn
from (select p.*
      from people p cross join
           (select count(*) as cnt from people) pp  -- can use primary key index
      where rand() < (100 / cnt) -- get about 100 rows
      order by email, rand()  -- only on about 100 rows
     ) p cross join
     (select @e := '', @rn := 0) params
having rn = 1
limit 3;
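The second query follows one idea: take a small random sample, keep one row per email, then stop at three. That logic can be sketched in plain Python. This is an illustration of the idea only, not a translation of the MySQL user-variable trick. `sample_then_dedupe`, the `target` parameter and the placeholder emails are all hypothetical.

```python
import random
from typing import List, Tuple

# (id, email, name, lastname, code); emails are hypothetical placeholders.
Row = Tuple[int, str, str, str, str]

PEOPLE: List[Row] = [
    (1, "simon@example.com", "Simon", "Hardy", "123ABC"),
    (2, "john@example.com", "John", "Doe", "EEEEEE"),
    (3, "john@example.com", "John", "Doe", "AEAEAE"),
    (4, "bill@example.com", "Bill", "Liebe", "5D78AC"),
    (5, "ellen@example.com", "Ellen", "Petete", "99AQE5"),
    (6, "john@example.com", "John", "Doe", "000CVV"),
    (7, "peter@example.com", "Peter", "Lorem", "54ACSS"),
    (8, "emma@example.com", "Emma", "Stone", "98WW7Q"),
]


def sample_then_dedupe(rows: List[Row], k: int, target: int,
                       rng: random.Random) -> List[Row]:
    """Keep each row with probability target/cnt, then one row per email."""
    if not rows:
        return []
    p = min(1.0, target / len(rows))
    # Step 1: random sample, like WHERE rand() < (100 / cnt)
    sample = [r for r in rows if rng.random() < p]
    # Step 2: random order inside the (small) sample
    rng.shuffle(sample)
    # Step 3: the first row seen for each email wins, like HAVING rn = 1
    seen = set()
    out: List[Row] = []
    for r in sample:
        if r[1] not in seen:
            seen.add(r[1])
            out.append(r)
            if len(out) == k:  # like LIMIT 3
                break
    # May hold fewer than k rows if the sample had too few distinct emails.
    return out


if __name__ == "__main__":
    for row in sample_then_dedupe(PEOPLE, 3, 100, random.Random()):
        print(row)
```

One design difference matters here. This sketch shuffles the sample before deduplicating. The query above instead orders the sample by `email`, so its final `limit 3` tends to favor addresses that sort early. Like the query, the sketch can return fewer than `k` rows when the sample happens to contain fewer distinct emails.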

5 Comments

Limiting the rows to a lower number is not really an option, because I need it to be 'fairly' random. I've tried the first query you posted above, but I'm getting this error: Unknown table 'mytable.p'
@KristiánFilo . . . The first query is a small typo. The second does what you want -- it returns a random set of rows with unique emails. Note that the where clause is being used to get a random sample from the table, and then this is whittled down to three rows for the result set.
Ah, I understand now. Thank you. Question: if I limit the results to, say, 50 but only have 10 unique emails, is it expected that the first query returns duplicates, even though it should not?
@KristiánFilo . . . Yes. The first query will return 50 rows, in groups of unique emails. In your case, it would return 5 rows for each email, assuming that each email has at least five rows.
Very straightforward and well documented reply, thank you very much.

EDIT:

SELECT * FROM Test WHERE id IN (SELECT MIN(id) FROM Test GROUP BY email) LIMIT 3;

This should do it.
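One way to check what this query returns is to run it against an in-memory SQLite copy of the question's table, renamed here to `Test` as in the query. The emails are placeholders, since the real ones are redacted. The `GROUP BY` sits in the subquery, which selects only `MIN(id)`, so the outer `SELECT *` returns complete rows, one row per email. However, it always picks the lowest `id` for each email. Without an `ORDER BY`, the database decides which three emails come back, so the result is not random.

```python
import sqlite3

# Hypothetical sample data mirroring the question's table.
ROWS = [
    (1, "simon@example.com", "Simon", "Hardy", "123ABC"),
    (2, "john@example.com", "John", "Doe", "EEEEEE"),
    (3, "john@example.com", "John", "Doe", "AEAEAE"),
    (4, "bill@example.com", "Bill", "Liebe", "5D78AC"),
    (5, "ellen@example.com", "Ellen", "Petete", "99AQE5"),
    (6, "john@example.com", "John", "Doe", "000CVV"),
    (7, "peter@example.com", "Peter", "Lorem", "54ACSS"),
    (8, "emma@example.com", "Emma", "Stone", "98WW7Q"),
]

# The answer's query: the subquery keeps the lowest id per email,
# and the outer SELECT * fetches the full rows for those ids.
QUERY = """
SELECT * FROM Test
WHERE id IN (SELECT MIN(id) FROM Test GROUP BY email)
LIMIT 3
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory Test table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (id INTEGER PRIMARY KEY, email TEXT, "
        "name TEXT, lastname TEXT, code TEXT)"
    )
    conn.executemany("INSERT INTO Test VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    for row in make_db().execute(QUERY):
        print(row)
```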

2 Comments

He wants to fetch all columns
You can't do that. You have fields non-aggregated used with a group by
