I was wondering how to get random rows within a SQL query, since the full query returns over 10 billion rows and would overwhelm our servers.

How can I sample a subset of rows while keeping this query structure?

SELECT a, b, c
FROM test
WHERE test.a = 123
  AND test.b ILIKE '10008383825311900000'
LIMIT 1000000

2 Answers


The canonical answer is to sort and use limit:

select t.*
from t
order by rand()
limit 100;

But do not do this on a table this size, because it has to sort every row! Instead, use rand() in a where clause. For a 1% sample:

select t.*
from t
where rand() < 0.01;

Random sampling methods in MySQL tend to require scanning the entire table, which is going to be expensive in your case.
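
If you need a fixed number of rows rather than a percentage, one common variation combines both ideas: filter with rand() first, then sort only the rows that survive. This is a sketch, assuming MySQL-style rand() (PostgreSQL spells it random()) and a table large enough that 1% comfortably exceeds 100 rows:

```sql
select t.*
from t
where rand() < 0.01      -- keep roughly 1% of the rows
order by rand()          -- sorts only the sampled rows, not the whole table
limit 100;               -- trims the sample to an exact size
```

This still scans the table once, but the sort works on about 1% of the rows instead of all of them.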

EDIT:

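To optimize your query, I would start by using = rather than ILIKE. The pattern has no wildcards and contains only digits, so both should match the same rows, and = can use an index: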

SELECT a, b, c
FROM test 
WHERE test.a = 123 AND
     test.b = '10008383825311900000' 
LIMIT 1000000;

You want an index on test(a, b, c).
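
Putting the pieces together for your query, a sketch might look like the following. The index name idx_test_a_b_c is made up for illustration, the 1% rate is an arbitrary choice, and rand() is the MySQL spelling (use random() in PostgreSQL):

```sql
-- Hypothetical index name; the columns are the ones used in the query.
create index idx_test_a_b_c on test (a, b, c);

select a, b, c
from test
where test.a = 123
  and test.b = '10008383825311900000'
  and rand() < 0.01      -- assumed 1% sample of the matching rows
limit 1000000;
```

With the index in place, the equality filters should narrow the rows first, so rand() is only evaluated for rows that already match a and b.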


1 Comment

I added my query and hope you can have a look at it and tell me how to restructure. Thank you

Here's another answer.

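select * from (
    select
        a, b, c,
        row_number() over (order by a) as rn   -- number the filtered rows
    from test
    where
        -- the alias t1 is not visible inside the subquery, so refer to test directly
        test.a = 123
        and test.b ILIKE '10008383825311900000'
    ) t1
    inner join (
        -- floor(rand()*100) yields 0..99, so only the first 99 numbered rows can match
        -- (with repeats); scale the multiplier to the filtered row count to reach more
        select floor(rand()*100) as rn from test limit 1000000
    ) t2 on t2.rn = t1.rn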

1 Comment

I just updated and added my query. Could you take a look please and tell me how to incorporate your code?
