I have a table with more than 50 million rows.

trackpoint:

+----+------------+-------------------+
| id | created_at | tag               |
+----+------------+-------------------+
|  1 | 1484407910 | visitorDevice643  |
|  2 | 1484407913 | visitorDevice643  |
|  3 | 1484407916 | visitorDevice643  |
|  4 | 1484393575 | anonymousDevice16 |
|  5 | 1484393578 | anonymousDevice16 |
+----+------------+-------------------+

where 'created_at' is the timestamp at which the row was added. I also have a list of timestamps, for example:

timestamps = [1502744400, 1502830800, 1502917200]

I need to select all rows whose created_at falls in each interval between timestamps[i] and timestamps[i+1].

Using the Django ORM, it looks like this:

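step = 86400  # interval length in seconds (one day)
counts = []
for ts in timestamps[:-1]:
    # count distinct tags with ts <= created_at < ts + step
    unique_tags = trackpoint_set.filter(created_at__gte=ts, created_at__lt=ts + step).values('tag').distinct().count()
    counts.append((ts, unique_tags))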

Because the real list of timestamps is much longer and the table has a very large number of rows, I end up getting a 500 time-out.

So my question is: how can I do this in ONE raw SQL query that joins the rows with the list of values, so that the result looks like [(1502744400, 650), (1502830800, 1550)...]?

Here the first value is the timestamp, and the second is the count of unique tags in that interval.

  • What's 650? What's 1550? See: Why should I provide an MCVE for what seems to me to be a very simple SQL query? Commented Jul 20, 2017 at 8:56
  • Thanks, I corrected my question Commented Jul 20, 2017 at 9:01
  • Have you got an index on created_at? If it is a large query, the index might give you a massive performance boost. Commented Jul 20, 2017 at 9:29
  • Yes, I do, and I also tried a UNIQUE index on the tag-created_at pair; that didn't help either. Commented Jul 20, 2017 at 9:32
  • Anyway, you have a minimum timestamp and maximum timestamp (1502744400 and 1502917200) in your example, so you can limit your query to that range. Commented Jul 20, 2017 at 9:53
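
One way to combine the range limit suggested above with the single-query goal from the question is to assign each row to its interval by integer division and group on that. The following is only a sketch under several assumptions: it uses Python's sqlite3 module with an in-memory table as a stand-in for the real database (which the question does not name), it assumes the timestamps are evenly spaced by `step`, and the function name `count_unique_tags_per_interval`, the tag `visitorDevice7`, and the sample rows are made up for illustration.

```python
import sqlite3


def count_unique_tags_per_interval(conn, start, end, step):
    """Return [(interval_start, unique_tag_count), ...] for start <= created_at < end.

    Assumes equal-width intervals of `step` seconds beginning at `start`.
    Intervals that contain no rows are simply absent from the result.
    """
    sql = """
        SELECT :start + ((created_at - :start) / :step) * :step AS interval_start,
               COUNT(DISTINCT tag) AS unique_tags
        FROM trackpoint
        WHERE created_at >= :start AND created_at < :end
        GROUP BY interval_start
        ORDER BY interval_start
    """
    params = {"start": start, "end": end, "step": step}
    return conn.execute(sql, params).fetchall()


# Hypothetical demo data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trackpoint (id INTEGER PRIMARY KEY, created_at INTEGER, tag TEXT)"
)
conn.executemany(
    "INSERT INTO trackpoint (created_at, tag) VALUES (?, ?)",
    [
        (1502744400, "visitorDevice643"),
        (1502744500, "visitorDevice643"),
        (1502750000, "anonymousDevice16"),
        (1502830800, "visitorDevice643"),
        (1502900000, "anonymousDevice16"),
        (1502900001, "visitorDevice7"),
        (1502917200, "visitorDevice643"),  # excluded: the upper bound is exclusive
    ],
)

timestamps = [1502744400, 1502830800, 1502917200]
step = 86400
result = count_unique_tags_per_interval(conn, timestamps[0], timestamps[-1], step)
print(result)
```

The integer division relies on created_at and the bound parameters all being integers. Other databases may need an explicit FLOOR or integer cast, so treat the SQL as a starting point rather than something portable as written.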

1 Answer

First, index created_at. Next, build a query that restricts created_at to the interval between one timestamp and the next. Then run that query for each timestamp one by one rather than for all of them at once.
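
The approach described above could be sketched roughly as follows. This is an illustration only: it uses Python's sqlite3 module with an in-memory table in place of the real database, and the index name `idx_trackpoint_created_at`, the helper `unique_tags_in_interval`, the tag `visitorDevice7`, and the sample rows are all hypothetical.

```python
import sqlite3


def unique_tags_in_interval(conn, start, end):
    """Count distinct tags for rows with start <= created_at < end."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT tag) FROM trackpoint "
        "WHERE created_at >= ? AND created_at < ?",
        (start, end),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trackpoint (id INTEGER PRIMARY KEY, created_at INTEGER, tag TEXT)"
)
# Index on created_at so each range lookup can avoid a full table scan.
conn.execute("CREATE INDEX idx_trackpoint_created_at ON trackpoint (created_at)")
conn.executemany(
    "INSERT INTO trackpoint (created_at, tag) VALUES (?, ?)",
    [
        (1502744400, "visitorDevice643"),
        (1502744500, "visitorDevice643"),
        (1502750000, "anonymousDevice16"),
        (1502830800, "visitorDevice643"),
        (1502900000, "anonymousDevice16"),
        (1502900001, "visitorDevice7"),
    ],
)

timestamps = [1502744400, 1502830800, 1502917200]
# One small query per pair of consecutive timestamps.
counts = [
    (lo, unique_tags_in_interval(conn, lo, hi))
    for lo, hi in zip(timestamps, timestamps[1:])
]
print(counts)
```

Pairing consecutive timestamps with `zip` means the intervals do not have to be evenly spaced, at the cost of one round trip to the database per interval.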

3 Comments

As I said, because the table is very large and the array is much longer, I get 504 Gateway Time-out
Please try one batch at a time.
Sorry, this didn't help. If my step is 10 minutes (600 seconds) over a period of one day (86400 seconds), I get 144 iterations, and in the end I still get a 504.
