I have a function that runs one query to select the first row and another to select the last row. Each query takes around 300 ms to execute, and because they are executed many times, the function is too slow to be usable.

This is the query (a test version; in the function the parameters change):

SELECT the_geom
FROM "Entries" 
WHERE taxiid= 366 and timestamp between '2008-02-06 16:00:00' and timestamp '2008-02-06 16:00:00' + interval '5 minutes' 
ORDER BY entryid DESC 
LIMIT 1;

and this is the EXPLAIN ANALYZE output of the query:

QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on "Entries"  (cost=0.00..63538.80 rows=70 width=51) (actual time=184.409..342.049 rows=56 loops=1)
   Filter: (("timestamp" >= '2008-02-06 16:00:00'::timestamp without time zone) AND ("timestamp" <= '2008-02-06 16:05:00'::timestamp without time zone) AND (taxiid = 366))
   Rows Removed by Filter: 2128847
 Planning time: 0.191 ms
 Execution time: 342.088 ms
(5 rows)

Is there a better way of getting the first and last rows?

EDIT: Thanks Drunix, that helped, but something I can't understand is happening. With the index you suggested I was able to go from ~300 ms to 0.2 ms,

but if I change the interval that is added to the timestamp to 120 minutes, the new index is not used and the query keeps taking around 300 ms.

Here is the plan with the 5-minute interval:

snowflake=# explain analyze Select the_geom from "Entries" 
where taxiid= 366 and "timestamp" between '2008-02-06 16:00:00' and "timestamp" '2008-02-06 16:00:00' + interval '5 minutes'
ORDER BY entryid ASC 
LIMIT 1;

QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=149.52..149.52 rows=1 width=55) (actual time=0.129..0.129 rows=1 loops=1)
   ->  Sort  (cost=149.52..149.70 rows=73 width=55) (actual time=0.127..0.127 rows=1 loops=1)
         Sort Key: entryid
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using entriesindex on "Entries"  (cost=0.43..149.15 rows=73 width=55) (actual time=0.045..0.090 rows=56 loops=1)
               Index Cond: ((taxiid = 366) AND ("timestamp" >= '2008-02-06 16:00:00'::timestamp without time zone) AND ("timestamp" <= '2008-02-06 16:05:00'::timestamp without time zone))
 Planning time: 0.266 ms
 Execution time: 0.180 ms
(8 rows)

And the plan with the 120-minute interval:

snowflake=# explain analyze Select the_geom from "Entries" 
where taxiid= 366 and "timestamp" between '2008-02-06 16:00:00' and "timestamp" '2008-02-06 16:00:00' + interval '120 minutes' 
ORDER BY entryid ASC 
LIMIT 1;

QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=0.43..60.02 rows=1 width=55) (actual time=245.570..245.570 rows=1 loops=1)
   ->  Index Scan using "Entries_pkey" on "Entries"  (cost=0.43..97542.75 rows=1637 width=55) (actual time=245.568..245.568 rows=1 loops=1)
         Filter: (("timestamp" >= '2008-02-06 16:00:00'::timestamp without time zone) AND ("timestamp" <= '2008-02-06 18:00:00'::timestamp without time zone) AND (taxiid = 366))
         Rows Removed by Filter: 853963
 Planning time: 0.277 ms
 Execution time: 245.616 ms

  • Do you have an index on (taxiid, timestamp)? Order is important. Commented Sep 4, 2015 at 16:26
  • @Drunix, not according to the execution plan: it scanned "Entries". It sure would help. Commented Sep 4, 2015 at 16:29
  • @Clay the execution plan shows that no index is used, but there are more possible explanations than the obvious "there is none", hence the phrasing as a (guiding) question. Commented Sep 4, 2015 at 16:36
  • Please show the definitions of all indexes on the table. Commented Sep 4, 2015 at 16:42

1 Answer

Ok, rephrasing my comment as an answer:

Unless you already have one, you should create a composite index:

create index somename on "Entries" (taxiid, "timestamp");

According to your execution plan, the combination of these fields should be rather selective, so an index scan should be more efficient. Note that an index on (timestamp, taxiid) is probably much less useful, because it will only be used to limit the rows by timestamp. In similar cases, put the columns that are checked for equality first.
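
A minimal sketch of how the index and the two lookups could fit together, assuming the table and column names from the question. The index name entries_taxiid_ts_idx is made up for illustration, and the literal values are the test values from the question. Running each query under EXPLAIN ANALYZE shows whether the planner actually picks the index:

```sql
-- Sketch only. "Entries", taxiid, "timestamp", entryid and the_geom come from
-- the question; the index name entries_taxiid_ts_idx is hypothetical.

-- Equality column (taxiid) first, range column ("timestamp") second.
CREATE INDEX entries_taxiid_ts_idx ON "Entries" (taxiid, "timestamp");

-- Refresh planner statistics for the table.
ANALYZE "Entries";

-- First row in the time window (lowest entryid).
EXPLAIN ANALYZE
SELECT the_geom
FROM "Entries"
WHERE taxiid = 366
  AND "timestamp" BETWEEN timestamp '2008-02-06 16:00:00'
                      AND timestamp '2008-02-06 16:00:00' + interval '5 minutes'
ORDER BY entryid ASC
LIMIT 1;

-- Last row in the same window (highest entryid).
EXPLAIN ANALYZE
SELECT the_geom
FROM "Entries"
WHERE taxiid = 366
  AND "timestamp" BETWEEN timestamp '2008-02-06 16:00:00'
                      AND timestamp '2008-02-06 16:00:00' + interval '5 minutes'
ORDER BY entryid DESC
LIMIT 1;
```

An Index Scan node whose Index Cond covers both taxiid and "timestamp" indicates that the composite index is doing the filtering. The planner can still choose a different plan, for example when the time range is wide, so it is worth checking the output with realistic parameters.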
