1

I am trying to fetch some results from the database using two SQL select queries, I am using CTE, but since I have millions of rows I need to limit the results, I want to limit results to 20, what I tried is:

with fetch_new_events as 
(
select * from events e where id > @las_max_id_processed 
order by e.id
    limit 20), 
fetch_old_events as (
 select * from events e where id < @las_min_id_processed 
    order by e.id desc
    limit 20-count(fetch_new_events.id))
select * from fetch_new_events
union select * from fetch_old_events

Aggregate functions don't seem to work on limit, moreover, even if they do, I would have a problem if 20-count() returns a negative number, the Database engine will throw an exception so I have to use max(0,20-count()) or something like that. Could somebody help with this?

You may argue I could use a single select query, I think I can't because one query fetched the new events that were added since I started processing them, so to avoid overcomplicating the client(consumer), the consumer only keeps track of two integers the max and min id processed, that means the results I get every time should have contiguous Ids so there is no gap and missing rows, for that matter I need to make the first query to order by id ascending order, and for the second query since I am interested in the earliest events, I need to order it by id descending order.

Update

This is the sample data:

create table events
(
    id serial not null
        constraint events_pk
            primary key,
    name varchar
);
insert into public.events (id, name) values (7, 'event7');
insert into public.events (id, name) values (6, 'event6');
insert into public.events (id, name) values (5, 'event5');
insert into public.events (id, name) values (4, 'event4');
insert into public.events (id, name) values (3, 'event3');
insert into public.events (id, name) values (2, 'event2');
insert into public.events (id, name) values (1, 'event1');


I assume the following starting point, I started when there was only five rows, the first time the client fetched the rows with:

select * from events e
    order by e.id desc
    limit 3

he got events with id 5,3. While processing a producer pushed more events, now the client wants to get 3 events that have an id either > 5 or < 4, so he does:

with fetch_new_events as
(
select * from events e where id > 5
order by e.id
    limit 3),
fetch_old_events as (
 select * from events e where id < 3
    order by e.id desc
    limit 3-count(fetch_new_events.id))
select * from fetch_new_events
union select * from fetch_old_events;

which fails.

Problem aside

For the curious, to understand why the gap may happen if we use a single query, add few more entries:

insert into public.events (id, name) values (8, 'event8');
insert into public.events (id, name) values (9, 'event9');
insert into public.events (id, name) values (10, 'event10');

Then query with one single select statement:

select * from events e
where e.id > 5 or e.id < 3
order by e.id desc
limit 3

We will get now, events with id 10,9,8.

what happens? well, the client would have processed now the following ranges:

  • Events with id from 5 to 3.
  • Events with id from 10 to 8

so if we have a client very simple(as I want it to be), since it keeps track of only two integers, the max id and the min id processed, it would think he processed all ids from 10 to 3, which is not the case! we would have missed events with id 7 and 6. To overcome this either I have to make a complex client remembering ranges rather than two simple integers, or use what I said.

Update2

Somebody known as jer_s in the postgresql slack channel gave me the solution, I just have to add another CTE query that would only contain the row count, then use that value for the limit:

with new_events as (
    select * from events e
    where e.id > 5
    order by e.id
    limit 3
    ), new_row_count as (
        select count('*') as row_count from new_events
    ), old_events as (
        select * from events e
        where e.id < 3
        order by e.id desc
        limit 3-(select row_count from new_row_count)
    )
select * from new_events
union
select *
from old_events;

This would return events with id 7,6,2 so the events being processed would be contiguous, from 7 to 2!

1
  • Sample data would help your question. Commented Apr 18, 2021 at 9:42

1 Answer 1

2

I really don't follow what you are trying to do and why simpler version do not work.

However, I can see the superficial problem you are having with this code:

limit 20-count(fetch_new_events.id)

The simple solution is to use window functions:

with fetch_new_events as (
      select *
      from events e
      where id > @las_max_id_processed 
      order by e.id
      limit 20
     ), 
     fetch_old_events as (
      select e.*
      from (select e.*, row_number() over (order by e.id desc) as seqnum
            from events e
            where id < @las_min_id_processed 
           ) e
      where seqnum <= 20 - (select count(*) from fetch_new_events)
     )
select . . .   -- list the columns you want here
from fetch_new_events
union
select . . .   -- list the columns you want here
from fetch_old_events;

Note that you need to also list all the columns explicitly to handle the seqnum column.

Sign up to request clarification or add additional context in comments.

3 Comments

Your solution works too, thanks!. What I try to do is process events that are pushed to the database, but not in any order, I want always the more recent events, that means, if there are new events pushed since I started processing them, I want them first, then I want the old events in order by id desc order. The issue is, if for the new events that were pushed after I started processing, I get them with order by id desc, there is a gap that the client would miss, I explained that in the question, does it make sense now?
@Melardev . . . This addresses the issue with your code described in the question.
Yes, that's right I checked it and your solution works, thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.