Select rows and remove duplicate based on value of a column

Question

I have the following query:

select "houses"."id", 
"houses"."uuid", 
COUNT(1) OVER() as full_count from "houses" 
CROSS JOIN LATERAL jsonb_array_elements(houses.types) house_types 
inner join "hoods" on "hoods"."id" = "houses"."hood_id" and "hoods"."owner_id" = 2 
inner join "groups" on "groups"."hood_id" = "hoods"."id" and "groups"."manager_id" = 54 
where house_types->>'type' = 'big' 
group by "houses"."id", "houses"."uuid" 
order by lower(houses.name) asc 
limit 20

This correctly gives me the first 20 houses that have a type 'big', that are in a hood whose owner_id is 2, and whose hood has an associated group with manager_id 54.

Now, the problem is that sometimes several houses will have the same name, and I want to keep just one of them and remove the rest. For example:

If my houses table looks like:

id, types, name
1, [{ type: 'rating' }], 'white house'
2, [{ type: 'rating' }], 'white house'
3, [{ type: 'rating' }], 'red house'

I would want to get just the rows with id 1 and 3.

What is a good way to do that in PostgreSQL, given that both OFFSET and LIMIT can be applied to the query and I want the duplicates removed?

Gordon Linoff · Accepted Answer · 2019-03-13 12:23:35Z


Instead of group by, use distinct on:

select distinct on (lower(h.name)) h.id, h.uuid,
      COUNT(*) OVER() as full_count
from houses h cross join lateral
     jsonb_array_elements(h.types) ht inner join
     "hoods" ho
     on ho.id = h.hood_id and
        ho.owner_id = 2 inner join
     "groups" g
     on g.hood_id = ho.id and
        g.manager_id = 54 
where ht->>'type' = 'big' 
order by lower(h.name) asc  
limit 20;
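
As a minimal sketch of how DISTINCT ON picks rows, the following runs on the three sample rows from the question, inlined as a VALUES list (the alias `s` and its column list are made up for illustration, not part of the original schema). The leading ORDER BY expression has to match the DISTINCT ON expression. Adding `s.id` as a tie-breaker is an assumption about which duplicate should survive; without it, which of the two 'white house' rows is kept is unpredictable.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
-- "s" is a hypothetical alias, not a table from the original schema.
select distinct on (lower(s.name)) s.id, s.name
from (values
        (1, 'white house'),
        (2, 'white house'),
        (3, 'red house')
     ) as s(id, name)
-- The first ORDER BY key must match the DISTINCT ON expression;
-- s.id is a tie-breaker so the lowest id per name is the one kept.
order by lower(s.name), s.id;
```

With this tie-breaker the rows with id 1 and 3 are kept, returned in name order.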

Edit:

select h.*, count(*) over ()  as full_count
from (select distinct on (lower(h.name)) h.id, h.uuid
      from houses h cross join lateral
           jsonb_array_elements(h.types) ht inner join
           "hoods" ho
           on ho.id = h.hood_id and
              ho.owner_id = 2 inner join
           "groups" g
           on g.hood_id = ho.id and
              g.manager_id = 54 
      where ht->>'type' = 'big' 
      order by lower(h.name) asc  
     ) h
limit 20
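
To illustrate why the count moves to an outer query, here is a sketch on the same three sample rows from the question (the aliases `s` and `d`, the id tie-breaker, and the `offset 0` are assumptions for illustration). Window functions are evaluated before DISTINCT ON discards rows, so counting in the same SELECT would still include the duplicates. Counting over the deduplicated subquery, and before LIMIT/OFFSET is applied, gives the number of unique names.

```sql
select d.id, d.name,
       count(*) over () as full_count   -- counts the deduplicated rows, before LIMIT/OFFSET
from (select distinct on (lower(s.name)) s.id, s.name
      from (values
              (1, 'white house'),
              (2, 'white house'),
              (3, 'red house')
           ) as s(id, name)
      order by lower(s.name), s.id      -- s.id decides which duplicate is kept
     ) d
-- Re-state the ordering here: a subquery's ORDER BY is not guaranteed
-- to carry over to the outer query.
order by lower(d.name)
limit 20 offset 0;
```

On the sample rows this returns two rows, each with full_count = 2. Note that the outer ORDER BY needs the name column, so the subquery has to select it.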

edited Mar 13, 2019 at 12:23

answered Mar 13, 2019 at 10:57


7 Comments

Hommer Smith Over a year ago

Thanks Gordon. Why do you need both lower(h.name) and h.id and houses.uuid in the DISTINCT ON, if we just want them to be distinct by name?

Gordon Linoff Over a year ago

@HommerSmith . . . I think you are right. That is not needed.

Hommer Smith Over a year ago

Gordon, there is a problem with this approach. Even though the DISTINCT ON works, the COUNT(*) OVER() is still counting the removed rows. How can I have both DISTINCT ON and a count over all the results using the window function, so that I know how many unique rows exist in total regardless of the limit?

Gordon Linoff Over a year ago

@HommerSmith . . . I would recommend a subquery with distinct on to reduce the number of rows. Then use the count() window function.

Hommer Smith Over a year ago

What do you mean by a subquery with DISTINCT ON? I was actually thinking of having a subquery that would do the COUNT. Do you mind expanding your answer?
