How can we optimise NOT IN sub query on same table?

Question

I have following query which is filtering results using sub queries with NOT IN how can I avoid NOT IN? Is there any alternative for NOT IN? How can I optimise overall query?

One option I see adding index on lower(display_name) but Is the any better or further optimisation I can do here?

SELECT  table_name.* FROM table_name 
WHERE (
        table_name.id IN (
        SELECT distinct on (name, mode, formulation, strength, generic_name) id FROM table_name 
        where lower(display_name) 
          NOT IN(SELECT lower(display_name) FROM table_name WHERE table_name.user_id IS NULL)
      ) 
ORDER BY table_name.created_at DESC LIMIT 100 OFFSET 0

Query Plan

------------------------------------------------------------------------------------------------
 Limit  (cost=9277.38..9277.38 rows=1 width=296)
   ->  Sort  (cost=9277.38..9351.72 rows=29736 width=296)
         Sort Key: table_name.created_at DESC
         ->  Nested Loop  (cost=8015.41..9128.70 rows=29736 width=296)
               ->  HashAggregate  (cost=8015.12..8017.12 rows=200 width=4)
                     Group Key: table_name_1.id
                     ->  Unique  (cost=7486.65..7932.69 rows=6594 width=93)
                           ->  Sort  (cost=7486.65..7560.99 rows=29736 width=93)
                                 Sort Key: table_name_1.name, table_name_1.mode, table_name_1.formulation, table_name_1.strength, table_name_1.generic_name
                                 ->  Seq Scan on table_name table_name_1  (cost=2286.19..5277.28 rows=29736 width=93)
                                       Filter: (NOT (hashed SubPlan 1))
                                       SubPlan 1
                                         ->  Bitmap Heap Scan on table_name table_name_2  (cost=106.16..2272.68 rows=5402 width=32)
                                               Recheck Cond: (user_id IS NULL)
                                               ->  Bitmap Index Scan on index_table_name_on_user_id  (cost=0.00..104.81 rows=5402 width=0)
                                                     Index Cond: (user_id IS NULL)
               ->  Index Scan using table_name_pkey on table_name  (cost=0.29..6.61 rows=1 width=296)
                     Index Cond: (id = table_name_1.id)

What I am trying to do is table_name is drugs which holds two list like

System wide drugs like added by admins in this case we keep user_id as null.
User specific drugs means added by user with user_id value

I am returning two results in search 1. admin drugs 2. all users drugs like

Admin list

admin_list = select * from table_name user_id IS NULL and LOWER(table_name.name) like %cal%

All display names from admin list which need to skip in following query

SELECT lower(display_name) FROM table_name WHERE table_name.user_id IS NULL

All users drugs but skipping / filtering out admin list based on display_name like

user_list = SELECT  table_name.* FROM table_name 
WHERE (
        table_name.id IN (
        SELECT distinct on (name, mode, formulation, strength, generic_name) id FROM table_name 
        where lower(display_name) 
          NOT IN(SELECT lower(display_name) FROM table_name WHERE table_name.user_id IS NULL)
      ) 
AND LOWER(table_name.name) LIKE '%cal%'
ORDER BY table_name.created_at DESC LIMIT 100 OFFSET 0

Goal to have two distinct lists admin_list and user_list where user_list should not contain any admin list display_name

I have table called dugs, which have list of drugs. drug can belong to System wide for all users with user_id null, same drug can be user specific with user_id = 1. I want search/return two things 1. search in system with user_id = null and search in rest of the list i.e. user_id not null but skipping system wide drugs with same display name — Pramod Shinde
– Pramod Shinde, Commented Oct 17, 2020 at 13:00
@ParmodShinde . . . Such an explanation belongs in the question, not in a comment. — Gordon Linoff
– Gordon Linoff, Commented Oct 17, 2020 at 13:03

score 2 · Accepted Answer · 2020-10-17 10:59:52Z

2

More often than not, NOT EXISTS if faster than NOT IN

SELECT t1.* 
FROM table_name t1
WHERE table_name.id IN (SELECT distinct on (t2.name, t2.mode, t2.formulation, t2.strength, t2.generic_name) t2.id 
                        FROM table_name t2
                        where not exists (select * 
                                          FROM table_name t3
                                          WHERE t3.user_id IS NULL
                                            and t2.lower(display_name) = t3.lower(display_name) )
                       ) 
ORDER BY table_name.created_at DESC 
LIMIT 100 OFFSET 0

An index on table_name ( (lower(display_name)) ) where user_is null should improve that even more.

Note that distinct on () without an ORDER BY is typically not a good idea.

edited Oct 17, 2020 at 10:59

answered Oct 17, 2020 at 10:52

user330315

Sign up to request clarification or add additional context in comments.

Comments

Gordon Linoff · Accepted Answer · 2020-10-17 11:50:59Z

0

You seem to want one id per name, mode, formulation, strength, generic_name combination where the display_name has no null user_id.

If that is correct, I'm sure there are simpler ways to write the query:

select t.*
from (select distinct on (name, mode, formulation, strength, generic_name) t.* 
      from (select t.*,
                   count(*) filter (where user_id is null) over (partition by lower(display_name)) as cnt
            from table_name
           ) t
      where cnt = 0
      order by name, mode, formulation, strength, generic_name, created_at desc
     ) t
order by created_at desc
limit 100 offset 0

answered Oct 17, 2020 at 11:50

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

Collectives™ on Stack Overflow

How can we optimise NOT IN sub query on same table?

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related