
I am investigating a performance issue in our PostgreSQL database.

The main improvement I found is to use an && (array overlap) comparison instead of IN, which makes the query run 5-10 times faster.

The query before:

EXPLAIN (ANALYSE,VERBOSE)
SELECT locations.calculated_display_name AS Label, COUNT(DISTINCT users.id), locations.location_number AS Id
FROM "users"
         JOIN locations ON locations.location_number = ANY (users.locations)
WHERE locations.is_active = true
  AND locations.location_number IN
      ('99343995', '98098165', '97546499', '96937258', '95343057', '93813133', '93372359', '92468731', '92324318',
       '91511037', '90911561', '90683979', '90468206', '89912329', '87309765', '85642590', '85057826', '84736125',
       '84684098', '84114459', '81878601', '81873327', '81522236', '80844049', '80687059', '80141218', '79878991',
       '79610433', '79569429', '78011806', '77826704', '77596016', '76315282', '74049751', '72413190', '71525098',
       '69925333', '69893404', '69853249', '68655074', '68283943', '66287287', '65245703', '64951988', '64430136',
       '64398484', '64125662', '62076844', '61913963', '61795172', '61628230', '61561052', '60224118', '60179397',
       '59225905', '59036037', '56854896', '56508362', '56467520', '55176983', '54329936', '54124339', '53791738',
       '52760776', '50400189', '49850065', '48982981', '47365754', '46369743', '45458367', '45249322', '43660028',
       '41229651', '40987234', '40265995', '40219519', '38330986', '38225717', '37963675', '37431133', '35325656',
       '35195588', '35072509', '33704492', '32368037', '31426908', '31162313', '29434544', '26921088', '25498178',
       '22518127', '21574917', '21573577', '21036970', '20182103', '19465577', '19401629', '13501017', '12725460',
       '10854560')
  AND role IN
      ('role1', 'role2', 'role3', 'role4', 'role5')
GROUP BY locations.location_number, locations.city, locations.street

The improved query:

SELECT locations.calculated_display_name AS Label, COUNT(DISTINCT users.id), locations.location_number AS Id
FROM "users"
         JOIN locations ON locations.location_number = ANY (users.locations)
WHERE locations.is_active = true
  AND string_to_array(locations.location_number, ',')::character varying[] &&
      '{99343995,98098165,97546499,96937258,95343057,93813133,93372359,92468731,92324318,91511037,90911561,90683979,90468206,89912329,87309765,85642590,85057826,84736125,84684098,84114459,81878601,81873327,81522236,80844049,80687059,80141218,79878991,79610433,79569429,78011806,77826704,77596016,76315282,74049751,72413190,71525098,69925333,69893404,69853249,68655074,68283943,66287287,65245703,64951988,64430136,64398484,64125662,62076844,61913963,61795172,61628230,61561052,60224118,60179397,59225905,59036037,56854896,56508362,56467520,55176983,54329936,54124339,53791738,52760776,50400189,49850065,48982981,47365754,46369743,45458367,45249322,43660028,41229651,40987234,40265995,40219519,38330986,38225717,37963675,37431133,35325656,35195588,35072509,33704492,32368037,31426908,31162313,29434544,26921088,25498178,22518127,21574917,21573577,21036970,20182103,19465577,19401629,13501017,12725460,10854560}'::character varying[]
  AND role IN
      ('role1', 'role2', 'role3', 'role4', 'role5')
GROUP BY locations.location_number, locations.city, locations.street;

The basic difference is how the column value is checked against the list of values:

Slow: column IN (...) (translated to (locations.location_number)::text = ANY ('{...}'::text[]) in the query plan)

5-10x faster: string_to_array(column, ',')::character varying[] && '{...}'::character varying[] (translated to (string_to_array((locations.location_number)::text, ','::text))::character varying[] && '{...}'::character varying[] in the query plan)
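To make the semantics of the two predicates explicit, here is a minimal self-contained sketch (the table here is a toy example, not our real schema). Since location_number contains no commas, string_to_array turns it into a one-element array, so the overlap test reduces to a membership test:

```sql
CREATE TEMP TABLE demo (location_number character varying);
INSERT INTO demo VALUES ('99343995'), ('12345678');

-- Slow variant: scalar equality against a list
SELECT * FROM demo
WHERE location_number IN ('99343995', '98098165');

-- Fast variant: one-element array overlapping a constant array
-- string_to_array('99343995', ',') yields {99343995}
SELECT * FROM demo
WHERE string_to_array(location_number, ',')::character varying[]
      && '{99343995,98098165}'::character varying[];
```

Both statements return only the row with location_number = '99343995'.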

Query plan of the first query:

GroupAggregate  (cost=61136.64..61206.47 rows=537 width=89) (actual time=1258.966..1259.389 rows=534 loops=1)
  Output: locations.calculated_display_name, count(DISTINCT users.id), locations.location_number, locations.city, locations.street
  Group Key: locations.location_number
  ->  Sort  (cost=61136.64..61158.13 rows=8595 width=97) (actual time=1258.955..1259.049 rows=3621 loops=1)
        Output: locations.location_number, locations.calculated_display_name, users.id, locations.city, locations.street
        Sort Key: locations.location_number, users.id
        Sort Method: quicksort  Memory: 521kB
        ->  Nested Loop  (cost=0.29..60574.99 rows=8595 width=97) (actual time=0.173..1256.031 rows=3621 loops=1)
              Output: locations.location_number, locations.calculated_display_name, users.id, locations.city, locations.street
              ->  Seq Scan on public.users  (cost=0.00..761.98 rows=15193 width=65) (actual time=0.010..4.569 rows=15193 loops=1)
                    Output: users.first_name, users.last_name, users.role, users.locations, users.enabled, users.countries, users.intent_to_view_personal_data, users.id, users.email, users.object_id, users.last_enabled_at, users.client_ids, users.can_manage_api_keys
                    Filter: (users.role = ANY ('{...}'::text[]))
                    Rows Removed by Filter: 1667
              ->  Index Scan using idx_loc_country on public.locations  (cost=0.29..3.93 rows=1 width=81) (actual time=0.082..0.082 rows=0 loops=15193)
                    Output: locations.location_number, locations.is_active, locations.client_id, locations.location_id, locations.customer_site_name, locations.calculated_display_name, locations.name, locations.country, locations.city, locations.street, locations.updated_at
                    Index Cond: (((locations.location_number)::text = ANY ((users.locations)::text[])) AND ((locations.location_number)::text = ANY ('{...}'::text[])))
                    Filter: locations.is_active
Planning Time: 1.933 ms
Execution Time: 1259.439 ms

The improved query plan:

GroupAggregate  (cost=3408.33..3442.67 rows=1962 width=594) (actual time=246.693..247.108 rows=534 loops=1)
  Output: locations.calculated_display_name, count(DISTINCT users.id), locations.location_number, locations.city, locations.street
  Group Key: locations.location_number
  ->  Sort  (cost=3408.33..3413.24 rows=1962 width=602) (actual time=246.686..246.776 rows=3621 loops=1)
        Output: locations.location_number, locations.calculated_display_name, users.id, locations.city, locations.street
        Sort Key: locations.location_number, users.id
        Sort Method: quicksort  Memory: 521kB
        ->  Nested Loop  (cost=25.60..3301.03 rows=1962 width=602) (actual time=1.920..243.774 rows=3621 loops=1)
              Output: locations.location_number, locations.calculated_display_name, users.id, locations.city, locations.street
              ->  Bitmap Heap Scan on public.users  (cost=25.32..517.40 rows=421 width=48) (actual time=1.876..4.429 rows=15168 loops=1)
                    Output: users.first_name, users.last_name, users.role, users.locations, users.enabled, users.countries, users.intent_to_view_personal_data, users.id, users.email, users.object_id, users.last_enabled_at, users.client_ids, users.can_manage_api_keys
                    Recheck Cond: (users.role = ANY ('{...}'::text[]))
                    Heap Blocks: exact=469
                    ->  Bitmap Index Scan on idx_lower_searchfields  (cost=0.00..25.21 rows=421 width=0) (actual time=1.829..1.830 rows=15168 loops=1)
                          Index Cond: (users.role = ANY ('{...}'::text[]))
              ->  Index Scan using idx_loc_country on public.locations  (cost=0.29..6.56 rows=5 width=586) (actual time=0.015..0.016 rows=0 loops=15168)
                    Output: locations.location_number, locations.is_active, locations.client_id, locations.location_id, locations.customer_site_name, locations.calculated_display_name, locations.name, locations.country, locations.city, locations.street, locations.updated_at
                    Index Cond: ((locations.location_number)::text = ANY ((users.locations)::text[]))
                    Filter: (locations.is_active AND ((string_to_array((locations.location_number)::text, ','::text))::character varying[] && '{...}'::character varying[]))
                    Rows Removed by Filter: 2
Planning Time: 1.183 ms
Execution Time: 247.248 ms

Notice that with the improved query, Postgres uses a Bitmap Heap Scan on users instead of a Seq Scan.
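For context, the indexes that the planner can choose from can be listed with a catalog query like this (a generic sketch against the pg_indexes view, using the table names from the query above):

```sql
-- Show every index on the two tables involved, with its definition
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename IN ('users', 'locations')
ORDER BY tablename, indexname;
```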

So my question is: why does Postgres not recognize in the first query that there is an index it can use?

My current guess is that the explicit cast to character varying[] somehow allows it to use some kind of index, but I would like an explanation for it, because I feel I am missing an important point here that would help make queries much more performant.

I hope someone can explain the reason for it :)
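In case it is relevant, the plans could also be captured with buffer statistics, which show per node whether pages came from cache (shared hit) or from disk (read) and might explain part of the timing difference (sketch, with the query body elided):

```sql
-- Same queries as above, but including buffer statistics per plan node
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ...;
```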

  • How are the indexes defined? Can you make that EXPLAIN (ANALYZE, BUFFERS)? Commented Jan 17 at 10:00
  • Hi, and welcome to dba.se! Could you please give us (in the question - edit function, using formatted text) the table creation DDL along with indexes - thanks. Also, as suggested, EXPLAIN (ANALYZE, BUFFERS, VERBOSE) <your query>. Also, please include your version of PostgreSQL with any question. Commented Jan 17 at 11:54
  • You group by columns that don't show up in the result, which makes no sense. location_number is matched to an element of locations in one place and treated as a comma-separated string in the other. It can't be both, and probably shouldn't be either in a proper relational design. Start by providing information for your performance question as instructed here. Commented Jan 18 at 3:58
  • The difference in the actual times occurs in the node which is not different between the plans, so you are asking the wrong question. Your 2nd query is probably faster only because the data it needs is already in cache, while the first time it was not. Commented Jan 18 at 18:20

