2

I have a problem with my query: it takes far too long (2636124 ms, about 44 minutes!):

 SELECT COUNT(*) AS "__count" 
 FROM "dictionary_dictionary" 
 WHERE NOT ("dictionary_dictionary"."id" IN (SELECT U1."word_id" AS Col1 
                                             FROM "dictionary_frequencydata" U1 
                                             WHERE U1."user_id" = 1));

This query is generated by the Django ORM. When I run it through the ORM my app hangs, and when I paste it into psql, psql hangs as well.

EXPLAIN ANALYZE:

Aggregate  (cost=329583550.40..329583550.41 rows=1 width=8) (actual time=2636109.932..2636109.933 rows=1 loops=1)
   ->  Seq Scan on dictionary_dictionary  (cost=0.00..329583390.76 rows=63856 width=0) (actual time=2636109.922..2636109.922 rows=0 loops=1)
           Filter: (NOT (SubPlan 1))
           Rows Removed by Filter: 127712
           SubPlan 1
             ->  Materialize  (cost=0.00..4821.74 rows=135828 width=4) (actual time=0.006..12.453 rows=63856 loops=127712)
                 ->  Seq Scan on dictionary_frequencydata u1  (cost=0.00..3611.60 rows=135828 width=4) (actual time=0.299..95.915 rows=127712 loops=1)
                      Filter: (user_id = 1)
                      Rows Removed by Filter: 28054
 Planning time: 0.277 ms
 Execution time: 2636124.744 ms
 (11 rows)

My Django models:

class Dictionary(DateTimeModel):
    base_word = models.ForeignKey(BaseDictionary, related_name=_('dict_words'))
    word = models.CharField(max_length=64)
    version = models.ForeignKey(Version)

class FrequencyData(DateTimeModel):
    word = models.ForeignKey(Dictionary, related_name=_('frequency_data'))
    count = models.BigIntegerField(null=True, blank=True)
    source = models.ForeignKey(Source, related_name=_('frequency_data'), null=True, blank=True)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name=_('frequency_data'))
    user_ip_address = models.GenericIPAddressField(null=True, blank=True)
    date_of_checking = models.DateTimeField(null=True, blank=True)
    is_checked = models.BooleanField(default=False)

The table definitions:

\d+ dictionary_dictionary
                                                                 Table "public.dictionary_dictionary"
        Column        |           Type           | Collation  |  Nullable  |                              Default                               |    Storage     | Stats target  | Description 
----------------------+--------------------------+------------+------------+--------------------------------------------------------------------+----------------+---------------+------
 id                   | integer                  |            | not null   | nextval('dictionary_dictionary_id_seq'::regclass) | plain          |               | 
 date_created         | timestamp with time zone |            | not null   |                                                                    | plain          |               | 
 date_modified        | timestamp with time zone |            | not null   |                                                                    | plain          |               | 
 word                 | character varying(64)    |            | not null   |                                                                    | extended       |               | 
 algorithm_version_id | integer                  |            | not null   |                                                                    | plain          |               | 
 base_word_id         | integer                  |            | not null   |                                                                    | plain          |               | 

Indexes:
    "dictionary_dictionary_pkey" PRIMARY KEY, btree (id)
    "dictionary_phonet_algorithm_version_id_0f0af100" btree (algorithm_version_id)
    "dictionary_dictionary_base_word_id_8db15cb4" btree (base_word_id)

Foreign-key constraints:
    "dictionary__algorithm_version_id_0f0af100_fk_phonetic_" FOREIGN KEY (algorithm_version_id) REFERENCES dictionary_algorithmversion(id) DEFERRABLE INITIALLY DEFERRED
    "dictionary__base_word_id_8db15cb4_fk_phonetic_" FOREIGN KEY (base_word_id) REFERENCES dictionary_grammaticaldictionary(id) DEFERRABLE INITIALLY DEFERRED

Referenced by:
    TABLE "dictionary_frequencydata" CONSTRAINT "dictionary__word_id_c231110d_fk_phonetic_" FOREIGN KEY (word_id) REFERENCES dictionary_dictionary(id) DEFERRABLE INITIALLY DEFERRED

=========
\d+ dictionary_frequencydata
                                                               Table "public.dictionary_frequencydata"
      Column      |           Type           | Collation  |  Nullable  |                            Default                            |    Storage     | Stats target  | Description 
------------------+--------------------------+------------+------------+---------------------------------------------------------------+----------------+---------------+------
 id               | integer                  |            | not null   | nextval('dictionary_frequencydata_id_seq'::regclass) | plain          |               | 
 date_created     | timestamp with time zone |            | not null   |                                                               | plain          |               | 
 date_modified    | timestamp with time zone |            | not null   |                                                               | plain          |               | 
 count            | bigint                   |            |            |                                                               | plain          |               | 
 user_ip_address  | inet                     |            |            |                                                               | main           |               | 
 date_of_checking | timestamp with time zone |            |            |                                                               | plain          |               | 
 is_checked       | boolean                  |            | not null   |                                                               | plain          |               | 
 source_id        | integer                  |            |            |                                                               | plain          |               | 
 user_id          | integer                  |            | not null   |                                                               | plain          |               | 
 word_id          | integer                  |            | not null   |                                                               | plain          |               | 

Indexes:
    "dictionary_frequencydata_pkey" PRIMARY KEY, btree (id)
    "dictionary_frequencydata_source_id_38bb205a" btree (source_id)
    "dictionary_frequencydata_user_id_c6dfedce" btree (user_id)
    "dictionary_frequencydata_word_id_c231110d" btree (word_id)

Foreign-key constraints:
    "dictionary__source_id_38bb205a_fk_phonetic_" FOREIGN KEY (source_id) REFERENCES dictionary_frequencysource(id) DEFERRABLE INITIALLY DEFERRED
    "dictionary__user_id_c6dfedce_fk_auth_user" FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
    "dictionary__word_id_c231110d_fk_phonetic_" FOREIGN KEY (word_id) REFERENCES dictionary_dictionary(id) DEFERRABLE INITIALLY DEFERRED

The database runs on shared hosting. The Dictionary table has about 120k rows and the FrequencyData table about 160k rows.

  • How long do the following two queries take to execute: select count(*) from dictionary_dictionary; and select count(DISTINCT d.id) from dictionary_dictionary d join dictionary_frequencydata f on d.id = f.word_id WHERE f.user_id = 1? Commented May 30, 2018 at 17:33
  • First: 54 ms, second: 345 ms. Explain: pastebin.com/T96Q3ipt Commented May 31, 2018 at 11:30
  • How long does SELECT U1."word_id" AS Col1 FROM "dictionary_frequencydata" U1 WHERE U1."user_id" = 1 take to run? Commented Jun 4, 2018 at 21:15
  • Have you tried adding the DISTINCT keyword? E.g. SELECT COUNT(*) AS "__count" FROM "dictionary_dictionary" WHERE NOT ("dictionary_dictionary"."id" IN (SELECT distinct U1."word_id" AS Col1 FROM "dictionary_frequencydata" U1 WHERE U1."user_id" = 1)); Commented Jun 4, 2018 at 21:16
  • DISTINCT works. Thanks! If you write this up as an answer, I'll mark it as accepted. Commented Jun 5, 2018 at 13:11

2 Answers

1

Try adding the DISTINCT keyword, which should narrow the set of ids that each row is checked against:

SELECT COUNT(*) AS "__count" 
FROM "dictionary_dictionary" 
WHERE NOT ("dictionary_dictionary"."id" IN (SELECT distinct U1."word_id" AS Col1
                                            FROM "dictionary_frequencydata" U1 
                                            WHERE U1."user_id" = 1));
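
The EXPLAIN ANALYZE output in the question suggests why the original form is slow: the Materialize node under SubPlan 1 reports loops=127712, so the materialized list of word_ids appears to be scanned linearly once per dictionary row rather than probed through a hash. A rough back-of-the-envelope check in Python, using only numbers copied from that plan (the variable names are mine, not from the question):

```python
# Figures copied from the EXPLAIN ANALYZE output in the question.
subplan_loops = 127_712       # loops=... on the Materialize node (one per outer row)
avg_rows_per_loop = 63_856    # rows=... on the Materialize node (per-loop average)
avg_ms_per_loop = 12.453      # upper "actual time" on the Materialize node (per-loop average)
list_length = 127_712         # rows returned by the inner seq scan (user_id = 1)
total_ms = 2_636_124.744      # Execution time

# EXPLAIN ANALYZE reports per-loop averages, so totals are average * loops.
rows_visited = subplan_loops * avg_rows_per_loop
materialize_ms = subplan_loops * avg_ms_per_loop
share_scanned = avg_rows_per_loop / list_length

print(f"materialized rows visited: {rows_visited:,}")
print(f"time reading them: {materialize_ms / 1000:.0f} s of {total_ms / 1000:.0f} s")
print(f"average share of the list scanned per outer row: {share_scanned:.0%}")
```

That is roughly 8.2 billion row visits. Scanning half the list on average is consistent with every dictionary row finding a match (Rows Removed by Filter equals the table size). This is only a reading of the original plan; the plan for the DISTINCT version is on pastebin and not reproduced here, so the sketch says nothing certain about which plan PostgreSQL chooses instead.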

2 Comments

DISTINCT works: 263 ms. Here is the EXPLAIN ANALYZE output: pastebin.com/uZcckEYp
1

In this case, your query should be a lot faster if you rewrite it as shown below, since both subqueries are fast. The final result is equivalent to that of the query generated by Django.

It seems the seq scan with the filter on dictionary_dictionary is quite expensive, whereas a plain seq scan is very fast. I'm not sure why this is so.

-- Total number of words minus the words that have frequency data for user 1.
SELECT tot - excl
FROM (SELECT count(*) AS tot
      FROM dictionary_dictionary) t1,
     (SELECT count(DISTINCT d.id) AS excl
      FROM dictionary_dictionary d
      JOIN dictionary_frequencydata f
        ON d.id = f.word_id
      WHERE f.user_id = 1) t2;
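
To check that the subtraction really returns the same number as the NOT IN form, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and column names follow the question, but the schema is trimmed and the sample rows are made up for illustration. It only verifies that the results agree; it says nothing about PostgreSQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dictionary_dictionary (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE dictionary_frequencydata (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        word_id INTEGER NOT NULL REFERENCES dictionary_dictionary(id)
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO dictionary_dictionary (id, word)
        VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO dictionary_frequencydata (user_id, word_id)
        VALUES (1, 1), (1, 1), (1, 2), (2, 3), (2, 1);
""")

# The query generated by the ORM, as shown in the question.
not_in_sql = """
    SELECT COUNT(*) AS "__count"
    FROM "dictionary_dictionary"
    WHERE NOT ("dictionary_dictionary"."id" IN (SELECT U1."word_id" AS Col1
                                                 FROM "dictionary_frequencydata" U1
                                                 WHERE U1."user_id" = 1))
"""

# The rewrite from this answer: total count minus count of excluded ids.
subtract_sql = """
    SELECT tot - excl
    FROM (SELECT count(*) AS tot FROM dictionary_dictionary) t1,
         (SELECT count(DISTINCT d.id) AS excl
          FROM dictionary_dictionary d
          JOIN dictionary_frequencydata f ON d.id = f.word_id
          WHERE f.user_id = 1) t2
"""

not_in_count = conn.execute(not_in_sql).fetchone()[0]
subtract_count = conn.execute(subtract_sql).fetchone()[0]
print(not_in_count, subtract_count)
```

The duplicate (1, 1) row is there on purpose: without DISTINCT in the second subquery, the join would count word 1 twice and the subtraction would come out too low.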

If rows are inserted into dictionary_dictionary only infrequently, the count will not change often. In that case it is more efficient to cache the result of select count(*) from dictionary_dictionary and subtract the count of excluded ids from it. Whenever rows are inserted into or removed from dictionary_dictionary, the cache needs to be updated. This can be done automatically with triggers on insert and delete on dictionary_dictionary.
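
The trigger idea can be sketched as follows, again with Python's sqlite3 module as a stand-in. The cache table dictionary_rowcount and the trigger names are hypothetical, not part of the original schema, and the trigger syntax is SQLite's: in PostgreSQL the same idea needs a trigger function (CREATE FUNCTION ... RETURNS trigger) plus CREATE TRIGGER statements that call it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dictionary_dictionary (id INTEGER PRIMARY KEY, word TEXT NOT NULL);

    -- Hypothetical single-row cache table; the name is not from the original schema.
    CREATE TABLE dictionary_rowcount (total INTEGER NOT NULL);
    INSERT INTO dictionary_rowcount (total) VALUES (0);

    -- Keep the cached total in step with every inserted row...
    CREATE TRIGGER dictionary_count_ins AFTER INSERT ON dictionary_dictionary
    BEGIN
        UPDATE dictionary_rowcount SET total = total + 1;
    END;

    -- ...and with every deleted row.
    CREATE TRIGGER dictionary_count_del AFTER DELETE ON dictionary_dictionary
    BEGIN
        UPDATE dictionary_rowcount SET total = total - 1;
    END;
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO dictionary_dictionary (word) VALUES (?)",
    [("a",), ("b",), ("c",)],
)
conn.execute("DELETE FROM dictionary_dictionary WHERE word = 'b'")

cached = conn.execute("SELECT total FROM dictionary_rowcount").fetchone()[0]
actual = conn.execute("SELECT count(*) FROM dictionary_dictionary").fetchone()[0]
print(cached, actual)
```

One trade-off to keep in mind: every insert and delete now also updates the single counter row, which can become a point of contention if many writers touch the table concurrently.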

2 Comments

Thanks, it's faster (288 ms). The count was just an example; the main problem is not with COUNT but with selecting some records. I'll try to rewrite it using your suggestion (original post: stackoverflow.com/questions/50604128/django-orm-exclude-fails). But why is the filter operation so expensive?
Please update the other question with the relevant table descriptions (\d+ tablename) and the EXPLAIN ANALYZE output of the query you are trying to run.
