Let me know if this should be posted on DBA.stackexchange.com instead...

I have the following query:

SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
  ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
  ON jobs.service_of_process_id = service_of_processes.id
WHERE
  (jobs.account_id = 250093
  OR court_cases.account_id = 250093)
ORDER BY
  court_cases.court_date DESC NULLS LAST,
  court_cases.id DESC
LIMIT 30
OFFSET 0;

But it takes a good 2-4 seconds to run, and in a web application this is unacceptable for a single query.

I ran EXPLAIN (ANALYZE, BUFFERS) on the query as suggested on the PostgreSQL wiki, and have put the results here: http://explain.depesz.com/s/Yn6

The table definitions for the tables involved in the query are here (including the indexes on foreign key relationships):

http://sqlfiddle.com/#!15/114c6

Is it having issues using the indexes because the WHERE clause is querying from two different tables? What kind of index or change to the query can I make to make this run faster?

These are the current sizes of the tables in question:

PSQL=# select count(*) from service_of_processes;
 count  
--------
 103787
(1 row)

PSQL=# select count(*) from jobs;
 count  
--------
 108995
(1 row)

PSQL=# select count(*) from court_cases;
 count 
-------
 84410
(1 row)

EDIT: I'm on PostgreSQL 9.3.1, if that matters.

2 Answers

OR clauses can make optimizing a query difficult. One idea is to split the two parts of the condition into two separate subqueries. This actually simplifies one of them a lot (the one on court_cases.account_id).

Try this version:

(SELECT cc.*
 FROM "court_cases" cc
 WHERE cc.account_id = 250093
 ORDER BY cc.court_date DESC NULLS LAST,
          cc.id DESC
 LIMIT 30
) UNION ALL
(SELECT cc.*
 FROM "court_cases" cc LEFT OUTER JOIN
      service_of_processes sop
      ON sop.court_case_id = cc.id LEFT OUTER JOIN
      jobs j
      ON j.service_of_process_id = sop.id
 WHERE (j.account_id = 250093 AND cc.account_id <> 250093)
 ORDER BY cc.court_date DESC NULLS LAST, cc.id DESC
 LIMIT 30
)
ORDER BY court_date DESC NULLS LAST,
         id DESC
LIMIT 30 OFFSET 0;

And add the following indexes:

create index court_cases_accountid_courtdate_id on court_cases(account_id, court_date, id);
create index jobs_accountid_sop on jobs(account_id, service_of_process_id);
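
One thing to keep in mind: a plain ascending index scanned backward yields court_date DESC NULLS FIRST, which is not the DESC NULLS LAST order the query asks for, so the planner may still have to sort. A sketch of an index whose sort order mirrors the ORDER BY (the index name is made up, and I have not run this against your schema):

```sql
-- Hypothetical index name; the column order and sort directions
-- mirror "ORDER BY court_date DESC NULLS LAST, id DESC" for one account_id.
CREATE INDEX court_cases_accountid_courtdate_desc_id_desc
    ON court_cases (account_id, court_date DESC NULLS LAST, id DESC);
```

Check the EXPLAIN (ANALYZE, BUFFERS) output afterwards to see whether the sort step for the first subquery actually disappears.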

Note that the second query adds cc.account_id <> 250093, which prevents the same court case from being returned by both branches. This eliminates the need for DISTINCT on the combined result, or for UNION instead of UNION ALL.
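
One caveat: the joins in the second branch can still return the same court case more than once when several of its jobs belong to the account. Below is a sketch of a variant that avoids this by using EXISTS instead of the joins. It also uses IS DISTINCT FROM so that a NULL cc.account_id counts as "not this account"; whether account_id is nullable is an assumption on my part, and I have not tested this against your data.

```sql
-- Branch 1: cases owned by the account.
(SELECT cc.*
 FROM court_cases cc
 WHERE cc.account_id = 250093
 ORDER BY cc.court_date DESC NULLS LAST, cc.id DESC
 LIMIT 30)
UNION ALL
-- Branch 2: cases owned by someone else (or by nobody) that have
-- at least one job for the account. EXISTS yields each case at most once.
(SELECT cc.*
 FROM court_cases cc
 WHERE cc.account_id IS DISTINCT FROM 250093
   AND EXISTS (
       SELECT 1
       FROM service_of_processes sop
       JOIN jobs j ON j.service_of_process_id = sop.id
       WHERE sop.court_case_id = cc.id
         AND j.account_id = 250093)
 ORDER BY cc.court_date DESC NULLS LAST, cc.id DESC
 LIMIT 30)
ORDER BY court_date DESC NULLS LAST, id DESC
LIMIT 30 OFFSET 0;
```

The two branches stay mutually exclusive, so UNION ALL is still safe here.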

1 Comment

Syntax error on that last create index, you're missing the name of the index and on. Should be something like create index jobs_account_id_sop_id on jobs(account_id, service_of_process_id);

I'd try modifying the query as follows:

SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
  ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
  ON jobs.service_of_process_id = service_of_processes.id and jobs.account_id = 250093
WHERE
  (court_cases.account_id = 250093)
ORDER BY
  court_cases.court_date DESC NULLS LAST,
  court_cases.id DESC
LIMIT 30
OFFSET 0;

I think the issue is that the query planner does not decompose the WHERE filter properly, which looks like a really strange performance bug.

2 Comments

This doesn't work :\ It doesn't include those court cases where jobs.account_id = 250093 but court_cases.account_id != 250093, which is why there was an OR in the original query :(
You're right... In fact, the problem is that the WHERE condition can be evaluated only after the full set of joins has been computed, which can take a long time because the tables are big. It's not an index problem... I'd try decomposing the query into two subqueries, splitting the WHERE filter, and then applying a SELECT DISTINCT to the union of the two queries.
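
A minimal sketch of that decomposition, untested against the real schema and assuming court_cases.id is the primary key. Here UNION over the matching ids does the de-duplication in place of an explicit SELECT DISTINCT:

```sql
SELECT cc.*
FROM court_cases cc
WHERE cc.id IN (
    -- Cases owned by the account.
    SELECT id
    FROM court_cases
    WHERE account_id = 250093
    UNION
    -- Cases reachable through one of the account's jobs.
    SELECT sop.court_case_id
    FROM jobs j
    JOIN service_of_processes sop ON sop.id = j.service_of_process_id
    WHERE j.account_id = 250093
)
ORDER BY cc.court_date DESC NULLS LAST, cc.id DESC
LIMIT 30 OFFSET 0;
```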
