I currently have 1,100,000 rows in the table, and that number will grow over time. I am running the Postgres query below on my database server, and it takes approximately 5 seconds to execute. How can I optimize it to run faster?

Query:

    select sum(cast("total_value" as float)) as "total_value", sum(cast("fob_value" as float)) as "total_fob_value"
    from export
    where ("total_value" != 'N/A' and "total_value" != 'N?A') and
          ("fob_value" != 'N/A' and "fob_value" != 'N?A') and
          "product_desc" ilike '%pen%' and 
          ("shipping_date" between '2020-07-31T13:00:00.000Z' and '2020-08-28T09:58:04.451Z');

5 Comments

  • 3
    You might want to store your numbers as numbers, and not as strings, so you don't have to continually cast them and filter out non-numeric values. Commented Sep 3, 2020 at 11:18
  • 1
    You should definitely store numbers as numbers. This 'N/A' and 'N?A' stuff is such nonsense; that's what nullable columns are for. Don't defer fixing the data into a proper format to the point of querying; do it before insertion. Unless you have some really good reason not to...? Commented Sep 3, 2020 at 11:22
  • Thanks @underscore_d for editing the term lakhs, and converting it to a 'normal' number. Commented Sep 3, 2020 at 11:25
  • 1
    @underscore_d thanks dear, but I can't change the schema as of now. I do not have permission to do so and many records were entered already. So that's the concern. Commented Sep 3, 2020 at 11:43
  • Can you at least edit to show the full schema of the table including column types and indexes? Else people will try to recommend adding indexes that you might already have. Commented Sep 3, 2020 at 11:45

1 Answer

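There is little that you can do for this query. Two possible indexes are a standard (B-tree) index on shipping_date, which helps the date range filter, or a trigram GIN/GiST index (from the pg_trgm extension) on product_desc, which is the kind of index that can support ilike '%pen%'.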
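As a rough sketch, the two indexes could be created as follows. The index names are made up for illustration, and this assumes that product_desc is a text column, that the pg_trgm extension can be installed, and that you have permission to create indexes on the table:

```sql
-- Assumption: pg_trgm is available on the server (it ships with the
-- standard PostgreSQL contrib modules).
create extension if not exists pg_trgm;

-- Plain B-tree index for the range filter on shipping_date.
-- The index name is hypothetical.
create index export_shipping_date_idx
    on export (shipping_date);

-- Trigram GIN index so that ilike '%pen%' can use an index
-- instead of scanning every row. The index name is hypothetical.
create index export_product_desc_trgm_idx
    on export using gin (product_desc gin_trgm_ops);
```

Whether the planner actually uses either index depends on how selective the date range and the search term are, so it is worth checking the plan before and after.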

However, you can fix your data model. Do not store numeric values as strings. Invalid values can be stored using NULL. Also, do not use double quotes when defining column or table names. They just clutter queries.
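If you do get permission to change the schema, a one-time conversion might look like the sketch below. It assumes that 'N/A' and 'N?A' are the only non-numeric values in these columns (any other non-numeric string would make the cast fail), and it uses numeric as the target type, which is a choice made here rather than something stated in the question:

```sql
-- Assumption: every value other than 'N/A' and 'N?A' is a valid number.
-- nullif() turns the placeholder strings into NULL before the cast.
alter table export
    alter column total_value type numeric
        using nullif(nullif(total_value, 'N/A'), 'N?A')::numeric,
    alter column fob_value type numeric
        using nullif(nullif(fob_value, 'N/A'), 'N?A')::numeric;
```

This rewrites the table and holds a lock while it runs, so on a table of this size it is best done in a maintenance window.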

With those changes, the query would simplify to:

    select sum(total_value) as total_value, sum(fob_value) as total_fob_value
    from export
    where product_desc ilike '%pen%' and
          shipping_date between '2020-07-31T13:00:00.000Z' and '2020-08-28T09:58:04.451Z';

This won't execute much faster, but it is much simpler to read and interpret.
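To see where the time actually goes, and whether an index is being used, you can ask Postgres for the executed plan. A minimal example using the simplified query above (note that explain with analyze really runs the query):

```sql
-- Shows the chosen plan with actual timings and buffer usage.
explain (analyze, buffers)
select sum(total_value) as total_value, sum(fob_value) as total_fob_value
from export
where product_desc ilike '%pen%' and
      shipping_date between '2020-07-31T13:00:00.000Z' and '2020-08-28T09:58:04.451Z';
```

A "Seq Scan on export" node in the output means the whole table is being read; a "Bitmap Index Scan" or "Index Scan" node means one of the indexes is in use.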

1 Comment

Right, that's how NULL should be used.