0
SELECT table_1.time, table_1.time, table_2.time FROM table_1 
INNER JOIN table_2 on table_1.time = table_2.time 
INNER JOIN table_3 on table_1.time = table_3.time
...;

I am using the above query syntax to query all rows in multiple tables, join columns from different tables, and return. However, as the number of rows in a table increases & number of tables increases, the performance drops in a large scale. Is there any way to optimize the query performance? There will be about 0.1 - 1 million rows for each table.

I've heard terms like indexing, partitioning, and SSD, but I'm really of a novice in Postgres, and not sure which one to look in to. Can anyone provide some query command syntax that's better than what I currently have, or give some detailed advice on editing the structure of my database?

Edit: Fetching all data happens only once when loading a page. So I'm trying to load all the data that's present in DB to visualize plots. After the initial plot is generated, the page will be querying only the last rows of each table to update the plots. The table structures are very simple.

Table 1: SPM1

        time         | spm1  |
------------------------------
 2018-09-05 22:23:52 | 43.21 |

Table 2: SPM2
        time         | spm2  |
------------------------------
 2018-09-05 22:23:52 | 43.21 |

... and there are about 30 tables of these

Thanks,

4
  • 1
    "I am using the above query syntax to query all rows in multiple tables" If you're querying all rows then yes, performance will drop linearly as the tables grow. You can stave this off with more hardware and other optimizations, but ultimately performance will continue to drop as the data grows. If you have to query all rows often, there's probably something wrong with your design. There isn't enough detail here to say what. I'd suggest asking another question with specifics of your schema and the problem you're solving. Commented Sep 6, 2018 at 0:11
  • @Schwern I put an update, though I'm not very sure if I've provided enough info here Commented Sep 6, 2018 at 0:19
  • No indexing strategy will help in retrieving all rows of the tables. If at all parallel query will help (introduced in 9.6, improved in 10 and further improved in the upcoming 11) Commented Sep 6, 2018 at 5:41
  • @EricKim, Can you elaborate on your use case. Why do return all the values of time columns? The logic of the join implies they are 100% equal. (JOIN table_2 on table_1.time = table_2.time, etc). Regarding performance, creating indexes on filtered columns will help - if you restrict the dataset, as @Schwern suggested. Commented Sep 6, 2018 at 12:19

2 Answers 2

2

Here's a few idea to improve things based on what you've mentioned.

Could all the tables be turned into one with three columns?

create table spm (
    id serial primary key,
    time datetime not null,
    spm numeric(5,2) not null,
    number smallint not null
);

insert into spm (time, spm, number)
    values ('2018-09-05 22:23:52', 43.21, 1),
           ('2018-09-05 22:23:52', 43.21, 2)

That would greatly simplify the queries. Joins are efficient, but 20 joins is a bit much.

select time, spm, number from spm;

Fetching all data happens only once when loading a page. So I'm trying to load all the data that's present in DB to visualize plots. After the initial plot is generated, the page will be querying only the last rows of each table to update the plots.

If these plots be cached and periodically regenerated or updated, that would be a huge performance improvement not just for the database query, but for all that time generating the plots.

Sign up to request clarification or add additional context in comments.

2 Comments

unfortunately the tables has to be separate for a reason... but thanks
@EricKim What's the reason?
1

Have your tried indexing the time fields? You can create indexes like this:

CREATE INDEX "table_1_time" ON "table_1" ("time");
CREATE INDEX "table_2_time" ON "table_2" ("time");
CREATE INDEX "table_3_time" ON "table_3" ("time");
... and so on ...

Run the above query in the database once. Once the indexes are created, adding new or removing old data is indexed automatically. Indexing makes queries much faster.

After indexing, try your query again. But be reminded that you are pulling the whole of table_1 anyway. The indexing might help with join.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.