Postgres - Fastest way to fetch all rows in multiple tables

Question

SELECT table_1.time, table_1.time, table_2.time FROM table_1 
INNER JOIN table_2 on table_1.time = table_2.time 
INNER JOIN table_3 on table_1.time = table_3.time
...;

I am using the above query syntax to query all rows in multiple tables, join columns from different tables, and return. However, as the number of rows in a table increases & number of tables increases, the performance drops in a large scale. Is there any way to optimize the query performance? There will be about 0.1 - 1 million rows for each table.

I've heard terms like indexing, partitioning, and SSD, but I'm really of a novice in Postgres, and not sure which one to look in to. Can anyone provide some query command syntax that's better than what I currently have, or give some detailed advice on editing the structure of my database?

Edit: Fetching all data happens only once when loading a page. So I'm trying to load all the data that's present in DB to visualize plots. After the initial plot is generated, the page will be querying only the last rows of each table to update the plots. The table structures are very simple.

Table 1: SPM1

        time         | spm1  |
------------------------------
 2018-09-05 22:23:52 | 43.21 |

Table 2: SPM2
        time         | spm2  |
------------------------------
 2018-09-05 22:23:52 | 43.21 |

... and there are about 30 tables of these

Thanks,

"I am using the above query syntax to query all rows in multiple tables" If you're querying all rows then yes, performance will drop linearly as the tables grow. You can stave this off with more hardware and other optimizations, but ultimately performance will continue to drop as the data grows. If you have to query all rows often, there's probably something wrong with your design. There isn't enough detail here to say what. I'd suggest asking another question with specifics of your schema and the problem you're solving. — Schwern
– Schwern, Commented Sep 6, 2018 at 0:11
@Schwern I put an update, though I'm not very sure if I've provided enough info here — Eric Kim
– Eric Kim, Commented Sep 6, 2018 at 0:19
No indexing strategy will help in retrieving all rows of the tables. If at all parallel query will help (introduced in 9.6, improved in 10 and further improved in the upcoming 11) — user330315
– user330315, Commented Sep 6, 2018 at 5:41
@EricKim, Can you elaborate on your use case. Why do return all the values of time columns? The logic of the join implies they are 100% equal. (JOIN table_2 on table_1.time = table_2.time, etc). Regarding performance, creating indexes on filtered columns will help - if you restrict the dataset, as @Schwern suggested. — filiprem
– filiprem, Commented Sep 6, 2018 at 12:19

Schwern · Accepted Answer · 2018-09-06 01:01:14Z

2

Here's a few idea to improve things based on what you've mentioned.

Could all the tables be turned into one with three columns?

create table spm (
    id serial primary key,
    time datetime not null,
    spm numeric(5,2) not null,
    number smallint not null
);

insert into spm (time, spm, number)
    values ('2018-09-05 22:23:52', 43.21, 1),
           ('2018-09-05 22:23:52', 43.21, 2)

That would greatly simplify the queries. Joins are efficient, but 20 joins is a bit much.

select time, spm, number from spm;

Fetching all data happens only once when loading a page. So I'm trying to load all the data that's present in DB to visualize plots. After the initial plot is generated, the page will be querying only the last rows of each table to update the plots.

If these plots be cached and periodically regenerated or updated, that would be a huge performance improvement not just for the database query, but for all that time generating the plots.

answered Sep 6, 2018 at 1:01

Schwern

167k28 gold badges225 silver badges370 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Eric Kim Over a year ago

unfortunately the tables has to be separate for a reason... but thanks

Schwern Over a year ago

@EricKim What's the reason?

pewpewlasers · Accepted Answer · 2018-09-06 07:16:14Z

1

Have your tried indexing the time fields? You can create indexes like this:

CREATE INDEX "table_1_time" ON "table_1" ("time");
CREATE INDEX "table_2_time" ON "table_2" ("time");
CREATE INDEX "table_3_time" ON "table_3" ("time");
... and so on ...

Run the above query in the database once. Once the indexes are created, adding new or removing old data is indexed automatically. Indexing makes queries much faster.

After indexing, try your query again. But be reminded that you are pulling the whole of table_1 anyway. The indexing might help with join.

edited Sep 6, 2018 at 7:16

answered Sep 6, 2018 at 5:27

pewpewlasers

3,2454 gold badges34 silver badges63 bronze badges

Collectives™ on Stack Overflow

Postgres - Fastest way to fetch all rows in multiple tables

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related