Didn't see any boost in a simple PostgreSQL vs. TimescaleDB query performance test?

Question

I ran a simple performance test comparing PostgreSQL and TimescaleDB. Here are my results:

Total rows: 403,204 (per table)

With PostgreSQL

Aggregation query (176 rows): 203-240 ms

Join query (102 rows): 660-720 ms

With TimescaleDB

Aggregation query (176 rows): 175-200 ms

Join query (102 rows): 614-650 ms

CREATE TABLE public.sensors(
  id SERIAL PRIMARY KEY,
  type VARCHAR(50),
  location VARCHAR(50)
);

-- Postgres table
CREATE TABLE sensor_data (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  cpu DOUBLE PRECISION,
  FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);

--drop table public.sensor_data;

-- TimescaleDB table
CREATE TABLE sensor_data_ts (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  cpu DOUBLE PRECISION,
  FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('sensor_data_ts', 'time');

-- Insert Data

INSERT INTO sensors (type, location) VALUES
('a','floor'),
('a', 'ceiling'),
('b','floor'),
('b', 'ceiling');


-- Postgres 

INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
SELECT
  time,
  sensor_id,
  random() AS cpu,
  random()*100 AS temperature
FROM generate_series(now() - interval '50 week', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);

-- TimescaleDB
INSERT INTO sensor_data_ts (time, sensor_id, cpu, temperature)
SELECT
  time,
  sensor_id,
  random() AS cpu,
  random()*100 AS temperature
FROM generate_series(now() - interval '50 week', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);


--truncate table public.sensor_data;
--truncate table public.sensor_data_ts;

select count(*) from public.sensor_data sd ;
select count(*) from public.sensor_data_ts sd ;

--Postgres

--Aggregate queries
SELECT 
  floor(extract(epoch from "time")/(60*60*24*2)) as period,
  AVG(temperature) AS avg_temp, 
  AVG(cpu) AS avg_cpu 
FROM sensor_data 
GROUP BY period;
--ORDER BY PERIOD;

--Join Queries
SELECT 
  sensors.location,
  floor(extract(epoch from "time")/(60*60*24*7)) as period,
  AVG(temperature) AS avg_temp, 
  last(temperature, time) AS last_temp, 
  AVG(cpu) AS avg_cpu 
FROM sensor_data JOIN sensors on sensor_data.sensor_id = sensors.id
GROUP BY period, sensors.location;

--Timescale DB

--Aggregate Queries
SELECT 
  time_bucket('2 day', time) AS period, 
  AVG(temperature) AS avg_temp, 
  AVG(cpu) AS avg_cpu 
FROM sensor_data_ts 
GROUP BY period;
--ORDER BY PERIOD;

--Join Queries
SELECT 
  sensors.location,
  time_bucket('1 week', time) AS period, 
  AVG(temperature) AS avg_temp, 
  last(temperature, time) AS last_temp, 
  AVG(cpu) AS avg_cpu 
FROM sensor_data_ts JOIN sensors on sensor_data_ts.sensor_id = sensors.id
GROUP BY period, sensors.location;
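Two details keep the "Postgres" and "TimescaleDB" queries above from being strictly like-for-like. The Postgres join query still calls `last(temperature, time)`, which is a TimescaleDB aggregate, so it depends on the extension. And `floor(extract(epoch ...))` aligns buckets to the Unix epoch (1970-01-01, a Thursday), whereas `time_bucket` aligns day and week buckets to its default origin of Monday 2000-01-03, so the bucket boundaries differ. A minimal sketch of plain-PostgreSQL versions, assuming PostgreSQL 14 or later (for the core `date_bin` function) and using `array_agg(... ORDER BY ...)` as a stand-in for `last()`:

```sql
-- Plain-PostgreSQL versions of the two queries (no TimescaleDB functions).
-- Assumes PostgreSQL 14+ for date_bin(). The origin 2000-01-03 00:00 UTC
-- matches time_bucket's default origin, so bucket boundaries line up.

-- Aggregation query: 2-day buckets
SELECT
  date_bin('2 days', time, TIMESTAMPTZ '2000-01-03 00:00:00+00') AS period,
  AVG(temperature) AS avg_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
GROUP BY period;

-- Join query: 1-week buckets per location
SELECT
  sensors.location,
  date_bin('1 week', time, TIMESTAMPTZ '2000-01-03 00:00:00+00') AS period,
  AVG(temperature) AS avg_temp,
  -- stand-in for TimescaleDB's last(temperature, time):
  -- take the first element of the values sorted newest-first
  (array_agg(temperature ORDER BY time DESC))[1] AS last_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
JOIN sensors ON sensor_data.sensor_id = sensors.id
GROUP BY period, sensors.location;
```

The `array_agg` approach builds and sorts an array per group, so it is likely slower than `last()`; it is shown here to make the comparison fair, not to make it fast.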

I was expecting a tangible boost in query performance. What else can I do to improve it?
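Before tuning anything, it is worth confirming where the time goes. A minimal sketch using PostgreSQL's `EXPLAIN (ANALYZE, BUFFERS)` on the two aggregation queries from the question: both will typically show a full scan (a Seq Scan on the plain table, and an Append over per-chunk scans on the hypertable), and the Buffers lines show whether pages were found in shared buffers ("shared hit") or had to be read in. The execution time reported here is measured on the server and excludes sending rows to the client, so it will not match a client-side "fetch time" exactly.

```sql
-- Actual plan, per-node timings, and buffer usage for the plain table.
-- "shared hit" = pages already in PostgreSQL's buffer cache;
-- "read" = pages that had to be read in from the OS (page cache or disk).
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  floor(extract(epoch from "time")/(60*60*24*2)) AS period,
  AVG(temperature) AS avg_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
GROUP BY period;

-- Same aggregation on the hypertable; expect one scan node per chunk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  time_bucket('2 day', time) AS period,
  AVG(temperature) AS avg_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data_ts
GROUP BY period;
```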

Mike Freedman · Accepted Answer · 2020-08-13 17:00:42Z


A few things:

1. I'm a little confused: time_bucket is a TimescaleDB function, not a Postgres function, so that query is probably running some of our code.
2. You are still performing a full table scan of all your data, so there is not much to optimize here. The dataset is also small (400K rows), so it all fits in the buffer cache. If you want to see differences in insert/query performance, you likely need (a) much more data, (b) more complex types of queries.
3. But TimescaleDB also has other features. For example, turn on compression and you'll likely find these "full table scans" to be quicker (at least once you get into disk-bound workloads). Or turn on continuous aggregates so you can continuously and incrementally materialize these results to serve, e.g., user-facing dashboards.
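A minimal sketch of the two features in point 3 applied to the question's hypertable, assuming TimescaleDB 2.x syntax (the policy function names differ in the 1.x releases that were current when this answer was written). The view name `sensor_data_2day`, the segment-by column, and the 7-day/1-week/1-hour intervals are illustrative choices, not recommendations. On a 400K-row dataset that fits in memory, compression may not speed anything up.

```sql
-- Compression: segment by sensor so each compressed batch holds
-- one sensor's readings, ordered by time.
ALTER TABLE sensor_data_ts SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'sensor_id',
  timescaledb.compress_orderby = 'time DESC'
);

-- Compress existing chunks older than 7 days right away (useful for a benchmark).
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('sensor_data_ts', older_than => INTERVAL '7 days') c;

-- Keep compressing chunks as they age past 7 days, via a background job.
SELECT add_compression_policy('sensor_data_ts', INTERVAL '7 days');

-- Continuous aggregate: incrementally materialized 2-day averages.
CREATE MATERIALIZED VIEW sensor_data_2day
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('2 days', time) AS period,
  AVG(temperature) AS avg_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data_ts
GROUP BY period;

-- Every hour, refresh the window from 1 week ago up to 1 hour ago.
SELECT add_continuous_aggregate_policy('sensor_data_2day',
  start_offset => INTERVAL '1 week',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

-- A dashboard then reads the small materialized view instead of raw rows.
SELECT * FROM sensor_data_2day ORDER BY period;
```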


1 Comment

ankitj Over a year ago

Thank you for pointing out the incorrect time_bucket usage. I believe it was running some of the TimescaleDB code. I have updated the queries (no TimescaleDB functions) and do see some boost. I will look into point 3 for further optimization.
