Sharding in Timescaledb (Postgres) Opensource

Question

If I am wrong in my understanding, please feel free to correct me.

In my project, we have a time-series database. It is set up as a 3-node Patroni cluster (one leader, two read replicas). Each node is an AWS EC2 instance, and the time-series data is stored in hypertables provided by the TimescaleDB extension on a Postgres database. We are using open-source TimescaleDB.

As the data grows each day, the EBS data volume on each node (EC2 instance) is expected to hit its size limit in the future. Hence the need for sharding.

As a potential solution, we looked at distributed hypertables in TimescaleDB. But that seems to be a dead end, as multi-node support (on top of which distributed hypertables are built) has been deprecated in open-source TimescaleDB.

Another option is to use Citus (a Postgres extension that implements sharding). But Citus doesn't support the TimescaleDB extension. So, as a high-level solution, we would first have to convert the TimescaleDB hypertables to regular Postgres tables to be able to use Citus. So far, Citus seems to be the most suitable choice (relatively) for implementing sharding.

Could someone please suggest a better way (if there is one)?

Edit note: Data archival or purging is not an option for us. All of the data is needed. Compression has already been applied as much as possible. This has bought us some additional time before the storage limit of an EBS data volume is reached, but eventually sharding will be required.

If you never want to hit the limit, then a combination of compression and data retention needs to be applied to your data. Sharding will just divide your table into partitions, which I believe will not reduce its size to any large extent. Another approach is to offload data to S3 using a job and purge it afterwards.
– Prabhakar Reddy, Commented Aug 11, 2024 at 4:46
Data archival or purging is not an option for us. All the data is needed. Compression has already been applied as much as possible. This has bought us some additional time before the size limit is reached, but eventually sharding will be required.
– HelloJack, Commented Aug 11, 2024 at 7:25
By sharding, I mean distributing the underlying data storage of a table across multiple data nodes (EC2 instances in our case).
– HelloJack, Commented Aug 11, 2024 at 7:33

Abdullah Hanefi Önaldı · Accepted Answer · 2025-03-14 15:22:29Z

Disclaimer: I am a former employee of Citus Data.

I'd like to share some guidance on using Citus for time-series datasets. If you are OK with converting your hypertables to vanilla tables, here are some of your options:

Create partitions and shards
Scale out (i.e. distribute shards across worker nodes)
Automate partition creation with pg_cron
Archive frozen shards/partitions using columnar storage

More details in the documentation at https://docs.citusdata.com/en/stable/use_cases/timeseries.html
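
The options above can be sketched roughly as follows. This is only an illustration, not a tested migration: the table name `metrics`, the columns `device_id`, `ts` and `value`, the worker host names, the monthly partition interval and the six-month columnar cutoff are all assumptions of mine, so adapt them to your schema. The small Python helper only assembles the SQL statements so they can be reviewed before being run on the Citus coordinator with whatever client you use.

```python
"""Sketch: assemble the SQL for a partitioned, distributed time-series table on Citus.

Every name here (metrics, device_id, ts, value, worker-1, worker-2) is a
placeholder chosen for illustration, not something taken from the question.
"""


def citus_timeseries_sql(table="metrics", time_col="ts", dist_col="device_id",
                         workers=("worker-1", "worker-2"), port=5432):
    """Return the statements in the order they would be run on the coordinator."""
    stmts = []

    # Plain PostgreSQL range-partitioned table that replaces the hypertable.
    stmts.append(
        f"CREATE TABLE {table} ({dist_col} bigint NOT NULL, "
        f"{time_col} timestamptz NOT NULL, value double precision) "
        f"PARTITION BY RANGE ({time_col});"
    )

    # Scale out: register the worker nodes before sharding so shards are
    # placed across them. Nodes added later need a shard rebalance.
    for host in workers:
        stmts.append(f"SELECT citus_add_node('{host}', {port});")

    # Shard the table across the workers by the distribution column.
    stmts.append(f"SELECT create_distributed_table('{table}', '{dist_col}');")

    # Create monthly partitions one year ahead.
    create_parts = (
        f"SELECT create_time_partitions(table_name := '{table}', "
        f"partition_interval := '1 month', end_at := now() + '12 months')"
    )
    stmts.append(create_parts + ";")

    # Let pg_cron repeat the partition creation on the first of each month.
    stmts.append(
        f"SELECT cron.schedule('create-partitions', '0 0 1 * *', "
        f"$$ {create_parts} $$);"
    )

    # Convert partitions older than six months to columnar storage.
    stmts.append(
        f"CALL alter_old_partitions_set_access_method('{table}', "
        f"now() - interval '6 months', 'columnar');"
    )
    return stmts


if __name__ == "__main__":
    for stmt in citus_timeseries_sql():
        print(stmt)
```

Which column to distribute by depends on your query patterns. A device or tenant identifier is only a common choice, and the existing data would still have to be copied from the hypertables into the new table.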

answered Aug 13, 2024 at 11:20, edited Mar 14, 2025 at 15:22

You are welcome to suggest your own product as a solution to a user's question. However, you should add something like "Caveat: I work for Citus Data"; otherwise your answer could be construed as spam or unsolicited advertising! I would respectfully urge you to modify your answer accordingly.
– Vérace, Commented Aug 15, 2024 at 19:04
@Vérace Thanks for your comment. I have now added a disclaimer at the top of my answer.
– Abdullah Hanefi Önaldı, Commented Mar 14, 2025 at 15:25
