0

If I am wrong in my understanding, please feel free to correct me

In my project, we have a timeseries database. It is setup as 3-node (One leader, 2 read-replicas) patroni cluster. Each node is an AWS EC2 instance where time-series data stored in hypertables supported by TimescaleDB extension on a Postgres database. We are using opensource timescaledb here.

As the data is growing each day, the EBS data volume on a node (EC2 instance) is expected to hit its size limit in future. Hence there need of sharding.

As the potential solution, we looked at distributed hypertables in timescaledb. But it seems to be a dead end as they have deprecated multi-node support (on top of which distributed hypertables are provided) in opensource timescaledb.

There is another option i.e. to use Citus (Postgres extension to implement sharding). But, Citus doesn't support TimescaleDB extension in Postgres. So, as a high level solution in this case, we have to convert timescaleDB hypertables to regular Postgres tables first for us to be able to use Citus. So far Citus seems to be most suitable (relatively) choice to implement sharding.

Could someone please suggest a better way (if there is any)?

Edit Note: Data archival or purge is not an options for us. All of the data is needed. Compression has been applied already as much it is possible. This has bought us some additional time before the storage limit for an EBS data volume is reached, but eventually sharding will be required.

3
  • If you never want to hit the limit then combination of compression and data retention need to be done to your data. Sharding will just divide your table into partitions which I believe will not reduce it's size by large extent? Also another approach is to offload data to S3 using a job and purge it after. Commented Aug 11, 2024 at 4:46
  • Data archival or purge is not an options for us. All the data is needed. Compression has been applied already as much it is possible. This has bought us some additional time before the size limit is reached, but eventually sharding will be required. Commented Aug 11, 2024 at 7:25
  • By sharding, I meant that to distribute the underlying data storage of a table to multiple data nodes (EC2 instances in our case) Commented Aug 11, 2024 at 7:33

1 Answer 1

0

Disclaimer: I am a former employee at Citusdata

I'd like to share some guidance on using Citus for timeseries datasets. If you are ok to convert your hypertables to vanilla tables, here are some of your options:

  • Create partitions and shards
  • Scale out ( i.e. distribute shards across worker nodes )
  • Automate partition creation with pg_cron
  • Archive frozen shards/partitions using columnar storage

More details in the documentation at https://docs.citusdata.com/en/stable/use_cases/timeseries.html

2
  • You are welcome to suggest your own product as an solution to a user's question. However, you should put in (something like) Caveat: I work for CitusData - otherwise your answer could be construed as spam/unsolicited advertising! I would respectfully urge you to modify your answer accordingly. Commented Aug 15, 2024 at 19:04
  • @Vérace thanks for your comment. I added a disclaimer at the top of my message now. Commented Mar 14 at 15:25

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.