
I have a PostgreSQL operational DB with data partitioned per day, and a PostgreSQL data warehouse (DWH) DB. To move the data quickly from the operational DB to the DWH, I would like to copy the tables as fast as possible and with the least possible load. Since the tables are partitioned by day, I understand that each partition is a table in its own right. Does that mean I can somehow copy the data files between the machines and create the tables in the DWH from those data files? What is the best practice in this case?

EDIT: To answer the questions asked here:

1. I'm building an ETL process. The first step of the ETL is to copy the data with as little impact on the operational DB as possible.
2. I would be happy to replicate the data, as long as it doesn't slow down writes to the operational DB.
3. Some more detail: the operational DB is not my responsibility, but the main concern is write performance on that DB. It writes about 500 million rows a day; some hours are more heavily loaded than others, but there is no hour without writes at all.
4. I came across a few tools/approaches - replication, pg_dump - but I couldn't find anything that compares them, so I don't know when to use which one or which fits my case.

  • Any reason you wouldn't want to have the DWH database replicate the operational database in real time? Perhaps tell us more about the frequency and volume of reads from and writes to the operational database, and the requirements/intention of the DWH. Commented Aug 29, 2012 at 15:48
  • Are you building an ETL process? Or is your "data warehouse" just a copy of the operational system (in which case it should be termed an "operational data store")? Commented Aug 29, 2012 at 16:08

2 Answers


If you are doing a bulk transfer, I would actually consider running pg_dump on the warehouse system and piping the results into psql once a day. You could probably run Slony too, but that would require more resources and would probably be more complicated.
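As a rough sketch of that approach (the host names, user, database names, and the partition naming scheme below are assumptions for illustration, not taken from the question), a daily job on the warehouse machine could look like this:

    # Copy yesterday's partition from the operational server into the DWH.
    # Hosts, users, databases, and table names are placeholders -- adjust to your setup.
    PART="measurements_$(date -d yesterday +%Y_%m_%d)"   # e.g. measurements_2012_08_28 (GNU date)

    pg_dump -h operational-host -U etl --no-owner \
            -t "public.${PART}" op_db \
      | psql -h localhost -U etl -d dwh_db

A plain-format dump of a single partition emits the CREATE TABLE plus a COPY of its rows, so the warehouse side gets the table created and loaded in one pass, while the operational server mostly just performs a sequential read of that partition.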


There are many good ways to replicate data between databases. If you are just looking for a fast transfer of a table between databases, a simple and fast solution is provided by the extension dblink. There are many examples here on SO. Try a search.
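For a rough idea of what that looks like (the connection string, table name, and column definitions below are invented for illustration), the DWH side could pull one daily partition with a query along these lines:

    # Run on the DWH machine; the SQL executes in the warehouse database.
    # Connection details, table, and columns are placeholders.
    psql -h localhost -d dwh_db <<'SQL'
    CREATE EXTENSION IF NOT EXISTS dblink;

    -- Target table on the DWH side (same shape as the operational partition).
    CREATE TABLE IF NOT EXISTS measurements_2012_08_28 (
        id          bigint,
        recorded_at timestamptz,
        value       numeric
    );

    -- Pull the partition's rows over a dblink connection to the operational server.
    INSERT INTO measurements_2012_08_28
    SELECT *
    FROM dblink('host=operational-host dbname=op_db user=etl',
                'SELECT id, recorded_at, value FROM measurements_2012_08_28')
         AS t(id bigint, recorded_at timestamptz, value numeric);
    SQL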

If you want a broader approach, continuous synchronization, etc., consider one of the established replication tools. There is a nice comparison in the manual to get you started.

1 Comment

I looked at this search and found a few approaches. I wanted to get more information on which is best for my case.
