
Task: copy all data in a database (without the schema) to another database (possibly of a different type). I can't modify the source database, so for me it is strictly read-only.

Context: I need to integrate Oracle with a number of databases. Right now I'm integrating Oracle and Postgres.

Resources: a connection string only, with the ability to connect to the database with the highest available privileges. (I can't access the server via SSH, so there is no way to create an ordinary backup and download the files, or to compile and start a web/FTP server, etc.)

Question: Is there any proven and FAST way to pull this data? Maybe someone has an open source solution with clean code?

The word "fast" is present here because just selecting N rows in a turn (using rownum or row_number()) and transfering to a target database or intermediate file is too slow.

  • I'd want to look at ETL tools: Pentaho, Talend, etc. Commented Oct 24, 2012 at 9:05
  • Since it is not listed under "resources", are we to assume that utilities provided as part of the Oracle client install (say, the export utility) are disallowed? Is the end state goal to transfer all the data from an Oracle database to a non-Oracle database? Or to another Oracle database? On speed, why are you selecting batches of rows rather than simply issuing a SELECT for all the rows and letting the client fetch the data in batches? Commented Oct 24, 2012 at 9:15
  • @Justin Cave, yes, the end state goal is to transfer all data from an Oracle to a non-Oracle database, given that we can't contact the Oracle server admin and ask them to install/tune anything, so we have to expect an installation with default parameters. Also, there is a lot of data (terabytes) with really wide tables. Imagine three servers: source database server, intermediate server, target database server. I think it will be very hard to install terabytes of RAM on an intermediate server, so the transfer will require careful micromanagement of data chunks. Commented Oct 24, 2012 at 9:32
  • I changed the question slightly, as you don't want to take a backup (which could be done using dbms_datapump from within SQL) but actually want to copy the data to a different DBMS - which is something completely different from "taking a backup". Commented Oct 24, 2012 at 9:51
  • I think your only choice is to dump everything to text files and then use e.g. COPY in Postgres to import the flat files. Commented Oct 24, 2012 at 9:52
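To make that last suggestion concrete, here is a rough sketch in plain JDBC: one streaming SELECT from Oracle (with a large fetch size instead of ROWNUM pagination, as suggested above), whose rows are pushed into Postgres with COPY through the pgJDBC CopyManager API. Table, column and connection details are placeholders, and the per-column handling is heavily simplified:

    import java.io.StringReader;
    import java.sql.*;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class OracleToPostgresCopy {

        // Quote a value for CSV; an unquoted empty field is read as NULL by COPY.
        static String csv(String v) {
            return v == null ? "" : "\"" + v.replace("\"", "\"\"") + "\"";
        }

        public static void main(String[] args) throws Exception {
            try (Connection ora = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//source-host:1521/ORCL", "src_user", "src_pass");
                 Connection pg = DriverManager.getConnection(
                     "jdbc:postgresql://target-host:5432/target_db", "tgt_user", "tgt_pass");
                 Statement st = ora.createStatement()) {

                st.setFetchSize(5000);   // one streaming SELECT, no ROWNUM windows
                CopyManager copy = pg.unwrap(PGConnection.class).getCopyAPI();
                String copySql = "COPY target_table (id, name) FROM STDIN WITH (FORMAT csv)";

                try (ResultSet rs = st.executeQuery("SELECT ID, NAME FROM SOURCE_TABLE")) {
                    StringBuilder buf = new StringBuilder();
                    long rows = 0;
                    while (rs.next()) {
                        // getString() is a simplification; dates, numbers and LOBs
                        // need proper per-type formatting for the target side.
                        buf.append(csv(rs.getString(1))).append(',')
                           .append(csv(rs.getString(2))).append('\n');
                        if (++rows % 50_000 == 0) {      // push a chunk into Postgres via COPY
                            copy.copyIn(copySql, new StringReader(buf.toString()));
                            buf.setLength(0);
                        }
                    }
                    if (buf.length() > 0) {
                        copy.copyIn(copySql, new StringReader(buf.toString()));
                    }
                }
            }
        }
    }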

2 Answers


A lightweight ETL tool like Spring Batch might be the perfect tool for this task.

Micromanagement of data chunks is what it is written around.

Take a look at the JDBC cursor example: you just configure the SELECT and INSERT statements and the mapping, and Spring Batch will take care of the pagination.

You can find it on GitHub: https://github.com/SpringSource/spring-batch/blob/master/spring-batch-samples/src/main/resources/jobs/iosample/jdbcCursor.xml

Reference can be found at: http://static.springsource.org/spring-batch/reference/html/readersAndWriters.html#database
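For illustration, a minimal sketch of such a reader/writer pair in Java-based configuration (rather than the XML of the sample), roughly against the Spring Batch 4 API; the bean names, the two DataSources and the table/column names are assumptions:

    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.batch.item.database.support.ColumnMapItemPreparedStatementSetter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.core.ColumnMapRowMapper;

    @Configuration
    @EnableBatchProcessing
    public class CopyStepConfig {

        // Reads the whole source table through one cursor; the fetch size controls
        // how many rows the Oracle driver pulls per network round trip.
        @Bean
        public JdbcCursorItemReader<Map<String, Object>> sourceReader(DataSource oracleDataSource) {
            JdbcCursorItemReader<Map<String, Object>> reader = new JdbcCursorItemReader<>();
            reader.setName("sourceReader");
            reader.setDataSource(oracleDataSource);
            reader.setSql("SELECT ID, NAME FROM SOURCE_TABLE");   // placeholder table/columns
            reader.setRowMapper(new ColumnMapRowMapper());
            reader.setFetchSize(5000);
            return reader;
        }

        // Writes each chunk as a batched INSERT into the target database.
        @Bean
        public JdbcBatchItemWriter<Map<String, Object>> targetWriter(DataSource postgresDataSource) {
            JdbcBatchItemWriter<Map<String, Object>> writer = new JdbcBatchItemWriter<>();
            writer.setDataSource(postgresDataSource);
            writer.setSql("INSERT INTO target_table (id, name) VALUES (?, ?)");
            writer.setItemPreparedStatementSetter(new ColumnMapItemPreparedStatementSetter());
            return writer;
        }

        // Chunk-oriented step: read 1000 rows, write them, commit, repeat; progress
        // is recorded in the job repository so a crashed run can be restarted.
        @Bean
        public Step copyTableStep(StepBuilderFactory steps,
                                  JdbcCursorItemReader<Map<String, Object>> sourceReader,
                                  JdbcBatchItemWriter<Map<String, Object>> targetWriter) {
            return steps.get("copyTable")
                    .<Map<String, Object>, Map<String, Object>>chunk(1000)
                    .reader(sourceReader)
                    .writer(targetWriter)
                    .build();
        }
    }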

Spring Batch keeps track of how many records have already been processed and allows you to resume a previously crashed run. It does this by saving counters in a 'jobRepository', which can live in a third database, for example.

Of course, this is a pure Java solution; a native solution may be faster, but if all you get is JDBC connection strings, you might give this a shot. It also assumes you know the structure of all the tables you want to transfer. If not, simple JDBC tools such as SquirrelSQL can help you.
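If the table structure is not known up front, it can also be read programmatically over the same JDBC connection via DatabaseMetaData; a minimal sketch, with the schema name and connection details as placeholders:

    import java.sql.*;

    public class ListSourceTables {
        public static void main(String[] args) throws SQLException {
            try (Connection ora = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//source-host:1521/ORCL", "user", "pass")) {
                DatabaseMetaData md = ora.getMetaData();
                // List every table of the given schema, then its columns and types.
                try (ResultSet tables = md.getTables(null, "SRC_SCHEMA", "%", new String[] {"TABLE"})) {
                    while (tables.next()) {
                        String table = tables.getString("TABLE_NAME");
                        System.out.println(table);
                        try (ResultSet cols = md.getColumns(null, "SRC_SCHEMA", table, "%")) {
                            while (cols.next()) {
                                System.out.println("  " + cols.getString("COLUMN_NAME")
                                        + " " + cols.getString("TYPE_NAME"));
                            }
                        }
                    }
                }
            }
        }
    }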

Greets, Geert.




I suggest you take a look at Liquibase. I have used it successfully to keep both schemas and data aligned across several environments (albeit only SQL Server, but I'm certain it works for disparate RDBMSs as well).

As for performance I am a bit worried, since you mention "terabytes of data"... Still, it might be worth a try.

Cheers,

Comments

  • Liquibase is about schema (table definitions, ...) management, not about copying data from one DBMS to another.
  • Liquibase is great if you start a database project and want to keep a history of changes or execute migrations. You can't apply it to a legacy database that has no Liquibase history yet. You could use it to do a database diff after the migration.
  • @greyfairer: of course you can apply it to legacy databases. Use "generateChangeLog" and then "changelogSync".
  • @a_horse_with_no_name You're right, of course. There's even an option diffType='data'. I wouldn't do it for millions of records, though.
