
I would like to bulk-INSERT/UPSERT a moderately large number of rows into a PostgreSQL database using R. To do so, I am preparing a multi-row INSERT string in R.

query <- sprintf("BEGIN;
                  CREATE TEMPORARY TABLE
                  md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;

                  INSERT INTO md_updates(ts_key, meta_data) VALUES %s;
                  LOCK TABLE %s.meta_data_unlocalized IN EXCLUSIVE MODE;

                  UPDATE %s.meta_data_unlocalized
                  SET meta_data = md_updates.meta_data
                  FROM md_updates
                  WHERE md_updates.ts_key = %s.meta_data_unlocalized.ts_key;
                  COMMIT;", md_values, schema, schema, schema)

DBI::dbGetQuery(con, query)
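
For reference, md_values is just a long multi-row VALUES string built in R. A rough sketch of how it could be assembled (the input data frame records and its column names are made up here, and real code should escape values properly, e.g. with DBI::dbQuoteString):

# Sketch only: assumes a data frame 'records' with character columns
# ts_key and meta_data (the latter already in hstore text form, e.g. 'a=>1').
# Real code should quote/escape the values, e.g. via DBI::dbQuoteString().
md_values <- paste(
  sprintf("('%s', '%s'::hstore)", records$ts_key, records$meta_data),
  collapse = ", "
)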

The entire function can be found here. Surprisingly (at least to me), I learned that the UPDATE part is not the problem: I left it out, ran the query again, and it wasn't much faster. INSERTing a million+ records seems to be the issue here.

I did some research and found quite a bit of information:

bulk inserts

bulk inserts II

what causes large inserts to slow down

Answers from @Erwin Brandstetter and @Craig Ringer were particularly helpful. I was able to speed things up quite a bit by dropping indices and following a few of their other suggestions.
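
The index part boils down to something like the following sketch (the index name and definition below are placeholders, not the real ones from my schema):

# Placeholder index name/definition, just to illustrate drop-before / recreate-after.
DBI::dbGetQuery(con, sprintf(
  "DROP INDEX IF EXISTS %s.meta_data_unlocalized_ts_key_idx;", schema))
# ... run the bulk INSERT / UPDATE here ...
DBI::dbGetQuery(con, sprintf(
  "CREATE INDEX meta_data_unlocalized_ts_key_idx ON %s.meta_data_unlocalized (ts_key);",
  schema))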

However, I struggled to implement another suggestion that sounded promising: COPY. The problem is that I can't get it to work from within R.

The following works for me:

sql <- 'CREATE TABLE
        md_updates(ts_key varchar, meta_data hstore);
        COPY md_updates FROM STDIN;'

dbGetQuery(sandbox, "COPY md_updates FROM 'test.csv' DELIMITER ';' CSV;")

But I can't get it done without reading from an extra .csv file. So my questions are:

  • Is COPY really a promising approach here (over the multi-row INSERT I have)?

  • Is there a way to use COPY from within R without writing the data to a file? The data fits in memory, and since it's already in memory, why write it to disk?

I am using PostgreSQL 9.5, both on OS X and on RHEL.

  • Does it have to be in R? Can R call psql? If so, use \COPY (not COPY). Commented Apr 11, 2017 at 18:44
  • @Neil McGuigan yup, gotta be R. R can make any kind of system call, so psql is an option, but since this is part of a package and gets deployed who knows where, I don't want to rely on psql. Still looking for an inside-R solution. Commented Apr 11, 2017 at 18:56
  • please make Erwin back from Eric - it looks weird Commented Apr 11, 2017 at 20:12
  • stackoverflow.com/a/24289817/5315974 ?.. Commented Apr 11, 2017 at 20:16
  • Sorry, edited the name. Gotta blame auto correct. Plus me of course. Commented Apr 11, 2017 at 20:23

1 Answer


RPostgreSQL has a postgresqlCopyInDataframe function that looks like it should do what you want:

install.packages("RPostgreSQL")
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")
dbSendQuery(con, "copy foo from stdin")
postgresqlCopyInDataframe(con, df)

Where table foo has the same columns as dataframe df
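
For your temp-table UPDATE workflow, the pieces could be combined roughly as follows (just a sketch: it assumes df has character columns ts_key and meta_data, the latter already in hstore text form, and that schema holds your schema name):

dbSendQuery(con, "BEGIN;")
dbSendQuery(con, "CREATE TEMPORARY TABLE md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;")
dbSendQuery(con, "COPY md_updates FROM STDIN;")
postgresqlCopyInDataframe(con, df)   # streams df straight into md_updates, no .csv needed
dbSendQuery(con, sprintf(
  "UPDATE %s.meta_data_unlocalized u
   SET meta_data = md_updates.meta_data
   FROM md_updates
   WHERE md_updates.ts_key = u.ts_key;", schema))
dbSendQuery(con, "COMMIT;")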


3 Comments

Thanks, +1, this actually works. I've been using the RPostgreSQL package for years but never searched for such a function. It does exactly what I want. Will run a couple of benchmarks to see whether it really speeds things up.
Thanks for your help. It's actually 6 times faster for larger INSERTs.
This link seems to be very helpful too for appending more records to the Postgres table: stackoverflow.com/questions/33634713/…
