
When I use open source R without a specific package, it's not possible to handle data sets bigger than RAM. So I would like to know whether it's possible to handle big data sets by applying PL/R functions inside PostgreSQL.

I haven't found any documentation about this.

  • Also, consider the ff package, which allows you to store large data on disk. Commented May 17, 2013 at 15:17
  • Is there some way to REALLY run R inside the database (non-commercial, unlike R on Oracle)? Commented May 17, 2013 at 15:20
  • It really is running inside PostgreSQL (R is symbolically linked to Postgres), but that does not remove R's RAM constraints. Commented May 17, 2013 at 15:28
  • What do you mean by "symbolically linked"? Because if a function can be translated, in some way, to SQL, there wouldn't be any constraints, right? Commented May 17, 2013 at 16:44
  • BUT if, in the process, the data is passed to an R object, there would be memory constraints, since the R engine will run the function. I know that in the Oracle implementation there are no memory constraints, as the R interpreter acts "really inside" the database. Commented May 17, 2013 at 16:49

2 Answers


As mentioned by Hong Ooi, PL/R loads an R interpreter into the PostgreSQL backend process. So your R code is running "in database".
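For context, a PL/R function is ordinary SQL DDL with an R body; the R code runs inside the backend. A minimal sketch, assuming PL/R is installed as an extension (the table and column names in the usage comment are hypothetical):

    -- Minimal PL/R sketch: the R body executes inside the PostgreSQL backend.
    CREATE EXTENSION IF NOT EXISTS plr;  -- assumes PL/R is packaged as an extension

    CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS $$
      # plain R: unnamed arguments arrive as arg1, arg2, ...
      # here arg1 is an R numeric vector
      median(arg1, na.rm = TRUE)
    $$ LANGUAGE plr;

    -- Hypothetical usage: pass a whole column to R as an array
    -- SELECT r_median(array_agg(measurement)) FROM readings;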

There is no universal way to deal with memory limitations, but there are at least two possible options:

  1. Define a custom PostgreSQL aggregate, and use your PL/R function as the "final" function. That way the data is processed in groups, so you are less likely to run into memory problems; a sketch follows this list. See the online PostgreSQL documentation and PL/R documentation for more detail (I don't post to Stack Overflow often, so unfortunately it will not allow me to post the actual URLs for you).
  2. Use the pg.spi.cursor_open and pg.spi.cursor_fetch functions, which PL/R installs into the R interpreter, to page data into your R function in chunks; see the sketch after the docs link below.
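A minimal sketch of option 1: the state function is the built-in array_append, so each group's values accumulate into an array, and the PL/R final function runs R once per group. The aggregate, function, table, and column names here are all hypothetical:

    -- Option 1 sketch: R sees only one group's values at a time.
    CREATE OR REPLACE FUNCTION r_quantile90(float8[]) RETURNS float8 AS $$
      # arg1 is this group's accumulated array of values
      quantile(arg1, probs = 0.9, na.rm = TRUE)
    $$ LANGUAGE plr;

    CREATE AGGREGATE quantile90(float8) (
      sfunc     = array_append,  -- appends each row's value to the state array
      stype     = float8[],
      initcond  = '{}',
      finalfunc = r_quantile90   -- PL/R runs once per group, on that group's array
    );

    -- Hypothetical usage: memory use scales with group size, not table size
    -- SELECT region, quantile90(sales) FROM orders GROUP BY region;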

See PL/R docs here: http://www.joeconway.com/plr/doc/index.html
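And a sketch of option 2, using the SPI cursor functions described in those docs. The query, chunk size, and the assumption that an exhausted cursor yields an empty or NULL result are illustrative; check them against the PL/R documentation:

    -- Option 2 sketch: page rows through a cursor so R never holds the
    -- whole result set. Table and column names are hypothetical.
    CREATE OR REPLACE FUNCTION total_sales() RETURNS float8 AS $$
      plan   <- pg.spi.prepare('SELECT sales FROM orders')
      cursor <- pg.spi.cursor_open('orders_cur', plan)
      total  <- 0
      repeat {
        # fetch the next 10000 rows forward as a data.frame
        chunk <- pg.spi.cursor_fetch(cursor, TRUE, as.integer(10000))
        if (is.null(chunk) || nrow(chunk) == 0) break  # assumed end-of-data signal
        total <- total + sum(chunk$sales, na.rm = TRUE)
      }
      pg.spi.cursor_close(cursor)
      total  # last expression is the function's return value
    $$ LANGUAGE plr;

    -- Hypothetical usage:
    -- SELECT total_sales();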

I am guessing that what you would really like is a data.frame in which the data is paged to and from an underlying database cursor, transparently to your R code. This is on my long-term TODO list, but unfortunately I have not been able to find the time to work it out. I have been told that Oracle's R connector has this feature, so it seems it can be done. Patches welcome ;-)


1 Comment

Thank you very much for the answer! I use PostgreSQL and R a lot, and when I learned about PL/R I got excited about the possibility of resolving R's memory constraints while keeping the power of SQL.

No. PL/R just starts up a separate R process to run your R code. It uses exactly the same binaries and executables you'd use from the command line, so all the standard limitations still apply.

1 Comment

OK, but is there some way to run real "in-database" analytics with R?
