Hopefully this question is not too much open-ended… In a few words : I'm looking for a script or programming language with fast but easy access to a database (PostgreSQL).
I want to use results from queries on some tables in a PostgreSQL database as input for some R analysis. Queries are simple SELECT requests (there might be room for improvement in the requests, but for now I'm not looking this way — I already did a while ago), but within a loop on results of a first query. Tables include both numbers and strings, and are thousands if not hundreds of thousands rows long, so the total number of queries can be quite large.
Obviously I first wrote an R script using RPostgreSQL. However, it takes too much time to be comfortable to use (I'd like to be able to modify and rerun it anytime). I have already optimized this script quite efficiently, and ''system.time'' shows me that most of the time is spent on the DB query within the loop.
Then, as I figured out it would be way faster if I used a text file as input for R, I decided to translate this R script into python, using psycopg2. Unfortunately, the python script is not much faster than the R script.
Finally I started to write a C++ program using libpq-fe, but I stopped as I figured out it was not flexible enough (I mean, I would have to multiply the number of lines of my code by at least 3 or 4 in order to process the queries).
Thus I'm wondering which language (or maybe other R or python libraries ?) would provide the best compromise between speed and flexibility (in terms of queries results : lists, arrays, string manipulations…) for DB access (namely, PostgreSQL). That is, it needs to be much faster than R+RPostgreSQL and python+psycopg2, and almost as "flexible".
Thanks for suggestions (the language has to be linux-friendly).
Update : Here is a typical timing of the old versus new code using only the first 500 retrieved elements, after correcting the code for the N+1 issue as suggested by Ryan and :
> system.time(source("oldcode.R"));
user system elapsed
3.825 0.052 49.363
> system.time(source("newcode.R"));
user system elapsed
1.920 0.140 3.551
The same for the 1000 first retrieved elements :
> system.time(source("oldcode.R"));
user system elapsed
9.816 0.092 100.340
> system.time(source("newcode.R"));
user system elapsed
5.040 0.072 6.695
Probably worth a change indeed. ;-)