
I am building a Python application with many interactions between Amazon Redshift and the local machine (sending queries to Redshift, pulling results back locally, etc.). My question is: what is the cleanest way to handle such interactions?

Currently, I am using SQLAlchemy to load tables directly onto the local machine via pandas.read_sql(). But I am not sure this is very optimised or safe.
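For reference, here is roughly what I do today (the connection string and table name are placeholders):

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; the redshift+psycopg2 dialect comes from the
    # sqlalchemy-redshift package (a plain postgresql+psycopg2 URL also works)
    engine = create_engine(
        "redshift+psycopg2://user:password@my-cluster.example.us-east-1.redshift.amazonaws.com:5439/mydb"
    )

    # Pull the whole query result into a local DataFrame
    df = pd.read_sql("SELECT * FROM my_schema.my_table LIMIT 10000", engine)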

Would it be better to go through Amazon S3, bring the files back with boto, and finally read them with pandas.read_csv()?
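That is, something along these lines, where the bucket, credentials, and paths are all made up (I have used boto3 here, but boto 2 would work the same way):

    import boto3
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine("redshift+psycopg2://user:password@host:5439/mydb")

    # UNLOAD writes the query result to S3 as delimited files;
    # the bucket name and credentials below are placeholders
    with engine.begin() as conn:
        conn.execute(text("""
            UNLOAD ('SELECT * FROM my_schema.my_table')
            TO 's3://my-bucket/exports/my_table_'
            CREDENTIALS 'aws_access_key_id=AKIA...;aws_secret_access_key=...'
            DELIMITER ',' ALLOWOVERWRITE PARALLEL OFF
        """))

    # Download the resulting file (PARALLEL OFF produces a single part,
    # typically suffixed "000") and read it locally
    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "exports/my_table_000", "/tmp/my_table.csv")
    df = pd.read_csv("/tmp/my_table.csv", header=None)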

Finally, is there a better way to handle such interactions, perhaps without doing everything in Python?

1 Answer


You can look at the blaze ecosystem for ideas and libraries you might find useful: http://blaze.pydata.org

The blaze library itself lets you write queries at a high, pandas-like level, and then translates them into SQL that runs on Redshift (using SQLAlchemy): http://blaze.readthedocs.org/en/latest/index.html
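Roughly along these lines (the connection string, table, and column names are invented, and blaze's API has shifted between releases, so treat this as a sketch):

    from blaze import Data, by, compute

    # Point blaze at a Redshift table through a SQLAlchemy URI (placeholder
    # credentials); the redshift+psycopg2 dialect comes from sqlalchemy-redshift
    orders = Data('redshift+psycopg2://user:pass@host:5439/mydb::orders')

    # Pandas-like expressions get translated to SQL and executed on Redshift
    big = orders[orders.amount > 100]
    totals = by(orders.customer_id, total=orders.amount.sum())

    # Force evaluation into a concrete local result
    result = compute(totals)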

This may be too high-level for your purposes, and you might need more precise control over the behaviour, but it would let you keep the code similar regardless of how and when you move the data around.

The odo library can be used on its own, independently of blaze, to copy data from Redshift to S3 to local files and back: http://odo.readthedocs.org/en/latest/
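For example, something like the following (the URIs are placeholders; odo's Redshift support goes through sqlalchemy-redshift and its S3 support through boto, so check the docs for the exact URI formats and credentials handling):

    import pandas as pd
    from odo import odo

    # Unload a Redshift table to S3 (odo drives UNLOAD/COPY behind the scenes)
    odo('redshift+psycopg2://user:pass@host:5439/mydb::my_table',
        's3://my-bucket/exports/my_table.csv')

    # Pull the S3 file down to a local CSV, then load it into pandas
    odo('s3://my-bucket/exports/my_table.csv', 'my_table_local.csv')
    df = odo('my_table_local.csv', pd.DataFrame)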
