AWS: Automating queries in redshift

Question

I want to automate a Redshift insert query so that it runs every day.

We use the AWS environment. I was told that using Lambda is not the right approach. What is the best ETL process for automating a query in Redshift?

Can you provide more details about what the query is doing and how long it takes to run? Did they suggest why Lambda was not the right approach? – John Rotenstein, Commented Sep 13, 2018 at 4:36
Have you looked into Amazon QuickSight for scheduled reports (queries from Redshift)? – John Hanley, Commented Sep 13, 2018 at 5:39
@JohnRotenstein The query joins a few Redshift system tables (stl_query, stl_sessions, stl_ddltext) and loads the result into a custom table, and it needs to run every day. The reason they gave for Lambda not being the right approach is that a function can run for only 300 seconds, so what if my query takes more than 5 minutes? Please advise. – Rrr, Commented Sep 13, 2018 at 11:49
Yes, the 5-minute limit is the important factor. If the query is likely to take longer than 5 minutes, Lambda is not an option. – John Rotenstein, Commented Sep 13, 2018 at 11:54
@JohnRotenstein My query shouldn't take more than 5 minutes, but in the worst case, if it does, this approach won't be suitable. Please advise. – Rrr, Commented Sep 13, 2018 at 11:55

Jon Scott · Accepted Answer · 2018-09-13 16:02:13Z

5

For automating SQL on Redshift you have at least 4 options:

Simple: cron. Use an EC2 instance and set up a cron job on it to run your SQL code.

psql -U youruser -p 5439 -h hostname_of_redshift -d your_database -f your_sql_file
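Run as-is, the psql command above prompts for a password, which a cron job cannot answer. One common fix is a libpq password file (~/.pgpass), which psql reads automatically. The sketch below appends an entry to such a file; the function name add_pgpass_entry is hypothetical, and the host, user and password values are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry to a libpq password file so psql
# can log in without prompting. Line format: hostname:port:database:username:password
# A literal ':' or '\' inside a field must be escaped with '\' (not handled here).
add_pgpass_entry() {
  local file="$1" host="$2" port="$3" db="$4" user="$5" password="$6"
  # umask 077 inside a subshell so a newly created file is never group/world-readable.
  ( umask 077 && printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$password" >> "$file" )
  # libpq ignores the password file if group or others can access it, so force 0600.
  chmod 600 "$file"
}
```

Usage (placeholder values; "dev" is the database name Redshift creates by default): add_pgpass_entry "$HOME/.pgpass" hostname_of_redshift 5439 dev youruser 'your_password'. Setting the PGPASSWORD environment variable also works, but a 0600 file keeps the password out of crontabs and process listings.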

Feature-rich: Airflow (recommended). If you have a complex schedule to run, it is worth investing time in learning and using Apache Airflow. It also needs to run on a server (EC2), but it offers a lot of functionality.

https://airflow.apache.org/

AWS serverless: AWS Data Pipeline (not recommended)

https://aws.amazon.com/datapipeline/

CloudWatch -> Lambda -> EC2: the method described below by John Rotenstein. This is a good method when you want to be AWS-centric, and it will be cheaper than keeping a dedicated EC2 instance running.

edited Sep 13, 2018 at 16:02

answered Sep 13, 2018 at 7:20

Jon Scott


Thanks Jon! I'm not a cron expert, so I have a few questions. We have an EC2 instance and can set up a cron job on it. How will my cron job trigger the SQL? Should I write a shell script? Please explain. – Rrr
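A small shell script is the usual glue between cron and the SQL. The sketch below assumes psql is installed on the instance and the password is stored in ~/.pgpass, so cron never hits a password prompt. It defines a hypothetical run_redshift_sql function, and the crontab line in its header comment calls it every day at 06:00 server time. All hosts, users, databases and paths are placeholders.

```shell
#!/usr/bin/env bash
# redshift_etl.sh: hypothetical helper for a daily cron job on an EC2 instance.
# Example crontab entry (add with `crontab -e`), runs daily at 06:00 server time:
#   0 6 * * * /bin/bash -c '. /opt/etl/redshift_etl.sh && run_redshift_sql /opt/etl/daily_insert.sql' >> /var/log/redshift_etl.log 2>&1
# Host, port, user, database and paths are placeholders (assumptions).

REDSHIFT_HOST="${REDSHIFT_HOST:-hostname_of_redshift}"
REDSHIFT_PORT="${REDSHIFT_PORT:-5439}"
REDSHIFT_USER="${REDSHIFT_USER:-youruser}"
REDSHIFT_DB="${REDSHIFT_DB:-dev}"

# Run one SQL file against Redshift.
# -X skips ~/.psqlrc so interactive settings cannot change the job's behavior.
# ON_ERROR_STOP=1 stops at the first failing statement and makes psql exit
# non-zero, so a failed insert shows up in the cron log instead of being skipped.
# PSQL_BIN lets you substitute another command (e.g. for a dry run); default is psql.
run_redshift_sql() {
  local sql_file="$1"
  "${PSQL_BIN:-psql}" -X -v ON_ERROR_STOP=1 \
    -h "$REDSHIFT_HOST" -p "$REDSHIFT_PORT" \
    -U "$REDSHIFT_USER" -d "$REDSHIFT_DB" \
    -f "$sql_file"
}
```

Install the crontab entry as the user whose ~/.pgpass holds the credentials. Cron runs jobs with a minimal environment, which is why the script spells out every connection setting instead of relying on your login shell.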

John Rotenstein · Accepted Answer · 2018-09-13 11:58:32Z

2

One option:

Use Amazon CloudWatch Events on a schedule to trigger an AWS Lambda function.
The Lambda function launches an EC2 instance with a User Data script. Set the instance's shutdown behavior to Terminate.
The EC2 instance executes the User Data script.
When the script is complete, it should call sudo shutdown now -h to shut down, and therefore terminate, the instance.

A Linux EC2 instance is billed per second (with a one-minute minimum), so you pay only for the time it actually runs.
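The steps above can be sketched with the AWS CLI, which is handy for testing the User Data by hand before wiring up CloudWatch Events and Lambda (the Lambda function would make the same EC2 RunInstances call through the SDK). This is a sketch under several assumptions: the function names are hypothetical, the AMI ID, instance type, Redshift connection values and paths are placeholders, and the AMI is assumed to already have psql installed and a /root/.pgpass file with the Redshift password.

```shell
#!/usr/bin/env bash
# Sketch of the CloudWatch Events -> Lambda -> EC2 pattern, driven from the AWS CLI.
# All IDs, hosts, users, databases and paths below are hypothetical placeholders.

# Write the User Data script that the instance runs once at first boot (as root).
write_user_data() {
  local out="$1"
  cat > "$out" <<'EOF'
#!/bin/bash
# Power off whenever this script exits, even if the query failed, so a failed
# run does not leave a billed instance behind. With shutdown behavior set to
# "terminate", powering off also terminates the instance.
trap 'shutdown -h now' EXIT
export PGPASSFILE=/root/.pgpass
psql -X -v ON_ERROR_STOP=1 \
  -h hostname_of_redshift -p 5439 -U youruser -d dev \
  -f /opt/etl/daily_insert.sql
EOF
}

# Launch a one-off instance that terminates itself when it shuts down.
# AWS_BIN lets you substitute another command for a dry run (default: aws).
launch_etl_instance() {
  local user_data_file="$1"
  "${AWS_BIN:-aws}" ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --instance-initiated-shutdown-behavior terminate \
    --user-data "file://$user_data_file"
}
```

Usage: write_user_data ./user_data.sh && launch_etl_instance ./user_data.sh. The trap is the key design choice: shutting down on every exit path keeps the "pay only while it runs" property even when the SQL fails. The query output ends up in the instance's cloud-init log, which typically disappears with the terminated instance, so ship it elsewhere if you need to keep it.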

answered Sep 13, 2018 at 11:58

John Rotenstein


That's also a valid option. I will add a mention of it to my answer above. (I have not used that method myself yet.) – Jon Scott

Michael Warkentin · Accepted Answer · 2020-11-06 21:43:55Z

2

Redshift now supports scheduled queries natively: https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html
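As I understand it, the console's scheduled queries run the SQL through the Amazon Redshift Data API, triggered by an Amazon EventBridge schedule. A quick way to try the insert against that API before scheduling it is the AWS CLI; the sketch below wraps two Data API calls in hypothetical helper functions. The cluster identifier, database, user and SQL are placeholders, and --db-user assumes your IAM identity is allowed to obtain temporary cluster credentials.

```shell
#!/usr/bin/env bash
# Sketch: submit SQL through the Redshift Data API from the AWS CLI.
# Cluster identifier, database and user are hypothetical placeholders.

REDSHIFT_CLUSTER_ID="${REDSHIFT_CLUSTER_ID:-my-redshift-cluster}"
REDSHIFT_DB="${REDSHIFT_DB:-dev}"
REDSHIFT_DB_USER="${REDSHIFT_DB_USER:-youruser}"

# Submit the SQL and print the statement Id. The call returns as soon as the
# statement is queued; it does not wait for the query to finish.
# AWS_BIN lets you substitute another command for a dry run (default: aws).
submit_sql() {
  local sql="$1"
  "${AWS_BIN:-aws}" redshift-data execute-statement \
    --cluster-identifier "$REDSHIFT_CLUSTER_ID" \
    --database "$REDSHIFT_DB" \
    --db-user "$REDSHIFT_DB_USER" \
    --sql "$sql" \
    --query Id --output text
}

# Print the status of a submitted statement (e.g. STARTED, FINISHED, FAILED).
statement_status() {
  "${AWS_BIN:-aws}" redshift-data describe-statement \
    --id "$1" --query Status --output text
}
```

Because execute-statement returns once the statement is queued, a scheduled Lambda function making the same ExecuteStatement call through the SDK would not be bound by how long the query itself runs; you would check the outcome separately, for example with statement_status "$id".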

answered Nov 6, 2020 at 21:43

Michael Warkentin


Note: this doesn't exist in the V2 query editor. Yet. – Gabe

Rishabh Dixit · Accepted Answer · 2019-04-05 09:22:00Z

0

You can write a Python script that uses boto3 and psycopg2 to run the queries, and schedule it with cron to run daily.

You can also convert your queries into Spark jobs and schedule those jobs to run daily in AWS Glue. If you find that difficult, you can look into Spark SQL instead. If you go with Spark SQL, keep an eye on memory usage, as Spark SQL is quite memory-intensive.

answered Apr 5, 2019 at 9:22

Rishabh Dixit
