Use Case: Automates the ingestion of live Chicago traffic data into BigQuery for interactive, real-time analysis
Technical Concept: Schedules a Python script that appends data to BigQuery, using a cron job on Google Cloud's App Engine.
Reference: http://zablo.net/blog/post/python-apache-beam-google-dataflow-cron
Shout out to Mylin Ackermann for all his help; his personal touch saved me weeks of research. https://www.linkedin.com/in/mylin-ackermann-25a00445/
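The core idea above (a Python script that pulls live traffic records and appends them to BigQuery) can be sketched as follows. The Socrata endpoint, field names, and table ID are illustrative placeholders, not values from the actual project:

```python
# Sketch: pull live traffic records and append them to BigQuery.
# Endpoint, field names, and table ID below are hypothetical.

def to_bq_rows(records):
    """Map raw API records to rows for a hypothetical BigQuery schema."""
    rows = []
    for r in records:
        rows.append({
            "segment_id": r.get("segmentid"),
            "speed": float(r["_traffic"]) if r.get("_traffic") else None,
            "last_updated": r.get("_last_updt"),
        })
    return rows

def ingest():
    """Fetch from the (hypothetical) endpoint and append to BigQuery."""
    import requests
    from google.cloud import bigquery

    # Hypothetical Socrata endpoint for Chicago traffic segments.
    resp = requests.get("https://data.cityofchicago.org/resource/EXAMPLE.json")
    rows = to_bq_rows(resp.json())

    client = bigquery.Client()
    # insert_rows_json appends rows via the streaming insert API.
    errors = client.insert_rows_json("my-project.traffic.segments", rows)
    if errors:
        raise RuntimeError(errors)
```

Streaming inserts suit a cron-driven append workflow because each run adds new rows without rewriting the table.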
Order of Operations:
- Develop scripts with Google Cloud Shell or the Cloud SDK
- Deploy the app to App Engine
- Deploy the cron job
- Verify the data in BigQuery
- Connect a data-visualization tool such as Tableau
Setup Prerequisites:
Development Instructions:
- Clone the GitHub repository into your Cloud SDK environment or Google Cloud Shell (Cloud Shell has persistent storage, so you don't have to re-copy the folder structure)
Deploy Instructions:
- Install all required packages into the local lib folder: pip install -r requirements.txt -t lib
- To deploy the App Engine app, run: gcloud app deploy app.yaml
- To deploy the App Engine cron job, run: gcloud app deploy cron.yaml
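Minimal sketches of the two files deployed above, assuming the legacy Python App Engine runtime; the handler script, endpoint path, and schedule are assumptions, not copied from the project.

app.yaml:

```yaml
# Minimal App Engine app definition (runtime and handler are assumptions).
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: main.app
```

cron.yaml:

```yaml
# Minimal cron definition; the endpoint and schedule are assumptions.
cron:
- description: ping the app to spawn the Dataflow pipeline
  url: /update-traffic
  schedule: every 30 minutes
```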
Document Context:
- app.yaml contains the definition of the App Engine app, which spawns the Dataflow pipeline
- cron.yaml contains the definition of the App Engine cron job, which pings one of the app's endpoints (in order to spawn the Dataflow pipeline)
- appengine_config.py adds the locally installed packages (from the lib folder) to the app's dependencies
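The endpoint that cron pings can be sketched as a framework-free WSGI app. The path name and the pipeline-launch stub are assumptions; in the real app (per the referenced blog post) this handler is where the Dataflow pipeline is spawned:

```python
# Framework-free WSGI sketch of the endpoint App Engine cron pings.
# The "/update-traffic" path and launch_pipeline stub are assumptions.

def launch_pipeline():
    """Placeholder for the code that actually spawns the Dataflow pipeline."""
    return "pipeline-started"

def app(environ, start_response):
    # Cron issues a plain HTTP GET on its schedule; a 200 response
    # tells App Engine the job ran successfully.
    if environ.get("PATH_INFO") == "/update-traffic":
        status = launch_pipeline()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [status.encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Keeping the pipeline launch behind a simple HTTP endpoint is what lets cron.yaml drive it: the scheduler only needs a URL to hit.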