Streaming transformations in Apache Spark with Databricks are usually coded in either Scala or Python. However, is it also possible to write streaming queries in SQL on Delta?

For example, the following sample code uses PySpark for Structured Streaming; what would be the equivalent in Spark SQL?

from pyspark.sql.functions import expr

# `streaming` is an existing streaming DataFrame (its readStream source is defined elsewhere)
simpleTransform = (streaming
    .withColumn("stairs", expr("gt like '%stairs%'"))
    .where("stairs")
    .where("gt is not null")
    .select("gt", "model", "arrival_time", "creation_time")
    .writeStream
    .queryName("simple_transform")
    .format("memory")
    .outputMode("update")
    .start())

1 Answer

You can just register that streaming DataFrame as a temporary view and perform queries on it. For example (using the rate source just for simplicity):

df = spark.readStream.format("rate").load()  # rate source emits (timestamp, value) rows
df.createOrReplaceTempView("my_stream")
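
For the question's pipeline the registration step is the same; a minimal sketch, assuming `streaming` is the structured-streaming DataFrame from the question (the view name activity_stream is just illustrative):

# hypothetical view name; `streaming` is the question's streaming DataFrame
streaming.createOrReplaceTempView("activity_stream")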

Then you can run SQL queries directly on the my_stream view, for example select * from my_stream:

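In a Databricks notebook that query renders as a continuously updating result table. Outside a notebook, a minimal sketch of consuming the view, assuming the rate-source setup above (the query name is illustrative):

# spark.sql over a view backed by a streaming DataFrame returns a streaming DataFrame
query = (spark.sql("select * from my_stream")
    .writeStream
    .queryName("my_stream_query")  # illustrative name; also names the in-memory sink table
    .format("memory")
    .outputMode("append")
    .start())

# the memory sink can then be queried like any table
spark.sql("select * from my_stream_query").show()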

Or you can create another view, applying whatever transformations you need. For example, we can select only every 5th value if we use this SQL statement:

create or replace temp view my_derived as
select * from my_stream where (value % 5) = 0

and then query that view with select * from my_derived:

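Putting it together for the question's example: with `streaming` registered as activity_stream as sketched above, the transformation itself moves entirely into SQL; only starting the sink still goes through writeStream, since SQL alone does not start a streaming write in plain Structured Streaming. A sketch, with column names taken from the question's code:

# equivalent of the question's transformation, expressed as a temp view
spark.sql("""
create or replace temp view simple_transform as
select gt, model, arrival_time, creation_time
from activity_stream
where gt like '%stairs%' and gt is not null
""")

# the sink is still started through the DataFrame API
# (query name renamed here to avoid clashing with the view name)
query = (spark.sql("select * from simple_transform")
    .writeStream
    .queryName("simple_transform_query")
    .format("memory")
    .outputMode("update")
    .start())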

1 Comment

Perfect, thanks Alex Ott
