Streaming transformations in Apache Spark with Databricks are usually coded in either Scala or Python. However, is it also possible to write streaming queries in SQL on Delta?

For example, the following sample code uses PySpark for Structured Streaming; what would be the equivalent in Spark SQL?

from pyspark.sql.functions import expr

# `streaming` is an existing streaming DataFrame (its readStream source is defined elsewhere)
simpleTransform = (streaming
    .withColumn("stairs", expr("gt like '%stairs%'"))
    .where("stairs")
    .where("gt is not null")
    .select("gt", "model", "arrival_time", "creation_time")
    .writeStream
    .queryName("simple_transform")
    .format("memory")
    .outputMode("update")
    .start())

1 Answer

You can just register that streaming DataFrame as a temporary view and perform queries on it. For example (using the rate source just for simplicity):

df = spark.readStream.format("rate").load()  # rate source emits (timestamp, value) rows
df.createOrReplaceTempView("my_stream")
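
For the question's pipeline the registration step is the same; a minimal sketch, assuming `streaming` is the structured-streaming DataFrame from the question (the view name activity_stream is just illustrative):

# hypothetical view name; `streaming` is the question's streaming DataFrame
streaming.createOrReplaceTempView("activity_stream")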

Then you can run SQL queries directly on the my_stream view, for example select * from my_stream:

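In a Databricks notebook that query renders as a continuously updating result table. Outside a notebook, a minimal sketch of consuming the view, assuming the rate-source setup above (the query name is illustrative):

# spark.sql over a view backed by a streaming DataFrame returns a streaming DataFrame
query = (spark.sql("select * from my_stream")
    .writeStream
    .queryName("my_stream_query")  # illustrative name; also names the in-memory sink table
    .format("memory")
    .outputMode("append")
    .start())

# the memory sink can then be queried like any table
spark.sql("select * from my_stream_query").show()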

Or you can create another view, applying whatever transformations you need. For example, we can select only every 5th value if we use this SQL statement:

create or replace temp view my_derived as
select * from my_stream where (value % 5) = 0

and then query that view with select * from my_derived:

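Putting it together for the question's example: with `streaming` registered as activity_stream as sketched above, the transformation itself moves entirely into SQL; only starting the sink still goes through writeStream, since SQL alone does not start a streaming write in plain Structured Streaming. A sketch, with column names taken from the question's code:

# equivalent of the question's transformation, expressed as a temp view
spark.sql("""
create or replace temp view simple_transform as
select gt, model, arrival_time, creation_time
from activity_stream
where gt like '%stairs%' and gt is not null
""")

# the sink is still started through the DataFrame API
# (query name renamed here to avoid clashing with the view name)
query = (spark.sql("select * from simple_transform")
    .writeStream
    .queryName("simple_transform_query")
    .format("memory")
    .outputMode("update")
    .start())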

1 Comment

Perfect, thanks Alex Ott
