I am trying to consume data published to Kafka from Spark, but I cannot get it to work. I am using Spark 2.2.
- I want to consume the data sent by Kafka in Spark, process it, and store it in a local file or in HDFS.
- I want to print the data sent by Kafka (and consumed by Spark) to the console after the Spark job runs.
For Kafka, I am following this tutorial: https://kafka.apache.org/quickstart
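The topic was created with the command from the quickstart (Kafka versions from that era still use ZooKeeper for this; newer ones take --bootstrap-server instead):

[cloudera@quickstart kafka]$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Then I send a few test messages with the console producer: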
[cloudera@quickstart kafka]$ ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>message 1
>message 2
>message 3
>message 4
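As a sanity check, the messages can be read back with the quickstart's console consumer, independently of Spark:

[cloudera@quickstart kafka]$ ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

If the four messages show up there, the broker side is fine and the problem is on the Spark side.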
I then run the Spark Python script file.py:
./spark/bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 file.py
The PySpark code in file.py:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream").getOrCreate()

# Streaming DataFrame reading from the Kafka topic "test"
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "test") \
    .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic")
# Trying to save the result to a file
df.writeStream \
    .format("text") \
    .option("checkpointLocation", "file:///home/cloudera/file.txt") \
    .option("path", "file:///home/cloudera/file.txt") \
    .start()
# Does not write to a file
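From the Structured Streaming guide, I suspect several things are wrong with this attempt: selectExpr returns a new DataFrame (I never assign it, so the cast is lost), the text sink takes exactly one string column, and path and checkpointLocation should be two separate directories rather than the same .txt file. My guess at a corrected version (the directory names are placeholders I made up), though even this prints nothing unless the script keeps running, see the end of this post:

# Assign the cast result; the text sink accepts a single string column
lines = df.selectExpr("CAST(value AS STRING)")
file_query = lines.writeStream \
    .format("text") \
    .option("checkpointLocation", "file:///home/cloudera/checkpoint") \
    .option("path", "file:///home/cloudera/output") \
    .start()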
# Trying to print the result to the console
df.writeStream() \
    .outputMode("append") \
    .format("console") \
    .start()
# Does not print to the console; fails with: TypeError: 'DataStreamWriter' object is not callable
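The TypeError at least makes sense to me now: writeStream is a property on a DataFrame, not a method, so it has to be used without parentheses. Also, start() returns immediately, and since the driver exits when the script ends, I believe the script has to block on the streaming queries or no micro-batch ever runs. My attempted fix (awaitAnyTermination() is my guess at the right way to wait on both queries at once):

console_query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic") \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .start()

# Keep the driver alive until a streaming query stops; without this
# the script (and every query it started) exits right after start()
spark.streams.awaitAnyTermination()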
Are these fixes in the right direction, and what else am I missing to get the stream written to a file and printed to the console?