0 votes
0 answers
70 views

I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...
Vishwas Singh
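For reference, MSK IAM auth from the Spark Kafka source usually comes down to passing the SASL settings through with the `kafka.` prefix and having the aws-msk-iam-auth jar (plus its dependencies) on the Spark classpath. A minimal sketch of the option map; the broker address and topic name are hypothetical placeholders:

```python
# Kafka options for spark.readStream.format("kafka") against MSK with IAM auth.
# The "kafka." prefix routes each setting to the underlying Kafka consumer.
# Requires the aws-msk-iam-auth jar on the Spark classpath.
msk_iam_options = {
    "kafka.bootstrap.servers": "b-1.example.kafka.us-east-1.amazonaws.com:9098",  # hypothetical broker
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "AWS_MSK_IAM",
    "kafka.sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
    "kafka.sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
    "subscribe": "my-topic",  # hypothetical topic name
}

# Usage (sketch):
# df = spark.readStream.format("kafka").options(**msk_iam_options).load()
```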
0 votes
0 answers
49 views

I have been using the Spark v3.5 streaming functionality for the use case below. I am observing the following issue in one of the environments with Spark Streaming. I would appreciate some assistance with ...
Saurabh Agrawal
0 votes
1 answer
142 views

I am trying to load the data written to a Kafka topic into a Postgres table. I can see the topic is receiving new messages every second, and the data looks good. However, when I use the ...
RushHour
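The usual route for Kafka-to-Postgres with Structured Streaming is `foreachBatch`, which hands each micro-batch over as a plain DataFrame so the batch JDBC writer applies. A minimal sketch; the connection URL, table name, and credentials are hypothetical:

```python
# Sketch: sink each micro-batch of a Kafka stream into Postgres via foreachBatch.
# All connection details below are hypothetical placeholders.
JDBC_URL = "jdbc:postgresql://localhost:5432/mydb"

def write_to_postgres(batch_df, epoch_id):
    # Each micro-batch arrives as an ordinary DataFrame, so the batch JDBC writer works.
    (batch_df.write
        .format("jdbc")
        .option("url", JDBC_URL)
        .option("dbtable", "events")      # hypothetical table
        .option("user", "postgres")       # hypothetical credentials
        .option("password", "postgres")
        .mode("append")
        .save())

# Usage (sketch):
# parsed_df.writeStream.foreachBatch(write_to_postgres) \
#     .option("checkpointLocation", "/tmp/ckpt").start()
```

Note the Postgres JDBC driver jar must also be on the classpath for the `jdbc` format to resolve.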
0 votes
1 answer
122 views

ERROR SparkContext: Failed to add home/areaapache/software/spark-3.5.2-bin-hadoop3/jars/spark-streaming-kafka-0-10_2.13-3.5.2.jar \ to Spark environment import logging from pyspark.sql import ...
Lê Anh Tuấn 291N40
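Errors like this usually come down to a mismatch between the Kafka connector artifact and the Spark/Scala build, or to supplying a lone jar instead of letting `--packages` pull transitive dependencies. A sketch of deriving the matching coordinate; the versions shown mirror the 3.5.2 / 2.13 setup in the question:

```python
# The Kafka connector must match both the Spark version and the Scala build
# of the Spark distribution, otherwise class-loading errors follow.
spark_version = "3.5.2"   # must equal the cluster's Spark version
scala_suffix = "2.13"     # must equal the Scala build of the Spark distribution

kafka_package = f"org.apache.spark:spark-sql-kafka-0-10_{scala_suffix}:{spark_version}"
submit_args = f"--packages {kafka_package} pyspark-shell"

# e.g. set before importing pyspark, so transitive deps
# (spark-token-provider-kafka-0-10, kafka-clients) are fetched too:
# os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
```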
3 votes
0 answers
79 views

I have a Spark Streaming application living on Argo + K8s that reads Kafka topics by subscribe pattern; then there are some transformations and a write to a target. Several different producers may write ...
Александр Трутнев
0 votes
1 answer
290 views

My goal is to run a Spark job using Databricks, and my challenge is that I can't store files in the local filesystem, since the file is saved on the driver; but when my executors try to access the ...
John Doe
1 vote
0 answers
384 views

I am working on Spark streaming, reading data from a Kafka topic, but I am getting the error java.lang.NoClassDefFoundError: org/apache/spark/kafka010/KafkaConfigUpdater. I am running my code in K8s and provide ...
vivekdesai
2 votes
0 answers
130 views

I have multiple topics in Kafka that I need to sink into their respective Delta tables. A) 1 streaming query for all topics: if I use one streaming query, then the RDD/DF should contain data from ...
MaatDeamon
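One single query can cover all topics, because the Kafka source exposes a `topic` column on every row; inside `foreachBatch` the batch can then be split per topic. A sketch, with hypothetical topic names and table paths:

```python
# Sketch: one stream over several topics, fanned out per-topic inside foreachBatch.
TOPICS = ["orders", "payments"]  # hypothetical topic names

def sink_per_topic(batch_df, epoch_id):
    # The Kafka source adds a `topic` column, so one query can feed several tables.
    for t in TOPICS:
        (batch_df.filter(batch_df.topic == t)
                 .write.format("delta").mode("append")
                 .save(f"/delta/{t}"))  # hypothetical table paths

# Usage (sketch):
# spark.readStream.format("kafka") \
#     .option("subscribe", ",".join(TOPICS)).load() \
#     .writeStream.foreachBatch(sink_per_topic).start()
```

The trade-off: one query means one checkpoint and shared scheduling; a slow topic can delay the others, which is when per-topic queries become attractive.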
2 votes
1 answer
3k views

When I try to run this .py: import logging from cassandra.cluster import Cluster from pyspark.sql import SparkSession from pyspark.sql.functions import from_json, col from pyspark.sql.types import ...
francollado99
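The standard pattern for typing Kafka's binary `value` column is to cast it to string and parse it with `from_json`; since Spark 2.3 the schema can be given as a DDL string, which avoids building a `StructType` by hand. A sketch; the field names are hypothetical:

```python
# Sketch: parse Kafka's binary `value` column into typed columns with from_json.
VALUE_SCHEMA_DDL = "id STRING, ts TIMESTAMP, amount DOUBLE"  # hypothetical fields

def parse_kafka_value(raw_df):
    # `raw_df` is the result of spark.readStream.format("kafka")...load()
    from pyspark.sql.functions import from_json, col
    return (raw_df
            .selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), VALUE_SCHEMA_DDL).alias("data"))
            .select("data.*"))
```

Rows whose JSON does not match the schema come back as nulls rather than failing the query, which is worth checking when output looks empty.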
2 votes
1 answer
548 views

I am in a bind here. I am trying to implement a very basic pipeline which reads data from Kafka and processes it in Spark. The problem I am facing is that Apache Spark shuts down abruptly, giving the ...
Nanomachines Son
1 vote
0 answers
97 views

I'm trying to read a stream from Kafka using pyspark. The stack I'm working with: Kubernetes; a standalone Spark cluster with 2 workers; spark-connect connected to the cluster, which has the dependencies ...
waseemoo1
0 votes
1 answer
167 views

I can't write to Kafka from Spark; Spark is reading but not writing. If I write to the console it doesn't give an error. Traceback (most recent call last): File "f:\Sistema de Informação\TCC\...
Ingrid Iplinsky
0 votes
1 answer
78 views

I have been trying to complete a project in which I needed to send a data stream using Kafka to local Spark to process the incoming data. However, I cannot show and use the data frame in the right ...
AFORS
0 votes
0 answers
33 views

Hello, I am trying to use pyspark + Kafka. In order to do this, I execute this command to set up the Spark application. Spark version is 3.5.0 | spark-3.5.0-bin-hadoop3. Kafka version is - ...
Nícolas Farfán Cheneaux
0 votes
0 answers
268 views

I'm trying to read data from a Kafka topic using Spark Structured Streaming on an EC2 (Ubuntu) machine. If I read the data using Kafka alone (kafka-console-consumer.sh), then there is no ...
Rushi
0 votes
1 answer
1k views

I have set up Kafka 2.1 on Windows and am able to successfully communicate a topic from producer to consumer over localhost:9092. I now want to consume this in a Spark structured stream. For this I set up ...
rarpal
0 votes
0 answers
279 views

According to the Kafka documentation, offsets in Kafka can be managed using enable.auto.commit and auto.commit.interval.ms. I have difficulty understanding the concept. For example, I have a Kafka ...
AzUser1
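Worth noting for this question: Structured Streaming does not rely on Kafka's committed offsets at all; the source commits nothing back to Kafka and tracks progress in its own checkpoint, so `enable.auto.commit` plays no role there. A sketch of the options that actually govern offsets in Spark; paths and topic are hypothetical:

```python
# Structured Streaming ignores Kafka's consumer-side offset commits;
# progress lives in the checkpoint directory instead.
read_options = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "my-topic",          # hypothetical topic
    "startingOffsets": "earliest",    # consulted only when no checkpoint exists yet
}
write_options = {
    "checkpointLocation": "/tmp/checkpoints/my-query",  # hypothetical path; the real offset store
}
```

This is why deleting the checkpoint directory restarts consumption from `startingOffsets`, regardless of any consumer-group state in Kafka.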
0 votes
1 answer
3k views

My goal is to send/produce a txt file from my Windows PC to a container running Kafka, to then be consumed by pyspark (running in another container). I'm using docker-compose, where I define a custom net and ...
yaviens
0 votes
1 answer
423 views

I am hoping to map a message to an object with schema and payload inside during Spark Structured Streaming. This is my original code: val input_schema = new StructType() .add("timestamp", ...
Hongbo Miao
1 vote
1 answer
1k views

I am hoping to read a parquet file and write to Kafka import org.apache.spark.sql.SparkSession import org.apache.spark.sql.functions.struct import org.apache.spark.sql.functions.to_json object ...
Hongbo Miao
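The Kafka sink expects a `value` column (and optionally `key`), so the common move is to pack each row into JSON with `to_json(struct(*))`. The question's code is Scala, but the same pattern in pyspark, with hypothetical paths, servers, and topic, looks like this:

```python
# Sketch: batch-write a DataFrame to Kafka by packing all columns into a JSON `value`.
def to_kafka_frame(df):
    # The Kafka sink only understands `key`/`value` (string or binary) columns,
    # so every payload column is folded into one JSON string.
    return df.selectExpr("to_json(struct(*)) AS value")

# Usage (sketch):
# to_kafka_frame(spark.read.parquet("/data/in")) \
#     .write.format("kafka") \
#     .option("kafka.bootstrap.servers", "localhost:9092") \
#     .option("topic", "my-topic") \
#     .save()
```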
0 votes
1 answer
573 views

I am running the following code to read a Kafka stream with spark-3.2.2 and Scala 2.12.0. Earlier, the same code worked fine with spark-2.2 and Scala 2.11.8. import spark.implicits._ val kafkaStream = ...
Chandan Gawri
0 votes
0 answers
176 views

How can we make the Kafka stream more stable, so that it runs constantly without us having to restart it after it fails? (So far we are thinking about using the "continuous" run mode to ...
Trodenn
2 votes
1 answer
191 views

I have more data in a Kafka topic, but when I extract data using my pyspark application (which I use to extract from different Kafka topics), I get only 1 row extracted. Previously I had ...
rakk
0 votes
0 answers
99 views

I have a new dataframe with 2 columns: one is headers and the other is a payload. I am facing issues reading the headers column and assigning the values to Kafka headers while publishing. Earlier the ...
abhimanyu
1 vote
0 answers
396 views

I have a spark structured streaming job in scala, reading from kafka and writing to S3 as hudi tables. Now I am trying to move this job to spark operator on EKS. When I give the option in the yaml ...
haripriya rajendran
0 votes
1 answer
74 views

I have a list of topics (for now it's 10) whose size can increase in the future. I know we can spawn multiple threads (one per topic) to consume from each topic, but in my case, if the number of topics ...
erك
0 votes
0 answers
1k views

I am using this tech stack: Spark version: 3.3.1, Scala version: 2.12.15, Hadoop version: 3.3.4, Kafka version: 3.3.1. I am trying to get data from a Kafka topic through Spark Structured Streaming, but I am ...
Muhammad Affan
0 votes
1 answer
1k views

I am using Spark 2.3.0 and Kafka 1.0.0.3. I have created a Spark read stream: df = spark.readStream. \ format("kafka"). \ option("kafka.bootstrap.servers", "...
Utpal Dutta
1 vote
1 answer
785 views

I am running Spark in cluster mode, which gives this error: ERROR SslEngineBuilder: Modification time of key store could not be obtained: hdfs://ip:port/user/hadoop/jks/kafka.client.truststore.jks ...
richa bharwal
0 votes
1 answer
2k views

I'm able to stream twitter data into my Kafka topic via a producer. When I try to consume through the default Kafka consumer I'm able to see the tweets as well. But when I try to use Spark Streaming ...
Kulasangar
1 vote
0 answers
222 views

I have a simple pyspark code which reads data from Kafka and writes aggregated records to Oracle with foreachBatch. I have set checkpointLocation to an HDFS dir and it works well. When I kill the application ...
Nestor
1 vote
0 answers
75 views

22/11/09 11:08:40 INFO MicroBatchExecution: Resuming at batch 206 with committed offsets {KafkaV2[Subscribe[test]]: {"test":{"0":3086,"1":3086,"2":3086,"3&...
qtcat
1 vote
0 answers
358 views

I'm trying to determine the schema of a JSON Kafka topic. To achieve that, I lifted a code part from this blog (https://medium.com/wehkamp-techblog/streaming-kafka-topic-to-delta-table-s3-with-spark-...
Glimpse
0 votes
1 answer
346 views

I am trying to write data to Kafka with the writeStream method. I have been given the following properties by the source system: topic = 'mytopic' host = "myhost.us-west-1.aws.confluent.cloud:...
OrganicMustard
3 votes
0 answers
576 views

I've a Structured Streaming code which reads data from a Kafka topic (on a VM) and writes to another Kafka topic on GKE (I should be using a MirrorMaker for this, but have not implemented that yet)....
Karan Alang
1 vote
2 answers
496 views

We subscribed to 7 topics with spark.readStream in 1 single running spark app. After transforming the event payloads, we save them with spark.writeStream to our database. For one of the topics, the ...
nikitira
0 votes
1 answer
715 views

I installed .NET for Apache Spark using the following guide: https://learn.microsoft.com/en-us/dotnet/spark/tutorials/get-started?WT.mc_id=dotnet-35129-website&tabs=windows The Hello World worked. ...
Kenci
0 votes
1 answer
247 views

This is my code. import findspark findspark.init() import os os.environ[ "PYSPARK_SUBMIT_ARGS" ] = "--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 pyspark-shell" ...
datadibi
0 votes
0 answers
262 views

I'm trying to read data from GCP Kafka through Azure Databricks, but I am getting the below warning and the notebook is simply not completing. Any suggestions? WARN NetworkClient: Consumer groupId Bootstrap ...
Syed Mohammed Mehdi
2 votes
1 answer
2k views

I am encountering a problem with printing the data to the console from a Kafka topic. The error message I get is shown in the image below. 22/09/06 10:14:02 ERROR MicroBatchExecution: Query [id = ba6cb0ca-a3b1-...
Ulrich Tedongmo Douanla
1 vote
0 answers
48 views

I don't want to use one consumer for all topics; I want to use this method to improve consumption efficiency: val kafkaParams = Map( ConsumerConfig.GROUP_ID_CONFIG -> group, ...
1580923067
3 votes
3 answers
3k views

I am using Spark Structured Streaming to read messages from multiple topics in kafka. I am facing below error: java.lang.NoSuchMethodError: org.apache.spark.sql.kafka010.consumer....
Buzz97
1 vote
0 answers
112 views

I am using Spark Structured Streaming to read data from Kafka and apply a udf to the dataset. The code is as below: calludf = F.udf(lambda x: function_name(x)) dfraw = spark.readStream.format('kafka'...
Taimoor Abbasi
0 votes
2 answers
773 views

I'm running Spark version 2.3.0.2.6.5.1175-1 with Python 3.6.8 on Ambari. While submitting the application, I get the following logs in stderr: 22/06/15 12:29:31 INFO StateStoreCoordinatorRef: ...
Taimoor Abbasi
1 vote
2 answers
8k views

pyspark version - 2.4.7, kafka version - 2.13_3.2.0. Hi, I am new to pyspark and streaming properties. I have come across a few resources on the internet, but I am still not able to figure out how to send ...
subh
0 votes
1 answer
357 views

I've a simple Java Spark script that basically returns Kafka data: import org.apache.spark.sql.SparkSession; import org.apache.spark.sql.functions; import org.apache.spark.sql.Dataset; import ...
Pedro Alves
0 votes
1 answer
305 views

I am doing structured streaming using SparkSession.readStream and writing it to a Hive table, but it seems it does not allow me to use time-based micro-batches, i.e. I need a batch of 5 secs. All the ...
aiman
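Fixed-interval micro-batches are controlled by the processing-time trigger on the write side, not by the source. A sketch; the sink format and checkpoint path are hypothetical:

```python
# Sketch: fixed-interval micro-batches via a processing-time trigger.
TRIGGER_INTERVAL = "5 seconds"

def start_query(df, checkpoint_path):
    # The trigger makes Spark plan a new micro-batch every 5 seconds
    # (if the previous one finished; otherwise the next starts immediately after).
    return (df.writeStream
            .trigger(processingTime=TRIGGER_INTERVAL)
            .option("checkpointLocation", checkpoint_path)  # hypothetical path
            .format("parquet")                              # hypothetical sink
            .start())
```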
0 votes
1 answer
169 views

I have a homework like this: Use python to read json files in 2 folders song_data and log_data. Use Python Kafka to publish a mixture of both song_data and log_data file types into a Kafka topic. Use ...
JustStartlDev
1 vote
2 answers
2k views

I'm using one of the Docker images of EMR on EKS (emr-6.5.0:20211119) and investigating how to work with Kafka using Spark Structured Streaming (pyspark). As per the integration guide, I run a Python ...
Jaehyeon Kim
0 votes
1 answer
2k views

I am attempting to read from a kafka topic, aggregate some data over a tumbling window and write that to a sink (I've been trying both with Kafka and console). The problems I'm seeing are a long ...
Barnesly