Newest 'emr-serverless' Questions

0 votes

1 answer

59 views

Unable to register database/table in AWS Glue when Hudi job is submitted from EMR Serverless

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...

Roobal Jindal

294

asked Jul 9 at 7:00

0 votes

1 answer

104 views

Cannot read from S3 with AssumedRoleCredentialProvider after upgrade from EMR Serverless 6.9 to 7.5

I have a PySpark script that reads data from S3 in a different AWS account using AssumedRoleCredentialProvider. It works on EMR Serverless 6.9, but when I upgrade to EMR Serverless 7.5 it fails ...

Sayed

11

asked Jun 14 at 16:00

0 votes

0 answers

43 views

Setting JAR in EMR workspace using EMR Serverless application

I have since switched from EMR on EC2 to EMR Serverless. I used to use interactive notebooks with EMR on EC2. I am trying to use the EMR Studio Workspace (notebooks) with an EMR Serverless application ...

DirtyDan

1

asked Feb 20 at 17:52

0 votes

0 answers

70 views

How to take and extract a heap dump from a Spark executor of an AWS EMR Serverless application?

I want to get a heap dump of an executor of an EMR Serverless application, but it will be stored at the executor's local path. How can I extract it?

Roobal Jindal

294

asked Jan 21 at 12:57

0 votes

0 answers

104 views

How to get the EMR Serverless Job URL with EMR Studio Information Missing from the Event?

I'm working with AWS EMR Serverless, and I need to construct a job URL for an EMR Serverless job to be sent in a message notification in case of state change. The desired URL includes the associated ...

user27008283

35

asked Jan 12 at 13:28

0 votes

0 answers

78 views

AWS EMR Serverless Spark: Live UI takes a few seconds to update due to its asynchronous nature. Please check again in a few seconds

Unable to see the live Spark UI for AWS EMR Serverless Spark jobs. Once a job is completed, the UI is available, but it is not available for running jobs. Message: Live UI takes a few seconds to update due to its ...

Roobal Jindal

294

asked Dec 24, 2024 at 9:19

1 vote

0 answers

75 views

Optimizing PySpark Feature Engineering with Over a Billion Rows on EMR

I’m working with a large transaction dataset (~1 billion rows) in PySpark on AWS EMR. My goal is to perform feature engineering where I compute statistics like sum, mean, standard deviation, and ...

Meriiiiii

11

asked Sep 21, 2024 at 21:36

0 votes

1 answer

117 views

Airflow EMRServerlessCreateApplicationOperator can't detect application name from Airflow input parameter using Jinja template

I am trying to create an Airflow DAG for EMR Serverless application creation. EMRServerlessCreateApplicationOperator( task_id = "create-emrs-app", job_type = "SPARK", ...

Prateek Pathak

121

asked Sep 14, 2024 at 3:49

0 votes

1 answer

206 views

Is it possible to submit a Docker image as a Spark job to EMR Serverless?

I have a Docker image that contains some application code that interacts with Spark. Is it possible to submit this image to a Spark cluster for execution? If so, how? # Not a real command $ aws emr-...

sdgfsdh

37.8k

asked Aug 17, 2024 at 17:35

-1 votes

1 answer

267 views

DataDog integration with EMR Serverless

Does anyone know how to integrate DataDog with EMR Serverless? Any documentation will really be helpful.

Partha

580

asked Jun 6, 2024 at 6:46

1 vote

1 answer

637 views

Does EMR Serverless support bootstrap actions?

When an EMR cluster is created, there is a provision to provide bootstrap actions as shown below. aws emr create-cluster --name "Test cluster" --release-label emr-7.1.0 --use-default-roles --...

Partha

580

asked Jun 6, 2024 at 6:44

0 votes

0 answers

228 views

My Spark job on EMR is stuck for large datasets. The driver is waiting for an executor signal, but the executor has completed its task and is waiting for the next task

I am new to Spark and EMR on EKS. I am getting data from a MongoDB collection. The job is running on a local machine (without cluster mode) for both large and small datasets and is working for datasets ...

P.Subedi

1

asked Apr 10, 2024 at 4:48

0 votes

1 answer

846 views

How do I pass xcom to traditional operator from task created using @task decorator?

I'm trying to get my head around the TaskFlow API & XCom in airflow and am getting stuck, hoping someone here can help. I'm using EmrServerlessCreateApplicationOperator and I want to pass a value ...

jamiet

12.7k

asked Apr 8, 2024 at 15:13

1 vote

0 answers

369 views

Import Custom Python Modules on EMR Serverless through Spark Configuration

I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using the "spark.submit.pyFiles" will allow ...

Justine Paul Padayao

11

asked Mar 12, 2024 at 17:33

0 votes

1 answer

358 views

Inexplicable PySpark SQL array indexing error: Index 1 out of bounds for length 1

I'm seeing an inexplicable array index reference error, Index 1 out of bounds for length 1 ... which I can't explain because I don't see any relevant arrays being referenced in my context of an AWS ...

SomeDude

31

asked Mar 4, 2024 at 21:11

0 votes

1 answer

528 views

EMR task stays in RUNNING state even after the Spark job has finished

I was running a PySpark job (with Apache Hudi) on AWS EMR on EKS, the driver code was like: with (SparkSession.builder .appName(f"App") .config('spark.serializer', '...

Rinze

834

asked Feb 4, 2024 at 9:42

0 votes

1 answer

1k views

EMR Serverless allocates only about half of the executor memory that we define in Spark jobs

When I set a Spark executor's memory to 12 GB, it actually allocates almost half of it, about 6.7 GB. I tried setting 20 GB as well, and then it allocates close to 11 GB, again about half. I have defined ...

Roobal Jindal

294

asked Dec 14, 2023 at 18:55

0 votes

1 answer

158 views

AWS Dynamodb package problem - EMR Serverless

I have a question about EMR Serverless. I want to create a script that reads data from S3 and then uploads the data to a DynamoDB table using EMR Serverless. As with regular EMR, I want to use this ...

Valle1208

43

asked Nov 29, 2023 at 18:21

0 votes

0 answers

398 views

(spark jdbc) SQLRecoverableException: I/O Exception: Connection reset

I have been working on a request that extracts data from an Oracle 19c instance, which is then processed on AWS EMR Serverless using Spark JDBC connections. The big picture is that I can't connect to ...

eduardollopes

1

asked Oct 26, 2023 at 16:25

4 votes

0 answers

1k views

How to configure EMR Serverless to log spark applications correctly to stdout and stderr

I am currently running Scala Spark applications on EMR serverless and all of the logs are getting output to stderr and logged at info level. Looking at this page it seems like this is the default for ...

Darragh.McL

117

asked Aug 31, 2023 at 17:03

0 votes

1 answer

240 views

EmrServerlessCreateApplicationOperator networkConfiguration with multiple subnetIds

If I pass more than one subnet Id to EmrServerlessCreateApplicationOperator via the networkConfiguration attribute, I receive an error. If I use a single subnet Id the operator works fine. This is the ...

singleton

171

asked Jul 12, 2023 at 7:46

0 votes

1 answer

3k views

Executors do not seem to be created or scaled up for Spark application on AWS EMR Serverless

I would appreciate your help with my problem. I'm running a Spark application on AWS EMR Serverless with the EMR 6.11 release. I'm using Spark 3.3.2 with Java 17, with configuration: maximum resources of ...

Shai Barak

129

asked Jul 6, 2023 at 11:52

1 vote

2 answers

2k views

botocore.exceptions.NoRegionError: You must specify a region for EmrServerlessCreateApplicationOperator

I am trying to create an emr-serverless application through the EmrServerlessCreateApplicationOperator, but I keep facing the error botocore.exceptions.NoRegionError: You must specify a region. I am ...

Hasham

43

asked Jun 25, 2023 at 4:38

0 votes

1 answer

2k views

How can I pass an environment variable to a project that runs on EMR Serverless?

In my PySpark project I'm using a Python package that uses Dynaconf, so I need to set the following environment variable: ENV_FOR_DYNACONF = platform. The problem is I don't understand how I can pass ...

nirkov

829

asked Apr 23, 2023 at 14:28

1 vote

1 answer

2k views

EMR Serverless - Pass JARs in console

I'm new to EMR Serverless and I want to know how to pass JARs and packages to a Spark application, as for example: spark-submit --deploy-mode client --jars /usr/lib/hudi/hudi-spark3.3-bundle_2.12-0....

Valle1208

43

asked Apr 19, 2023 at 22:51

0 votes

3 answers

1k views

EMR serverless - S3 access denied for logs

I am trying to run an EMR Serverless job and upload logs to S3 with the following logging configuration: --configuration-overrides '{ "monitoringConfiguration": { "...

mgosk

1,876

asked Mar 22, 2023 at 13:01

1 vote

1 answer

3k views

AWS EMR Serverless - how to submit PySpark jobs (using console) with multiple files?

Hi, I am new to EMR Serverless and trying to learn. I have a PySpark project which I want to run using EMR Serverless. I tried using the console, but it does not let me provide a folder location as input. I ...

Corey A

11

asked Mar 21, 2023 at 17:10

1 vote

1 answer

2k views

How to pass EMR Serverless PySpark entryPointArguments as variable

I have an EMR Serverless PySpark job I am launching from a step function. I am trying to pass arguments to SparkSubmit from the entryPointArguments in the form of variables set in the beginning of the ...

george-ognyanov

53

asked Feb 26, 2023 at 13:02

0 votes

1 answer

282 views

How to delete AWSServiceRoleForAmazonEMRServerless?

I am new to AWS and my account got hacked. In order to secure the account, I have been advised to delete IAM roles. There is one role called AWSServiceRoleForAmazonEMRServerless that I am unable to ...

Naren Nallapareddy

3

asked Feb 22, 2023 at 22:08

1 vote

1 answer

2k views

EMR Serverless using Docker - how to install JAR files

I am trying to set up EMR Serverless, for which I have two options: Using a Terraform script, which lets me choose initial size, max memory, etc. However, I do not have an option to install JAR files / ...

user20505247

11

asked Feb 16, 2023 at 15:43

1 vote

1 answer

879 views

EMR Serverless Airflow Operator not allowing EMR custom images

I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 and Scala 2.13 but the 6.9.0 EMR Release ships with Scala 2.12. I created a FAT jar including all Spark ...

Oscar Drai

181

asked Feb 9, 2023 at 19:29

0 votes

1 answer

357 views

Using S3A client in EMR Serverless

I am using an S3-compatible object store (Cloudflare R2) and trying to get EMR Serverless to connect to it. R2 requires that you use the correct endpoint and pass the secret key and access key. In the ...

Raghuveer

1,867

asked Jan 30, 2023 at 23:03

0 votes

2 answers

1k views

How to run an existing EMR Serverless job with boto3?

From the boto3 docs for start_job_run, it seems like I have to create a job run every time I want to trigger a job. Does it really have to work that way? Can't I take the ID of the existing job, which ...

nirkov

829

asked Dec 14, 2022 at 13:49

0 votes

2 answers

6k views

How to Pass Arguments (EntryPointArguments) in Spark Job using EMR Serverless?

I'm trying to pass some arguments to my PySpark script through the EntryPointArguments parameter of the boto3 emr-serverless client. However, it doesn't work at all. I would like to know if I'm doing ...

Leoads99

13

asked Nov 24, 2022 at 14:23

1 vote

0 answers

2k views

AWS EMR Serverless connect to JDBC SQL Server

I have been connecting to SQL Server using an EMR Serverless app (v6.8.0) for Spark. I have tested the code on my local machine as well as on EC2, but when I ran the code on this serverless cluster I got an ...

Muhammad Ashir Ali

65

asked Nov 3, 2022 at 22:35

4 votes

1 answer

9k views

How to run a Python project (package) on AWS EMR serverless?

I have a Python project with several modules, classes, and dependencies files (a requirements.txt file). I want to pack it into one file with all the dependencies and give the file path to AWS EMR ...

nirkov

829

asked Oct 25, 2022 at 11:33

1 vote

2 answers

4k views

EMR Serverless Spark Executors Timeout

I have an EMR Serverless application that is getting stuck in execution timeouts for some reason. I have tested all S3 connections and they are working. The problem is happening during the execution of a ...

Renan Nogueira

165

asked Sep 28, 2022 at 13:21

0 votes

1 answer

364 views

How to use GraphFrames on EMR serverless

Summary of steps executed: Uploaded the Python script to S3. Created a virtualenv that installs graphframes and uploaded it to S3. Added a VPC to my EMR application. Added graphframes package to ...

fredvultor

21

asked Sep 12, 2022 at 15:56

1 vote

1 answer

4k views

Virtualenv in aws emr-serverless

I'm trying to run some jobs on aws cli using a virtual environment where I installed some libraries. I followed this guide; the same is here. But when I run the job I have this error: Job execution ...

solopiu

766

asked Jul 18, 2022 at 12:12

1 vote

1 answer

1k views

regexp extract pyspark sql: ParseException Literals of type 'R' are currently not supported

I'm using Pyspark SQL with regexp_extract in this way: df = spark.createDataFrame([['id_20_30', 10], ['id_40_50', 30]], ['id', 'age']) df.createOrReplaceTempView("table") sql_statement="...

solopiu

766

asked Jul 15, 2022 at 11:11

3 votes

0 answers

2k views

AWS EMR Serverless Spark properties delimiter

I'm trying to run a Spark job using EMR Serverless, but the issue is that I cannot pass the list of JARs and archives to the Spark job. The Spark properties section does not seem to allow passing in a comma ...

Philip K. Adetiloye

3,270

asked Jul 14, 2022 at 1:57

2 votes

1 answer

2k views

EMR Serverless cannot connect to S3 in another region

I have an EMR Serverless app that cannot connect to an S3 bucket in another region. Is there a workaround for that? Maybe a parameter to set in Job parameters or Spark parameters when submitting a new ...

solopiu

766

asked Jul 6, 2022 at 9:20

0 votes

1 answer

444 views

How to run MapReduce jobs on EMR Serverless?

Based on the documentation, Amazon EMR Serverless seems to accept only Spark and Hive as job drivers. Is there any support for a custom Hadoop JAR for MapReduce jobs on Serverless, similar to EMR?

Avik Das

131

asked Jun 17, 2022 at 21:43

20 votes

2 answers

13k views

AWS Glue vs EMR Serverless

Recently, AWS announced Amazon EMR Serverless (Preview) https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/ - new very ...

alexanoid

26.1k

asked Dec 12, 2021 at 8:10
