0 votes
1 answer
59 views

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...
Roobal Jindal's user avatar
0 votes
1 answer
104 views

I have a PySpark script that reads data from S3 in a different AWS account using AssumedRoleCredentialProvider. It works on EMR Serverless 6.9, but when I upgrade to EMR Serverless 7.5 it fails ...
Sayed's user avatar
  • 11
0 votes
0 answers
43 views

I have since switched from EMR on EC2 to EMR Serverless. I used to use interactive notebooks with EMR on EC2. I am trying to use an EMR Studio workspace (notebooks) with an EMR Serverless application ...
DirtyDan's user avatar
0 votes
0 answers
70 views

I want to get a heap dump of an executor of an EMR Serverless application, but it will be stored at the executor's local path. How can I extract it?
Roobal Jindal's user avatar
0 votes
0 answers
104 views

I'm working with AWS EMR Serverless, and I need to construct a job URL for an EMR Serverless job to be sent in a message notification in case of state change. The desired URL includes the associated ...
user27008283's user avatar
0 votes
0 answers
78 views

Unable to see the live Spark UI for AWS EMR Serverless Spark jobs. Once a job is completed the UI is available, but it is not available for running jobs. Message: Live UI takes a few seconds to update due to its ...
Roobal Jindal's user avatar
1 vote
0 answers
75 views

I’m working with a large transaction dataset (~1 billion rows) in PySpark on AWS EMR. My goal is to perform feature engineering where I compute statistics like sum, mean, standard deviation, and ...
Meriiiiii's user avatar
0 votes
1 answer
117 views

I am trying to create an Airflow DAG for EMR Serverless application creation. EMRServerlessCreateApplicationOperator( task_id = "create-emrs-app", job_type = "SPARK", ...
Prateek Pathak's user avatar
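For questions like the one above, the operator comes from Airflow's amazon provider package (airflow.providers.amazon.aws.operators.emr). A minimal sketch of the keyword arguments one might assemble for it, assuming an illustrative application name and release label (both placeholders, not from the question):

```python
# Hedged sketch: build the kwargs for EmrServerlessCreateApplicationOperator.
# Inside a DAG this dict would be unpacked as
#   EmrServerlessCreateApplicationOperator(**kwargs)
# The application name and release label below are assumptions for illustration.

def build_create_app_kwargs(task_id: str, release_label: str = "emr-6.9.0") -> dict:
    return {
        "task_id": task_id,
        "job_type": "SPARK",                 # Spark application (Hive is the other job type)
        "release_label": release_label,      # EMR release to run on
        "config": {"name": "my-emrs-app"},   # optional extra CreateApplication fields
    }

kwargs = build_create_app_kwargs("create-emrs-app")
print(kwargs["job_type"])  # SPARK
```

Keeping the kwargs in a plain dict like this also makes the configuration easy to unit-test without importing Airflow.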
0 votes
1 answer
206 views

I have a Docker image that contains some application code that interacts with Spark. Is it possible to submit this image to a Spark cluster for execution? If so, how? # Not a real command $ aws emr-...
sdgfsdh's user avatar
  • 37.8k
-1 votes
1 answer
267 views

Does anyone know how to integrate DataDog with EMR Serverless? Any documentation will really be helpful.
Partha's user avatar
  • 580
1 vote
1 answer
637 views

When an EMR cluster is created, there is a provision to provide bootstrap actions, as shown below: aws emr create-cluster --name "Test cluster" --release-label emr-7.1.0 --use-default-roles --...
Partha's user avatar
  • 580
0 votes
0 answers
228 views

I am new to Spark and EMR on EKS. I am getting data from a MongoDB collection. The job runs on my local machine (without cluster mode) for both large and small datasets, and works for datasets ...
P.Subedi's user avatar
0 votes
1 answer
846 views

I'm trying to get my head around the TaskFlow API & XCom in Airflow and am getting stuck; hoping someone here can help. I'm using EmrServerlessCreateApplicationOperator and I want to pass a value ...
jamiet's user avatar
  • 12.7k
1 vote
0 answers
369 views

I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using the "spark.submit.pyFiles" will allow ...
Justine Paul Padayao's user avatar
0 votes
1 answer
358 views

I'm seeing an inexplicable array index reference error, Index 1 out of bounds for length 1 ... which I can't explain because I don't see any relevant arrays being referenced in my context of an AWS ...
SomeDude's user avatar
0 votes
1 answer
528 views

I was running a PySpark job (with Apache Hudi) on AWS EMR on EKS, the driver code was like: with (SparkSession.builder .appName(f"App") .config('spark.serializer', '...
Rinze's user avatar
  • 834
0 votes
1 answer
1k views

When I set a Spark executor's memory to 12 GB, it actually allocates almost half of that, about 6.7 GB. I tried setting 20 GB as well; then it allocates close to 11 GB, again half. I have defined ...
Roobal Jindal's user avatar
0 votes
1 answer
158 views

I have a question about EMR Serverless. I want to create a script that reads data from S3 and then uploads the data to a DynamoDB table using EMR Serverless. As with normal EMR, I want to use this ...
Valle1208's user avatar
0 votes
0 answers
398 views

I have been working on a request that extracts data from an Oracle 19c instance, which is processed using AWS EMR Serverless with Spark JDBC connections. The big picture is that I can't connect to ...
eduardollopes's user avatar
4 votes
0 answers
1k views

I am currently running Scala Spark applications on EMR serverless and all of the logs are getting output to stderr and logged at info level. Looking at this page it seems like this is the default for ...
Darragh.McL's user avatar
0 votes
1 answer
240 views

If I pass more than one subnet Id to EmrServerlessCreateApplicationOperator via the networkConfiguration attribute, I receive an error. If I use a single subnet Id the operator works fine. This is the ...
singleton's user avatar
  • 171
0 votes
1 answer
3k views

I would appreciate your help with my problem. I'm running a Spark application on AWS EMR Serverless with the EMR 6.11 release. I'm using Spark 3.3.2 with Java 17, with a configuration of maximum resources of ...
Shai Barak's user avatar
1 vote
2 answers
2k views

I am trying to create an emr-serverless application through the EmrServerlessCreateApplicationOperator, but I keep facing the error botocore.exceptions.NoRegionError: You must specify a region. I am ...
Hasham's user avatar
  • 43
0 votes
1 answer
2k views

In my PySpark project I'm using a Python package that uses Dynaconf, so I need to set the following environment variable: ENV_FOR_DYNACONF = platform. The problem is I don't understand how I can pass ...
nirkov's user avatar
  • 829
1 vote
1 answer
2k views

I'm new to EMR Serverless and I want to know how to pass jars and packages to a Spark application, for example: spark-submit --deploy-mode client --jars /usr/lib/hudi/hudi-spark3.3-bundle_2.12-0....
Valle1208's user avatar
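On EMR Serverless, flags such as --jars normally go into the sparkSubmitParameters field of a boto3 start_job_run call. A minimal sketch of how that request could be assembled, assuming placeholder S3 paths, application ID, and role ARN (none of which come from the question):

```python
# Hedged sketch: pass --jars / packages to an EMR Serverless Spark job via
# start_job_run's sparkSubmitParameters. All S3 URIs, the application ID and
# the role ARN below are made-up placeholders.

def build_start_job_run_request(application_id: str, role_arn: str) -> dict:
    spark_submit_params = " ".join([
        "--jars s3://my-bucket/jars/hudi-spark3.3-bundle_2.12-0.12.1.jar",
        "--conf spark.jars.packages=org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.1",
    ])
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": "s3://my-bucket/scripts/main.py",
                "sparkSubmitParameters": spark_submit_params,
            }
        },
    }

# The request would then be sent with:
#   boto3.client("emr-serverless").start_job_run(**request)
request = build_start_job_run_request("00abc123", "arn:aws:iam::111122223333:role/emrs-job-role")
print("--jars" in request["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])  # True
```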
0 votes
3 answers
1k views

I am trying to run an EMR Serverless job and upload logs to S3 with the following log configuration: --configuration-overrides '{ "monitoringConfiguration": { "...
mgosk's user avatar
  • 1,876
1 vote
1 answer
3k views

Hi, I am new to EMR Serverless and trying to learn. I have a PySpark project which I want to run using EMR Serverless. I tried using the console, but it does not let me provide a folder location as input. I ...
Corey A's user avatar
  • 11
1 vote
1 answer
2k views

I have an EMR Serverless PySpark job I am launching from a step function. I am trying to pass arguments to SparkSubmit from the entryPointArguments in the form of variables set in the beginning of the ...
george-ognyanov's user avatar
0 votes
1 answer
282 views

I am new to AWS and my account got hacked and in order to secure the account I have been advised to delete IAM roles. There is one role called AWSServiceRoleForAmazonEMRServerless that I am unable to ...
Naren Nallapareddy's user avatar
1 vote
1 answer
2k views

I am trying to set up EMR Serverless, for which I have two options. Using a Terraform script lets me choose the initial size, max memory, etc.; however, I do not have an option to install jar files / ...
user20505247's user avatar
1 vote
1 answer
879 views

I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 and Scala 2.13 but the 6.9.0 EMR Release ships with Scala 2.12. I created a FAT jar including all Spark ...
Oscar Drai's user avatar
0 votes
1 answer
357 views

I am using an S3-compatible object store (Cloudflare R2) and trying to get EMR Serverless to connect to it. R2 requires that you use the correct endpoint and pass the secret key and access key. In the ...
Raghuveer's user avatar
  • 1,867
0 votes
2 answers
1k views

From the boto3 docs for start_job_run, it seems like I have to create a job run every time I want to trigger a job. Does it really have to work that way? Can't I take the ID of an existing job, which ...
nirkov's user avatar
  • 829
0 votes
2 answers
6k views

I'm trying to pass some arguments to run my PySpark script via the EntryPointArguments parameter of boto3 (emr-serverless client); however, it doesn't work at all. I would like to know if I'm doing ...
Leoads99's user avatar
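A common stumbling block with entryPointArguments is that the values arrive in the PySpark script as plain strings in sys.argv. A small sketch of that handoff, with made-up argument names for illustration:

```python
# Hedged sketch: values supplied via jobDriver.sparkSubmit.entryPointArguments
# show up in the entry-point script as ordinary sys.argv entries (all strings).
# The flag names below ("--input", "--year") are illustrative assumptions.

entry_point_arguments = ["--input", "s3://my-bucket/in/", "--year", "2023"]

def parse_args(argv: list) -> dict:
    """Pair up alternating --flag value tokens into a dict."""
    return {argv[i].lstrip("-"): argv[i + 1] for i in range(0, len(argv), 2)}

# On EMR Serverless the same values would appear in sys.argv[1:].
args = parse_args(entry_point_arguments)
print(args["year"])  # 2023
```

Note that numeric values stay strings ("2023", not 2023) until the script converts them.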
1 vote
0 answers
2k views

I have been connecting to SQL Server using EMR Serverless app v6.8.0 for Spark. I have tested the code on my local machine as well as on EC2, but when I ran it on this serverless cluster I got an ...
Muhammad Ashir Ali's user avatar
4 votes
1 answer
9k views

I have a Python project with several modules, classes, and dependencies files (a requirements.txt file). I want to pack it into one file with all the dependencies and give the file path to AWS EMR ...
nirkov's user avatar
  • 829
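For dependency questions like this one, the approach AWS documents for EMR Serverless is to pack a virtualenv (e.g. with venv-pack) into a tarball on S3 and point Spark at it through configuration. A sketch of the relevant confs, assuming a placeholder bucket and archive name:

```python
# Hedged sketch of the virtualenv-archive approach for shipping Python
# dependencies to EMR Serverless: pack the venv into pyspark_venv.tar.gz,
# upload it to S3, then reference it via spark.archives. The bucket path is
# a placeholder; the conf keys follow the EMR Serverless documentation.

def venv_spark_submit_params(archive_s3_uri: str) -> str:
    confs = {
        # "#environment" names the directory the archive is unpacked into
        "spark.archives": f"{archive_s3_uri}#environment",
        "spark.emr-serverless.driverEnv.PYSPARK_PYTHON": "./environment/bin/python",
        "spark.executorEnv.PYSPARK_PYTHON": "./environment/bin/python",
    }
    return " ".join(f"--conf {k}={v}" for k, v in confs.items())

params = venv_spark_submit_params("s3://my-bucket/artifacts/pyspark_venv.tar.gz")
print("spark.archives" in params)  # True
```

The resulting string would go into the sparkSubmitParameters field of a start_job_run call.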
1 vote
2 answers
4k views

I have an EMR Serverless application that is getting stuck in execution timeouts for some reason. I have tested all S3 connections and they work. The problem is happening during the execution of a ...
Renan Nogueira's user avatar
0 votes
1 answer
364 views

Summary of steps executed: Uploaded the python script to S3. Created a virtualenv that installs graphframes and uploaded it to S3. Added a VPC to my EMR application. Added graphframes package to ...
fredvultor's user avatar
1 vote
1 answer
4k views

I'm trying to run some jobs via the AWS CLI using a virtual environment where I installed some libraries. I followed this guide; the same is here. But when I run the job I get this error: Job execution ...
solopiu's user avatar
  • 766
1 vote
1 answer
1k views

I'm using Pyspark SQL with regexp_extract in this way: df = spark.createDataFrame([['id_20_30', 10], ['id_40_50', 30]], ['id', 'age']) df.createOrReplaceTempView("table") sql_statement="...
solopiu's user avatar
  • 766
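Since the SQL statement in the question above is truncated, the semantics of Spark's regexp_extract(col, pattern, idx) can be illustrated with Python's re module: it returns the idx-th capture group, or an empty string when the pattern does not match. The pattern below is an assumption based on the sample ids like 'id_20_30':

```python
import re

# Hedged sketch of regexp_extract semantics using Python's re module.
# The pattern is assumed from the sample data ('id_20_30', 'id_40_50').
pattern = r"id_(\d+)_(\d+)"

def regexp_extract(value: str, pattern: str, idx: int) -> str:
    """Mimic Spark SQL regexp_extract: idx-th group, '' on no match."""
    m = re.search(pattern, value)
    return m.group(idx) if m else ""

print(regexp_extract("id_20_30", pattern, 1))  # 20
print(regexp_extract("id_40_50", pattern, 2))  # 50
```

One common gotcha when using regexp_extract inside a SQL string literal is that backslashes usually need doubling (`\\d` rather than `\d`).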
3 votes
0 answers
2k views

I'm trying to run a spark job using EMR Serverless but the issue is I cannot pass the list of jars and archives to the spark job. The spark properties section does not seem to allow passing in a comma ...
Philip K. Adetiloye's user avatar
2 votes
1 answer
2k views

I have an EMR serverless app that cannot connect to S3 bucket in another region. Is there a workaround for that? Maybe a parameter to set in Job parameters or Spark parameters when submitting a new ...
solopiu's user avatar
  • 766
0 votes
1 answer
444 views

Based on the documentation, Amazon EMR Serverless seems to accept only Spark and Hive as job drivers. Is there any support for custom Hadoop jars for MapReduce jobs on Serverless, similar to EMR?
Avik Das's user avatar
  • 131
20 votes
2 answers
13k views

Recently, AWS announced Amazon EMR Serverless (Preview) https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/ - new very ...
alexanoid's user avatar
  • 26.1k