I am trying to get Spark Connect working on Amazon EMR (Spark v3.5.1). I started the Spark Connect server on the EMR primary node, making sure the JARs required for S3 authentication are on the classpath:
/usr/lib/spark/sbin/start-connect-server.sh \
--conf spark.jars=/usr/lib/hadoop/hadoop-aws.jar,/usr/share/aws/aws-java-sdk/*,/usr/share/aws/aws-java-sdk-v2/* \
--packages org.apache.spark:spark-connect_2.12:3.5.1
In our EMR setup, the instance profile role has only limited access. Instead, we require our users to assume a use-case-specific role, which grants access to that use case's resources, before they can access any data in their Spark jobs. This is also why we don't use EMRFS: if we configured EMRFS authorization to automatically assume a role based on S3 prefixes, developers could unknowingly combine data from two different use cases, which would violate our security principles.
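For context, the per-use-case STS credentials referenced in the snippets below are obtained roughly like this (the helper name and role ARN are illustrative, not part of our actual code):

```python
def assume_use_case_role(role_arn: str, session_name: str = "spark-connect") -> dict:
    """Assume a use-case-specific role and return its temporary STS credentials.

    Sketch only: role_arn is a placeholder, and boto3 is imported lazily so the
    function can be defined without AWS access.
    """
    import boto3

    sts = boto3.client("sts")
    resp = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    # The Credentials dict carries AccessKeyId, SecretAccessKey, SessionToken
    return resp["Credentials"]
```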
Normally, when not using Spark Connect, we set the STS credentials using the SparkContext. Since SparkContext is not available when using Spark Connect, I am setting the credentials like this:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("SparkConnectTest")
    .remote("sc://localhost:21100")
    .config("spark.hadoop.fs.s3a.access.key", sts_creds["AccessKeyId"])
    .config("spark.hadoop.fs.s3a.secret.key", sts_creds["SecretAccessKey"])
    .config("spark.hadoop.fs.s3a.session.token", sts_creds["SessionToken"])
    .config("spark.hadoop.fs.s3a.endpoint", "")
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)
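For comparison, here is a minimal sketch of the classic (non-Connect) approach mentioned above, setting the credentials on the JVM-side Hadoop configuration through the SparkContext (the function name is ours; sts_creds is the same STS credentials dict as in the snippet above):

```python
def set_s3a_sts_credentials(spark, sts_creds):
    """Classic (non-Connect) sketch: push temporary STS credentials into the
    Hadoop configuration that the S3A filesystem reads.

    Assumes a regular SparkSession where the SparkContext is available.
    """
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.aws.credentials.provider",
                    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    hadoop_conf.set("fs.s3a.access.key", sts_creds["AccessKeyId"])
    hadoop_conf.set("fs.s3a.secret.key", sts_creds["SecretAccessKey"])
    hadoop_conf.set("fs.s3a.session.token", sts_creds["SessionToken"])
```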
However, these config parameters do not seem to be propagated to the server at all, and I get an AccessDeniedException when I try to access S3 resources:
SparkConnectGrpcException: (java.nio.file.AccessDeniedException) s3a://<s3-path>: getFileStatus on <s3-path>: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 403, Request ID: XXXXX, Extended Request ID: XXXXXXXXXXXX):null
I have verified that the credentials themselves are fine. In fact, if I set these same credentials on the server via --conf arguments to start-connect-server.sh, everything works as expected and I can access the S3 resources from my client. Obviously that is not a solution, since the STS credentials are temporary and we want developers to provide use-case-specific credentials from the client.
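Concretely, the (unacceptable) workaround that does work is baking the temporary credentials into the server start command; the &lt;...&gt; values are placeholders:

```shell
/usr/lib/spark/sbin/start-connect-server.sh \
  --conf spark.hadoop.fs.s3a.access.key=<access-key> \
  --conf spark.hadoop.fs.s3a.secret.key=<secret-key> \
  --conf spark.hadoop.fs.s3a.session.token=<session-token> \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  --packages org.apache.spark:spark-connect_2.12:3.5.1
```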
So the problem appears to be that these config parameters are not propagated when the Spark Connect session is created.