So I'm creating and using a SparkSession on Amazon EMR as follows:
import os
import pyspark

os.environ["AWS_ACCESS_KEY_ID"] = access_key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = secret_access_key
os.environ["AWS_SESSION_TOKEN"] = session_token
spark_builder = pyspark.sql.SparkSession \
    .builder \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain") \
    .config("spark.driver.extraClassPath",
            "/home/app/lib/hadoop-aws-2.8.4.jar:"
            "/home/app/aws_java_sdk/*") \
    .config("spark.executor.extraClassPath",
            "/home/app/lib/hadoop-aws-2.8.4.jar:"
            "/home/app/aws_java_sdk/*") \
    .config("spark.sql.warehouse.dir", metastore_dir) \
    .config("spark.master", "local[*]") \
    .appName("pyspark explain basic example") \
    .enableHiveSupport()
spark_builder = spark_builder \
    .config("spark.hadoop.fs.s3a.access.key", access_key_id) \
    .config("spark.hadoop.fs.s3a.secret.key", secret_access_key) \
    .config("spark.hadoop.fs.s3a.session.token", session_token) \
    .config("aws.accessKeyId", access_key_id) \
    .config("aws.secretAccessKey", secret_access_key) \
    .config("aws.sessionToken", session_token)
spark = spark_builder.getOrCreate()
del os.environ["AWS_ACCESS_KEY_ID"]
del os.environ["AWS_SECRET_ACCESS_KEY"]
del os.environ["AWS_SESSION_TOKEN"]
This works for, say, accessing a given Amazon S3 location.
I now want to change the AWS credentials within this Spark session to access a different S3 location. What's the best way to go about this?
- Looking at the SparkSession docs, I don't see an obvious way to update the runtime configuration. There's a `newSession` method, but it doesn't accept a `conf` parameter (or any parameter, for that matter).
- While writing this, I came across this SO thread, which suggests stopping and restarting the Spark session, since `SparkSession` and `SparkContext` are not modifiable after start.
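In case it helps frame the question, here is a minimal, untested sketch of that stop-and-recreate approach, reusing the same `fs.s3a` keys as in my builder above. The helper names (`s3a_credential_confs`, `rebuild_spark_session`) are hypothetical, and I've omitted the classpath/warehouse configs for brevity:

```python
def s3a_credential_confs(access_key_id, secret_access_key, session_token):
    # Map the new credentials onto the same fs.s3a settings used at startup.
    return {
        "spark.hadoop.fs.s3a.access.key": access_key_id,
        "spark.hadoop.fs.s3a.secret.key": secret_access_key,
        "spark.hadoop.fs.s3a.session.token": session_token,
    }


def rebuild_spark_session(old_spark, access_key_id, secret_access_key, session_token):
    # SparkSession/SparkContext configs are fixed after start, so stop the
    # old session and build a fresh one carrying the new credentials.
    import pyspark

    old_spark.stop()
    builder = pyspark.sql.SparkSession.builder
    for key, value in s3a_credential_confs(
        access_key_id, secret_access_key, session_token
    ).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Tearing down the whole session just to swap credentials feels heavy-handed, though, which is why I'm asking whether there's a better way.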
New stackoverflow poster, please ignore any etiquette misses :)