I have a Lake Formation resource link to a database table from another AWS account, which I can query in Athena just fine with my permissions. But I cannot query this data in EMR. The permissions do not seem to get passed down into PySpark for some reason. I added my EMR service and instance IAM roles as Lake Formation administrators just to bypass any Lake Formation permissions I might be missing.

The table behind this resource link is also an Iceberg table; I am not sure whether that changes things. This is my current Spark configuration:

{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.sql.catalog.aws_glue": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.aws_glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.aws_glue.glue.lakeformation.enabled": "true",
    "spark.sql.catalog.aws_glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.aws_glue.lakeformation-enabled": "true",
    "spark.sql.defaultCatalog": "aws_glue"
  }
}

If I list the tables for my catalog:

# List tables first to verify access
logger.info("Verifying table access...")
tables = spark_session.sql(f"SHOW TABLES FROM {catalog_name}.{db_name}").collect()
logger.info(f"Available tables: {[t.tableName for t in tables]}")

I can see my tables in the logs:

 Verifying table access...
2025-01-16 05:42:06 INFO     Available tables: ['account', 'activitydefinition',...

I have tried a couple of things:

# First try a simple SELECT to verify access
logger.info("\nAttempting count query")
try:
    count_df = spark_session.sql(f"""
        SELECT *
        FROM {catalog_name}.{db_name}.{table_name}
    """)
    count_df.show()
except Exception as e:
    logger.error(f"Count query failed: {str(e)}")
# Try reading with minimal options
logger.info("\nAttempting main query")
try:
    df = (
        spark_session.read.format("iceberg")
        .option("lakeformation-enabled", "true")
        .option("read-identity-based-auth", "true")
        .table(f"{db_name}.{table_name}")
        .select("id", "identifier")
    )
    logger.info("Successfully created DataFrame")
    df.printSchema()
    return df
except Exception as e:
    logger.error(f"Main query failed: {str(e)}")
    # One final attempt with SQL
    logger.info("\nTrying final SQL approach")
    df = spark_session.sql(f"""
        SELECT t.*
        FROM {catalog_name}.{db_name}.{table_name} t
    """)
    return df

But I always get the same error:

Failed to query data: An error occurred while calling o149.sql.
: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: FRKVTCMCWA771WS7, Extended Request ID: rH0oJbyJm6IBsmZCMDlOZzbjh5hxBE5oU31zXxnxolomK4a+c4txq7iTV4I7WDsgC32qXMnEAUw=)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38)
    at

1 Answer

Check whether both the EMR service role and the EC2 instance profile role have permission to access the S3 bucket that stores the data. This might help.
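One way to run that check from the cluster itself is sketched below. It is only a diagnostic under stated assumptions, not a confirmed fix: the helper names (`resolve_resource_link`, `can_list_prefix`, `report`) are hypothetical, and the bucket, prefix, and database name are placeholders you would replace with your own. It uses the boto3 calls `glue.get_database`, `s3.list_objects_v2`, and `sts.get_caller_identity`. A listing that succeeds does not prove object reads will succeed, and an `AccessDenied` result only tells you that the role Spark runs under cannot reach the data location directly.

```python
def resolve_resource_link(glue_client, db_name):
    """Return (catalog_id, database_name) that a resource link points at.

    Returns None when the Glue database is not a resource link.
    """
    database = glue_client.get_database(Name=db_name)["Database"]
    target = database.get("TargetDatabase")
    if not target:
        return None
    return target.get("CatalogId"), target.get("DatabaseName")


def can_list_prefix(s3_client, bucket, prefix):
    """Try to list one key under the prefix; return (ok, detail)."""
    try:
        s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        return True, "OK"
    except Exception as exc:
        # botocore's ClientError carries the service error code in .response;
        # fall back to the exception class name for anything else.
        error = getattr(exc, "response", {}).get("Error", {})
        return False, error.get("Code", type(exc).__name__)


def report(bucket, prefix, db_name):
    """Print which identity is in use and what it can reach.

    Intended to be run on an EMR node so the instance profile is used.
    """
    import boto3  # imported here so the helpers above stay dependency-free

    print("Caller:", boto3.client("sts").get_caller_identity()["Arn"])
    print("Link target:", resolve_resource_link(boto3.client("glue"), db_name))
    print("S3 listing:", can_list_prefix(boto3.client("s3"), bucket, prefix))
```

For example, calling `report("my-bucket", "path/to/table/", "my_link_db")` (all three values are placeholders) on the primary node shows the role ARN that Spark's S3 client would most likely use, the account and database the resource link resolves to, and whether that role can list the table location.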

2 Comments

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
This is done via a Lake Formation resource link; I am not able to modify permissions on the S3 bucket with a bucket policy.
