I am working on a script to load data into an Iceberg table using AWS Glue/EMR (tried both).
Error message:
pyspark.errors.exceptions.captured.AnalysisException: Cannot write into v1 table: spark_catalog.searc.sekey
For EMR: while creating the cluster I added the configuration below.
[
  {
    "Classification": "iceberg-defaults",
    "Properties": {
      "iceberg.enabled": "true"
    }
  }
]
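For Glue, my understanding is that the rough equivalent of this classification is the --datalake-formats job parameter (the catalog settings themselves are set in code, as in the snippet further down):
--datalake-formats iceberg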
I am able to create the table and have verified that it is present in the Glue Data Catalog. However, while loading data it throws the error above.
Code snippet below:
import sys
from pyspark import SparkConf
from pyspark.context import SparkContext
import pyspark.sql.functions as psf
from pyspark.sql import SparkSession
conf = SparkConf()
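# Register the Iceberg extensions and an Iceberg catalog named "glue_catalog" backed by the AWS Glue Data Catalog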
conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
conf.set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
conf.set("spark.sql.catalog.glue_catalog.warehouse", "s3://seta-ssan/iceberg/")
conf.set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
conf.set("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
conf.set("spark.sql.catalog.glue_catalog.glue.lakeformation-enabled", "true")
conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
spark.sql(""" CREATE TABLE search_iceberg.seics (
mae_id string,
keyds string
)
PARTITIONED BY ( dataset_date date )
LOCATION 's3://seaetrics'
TBLPROPERTIES (
'table_type'='iceberg',
'format'='PARQUET',
'write_compression'='snappy',
'format-version'='2'
)
""" )
df = spark.sql(''' select * from search_ekly where reg_id=5 limit 5 ''')
df.writeTo("search_iceb.ics") \
    .using("iceberg") \
    .option("write.format.default", "parquet") \
    .option("format-version", "2") \
    .overwritePartitions()
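One thing I am unsure about: the conf above registers the Iceberg catalog under the name glue_catalog, but the CREATE TABLE and writeTo use unqualified table names, and the error message refers to spark_catalog. Would the table references need to be fully qualified with the catalog name, i.e. roughly (same placeholder names as above):
spark.sql(""" CREATE TABLE glue_catalog.search_iceberg.seics ( ... ) """)
df.writeTo("glue_catalog.search_iceberg.seics").using("iceberg").overwritePartitions()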

