
I am writing data into S3, and the table format is Iceberg in the Glue Catalog. I see that the /data and /metadata folders are being created. However, when I write data, it creates 001/002-style folders. Is there any way I can keep a table/partition folder layout?

Here is the code and the folder structure.

result_df.writeTo(catalog_table_name) \
                .tableProperty("write.format.default", "parquet") \
                .tableProperty("write.distribution-mode", "hash") \
                .tableProperty("format-version", "2") \
                .tableProperty("write.merge.mode", "copy-on-write") \
                .tableProperty("write.object-storage.path-style.enabled", "true")\
                .partitionedBy("srce", "regon", "coury", "datte") \
                .overwritePartitions()


1 Answer


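Why do you want to keep it as table/partition? In the Iceberg table format, the physical folder structure doesn't matter: Iceberg records the full path and the partition values of every data file in its metadata (manifest files), so queries find and prune files through that metadata rather than by listing directories.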
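A toy model makes this concrete. This is not Iceberg's real manifest format: DataFileEntry, plan_files, the bucket, and the file names are hypothetical, and only two of the question's partition columns are used. It shows that when each file's partition values are recorded next to its path, selecting files for a partition filter works the same whether or not the path contains a hash prefix.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataFileEntry:
    """Toy stand-in for one data-file record in a manifest (not Iceberg's real schema)."""
    file_path: str              # full object path; the folder layout can be anything
    partition: Dict[str, str]   # partition values kept in metadata, never parsed from the path


def plan_files(entries: List[DataFileEntry], **predicate: str) -> List[str]:
    """Return paths of files whose partition values match every key=value in `predicate`.

    Matching looks only at the recorded partition values, never at the path string.
    """
    return [
        e.file_path
        for e in entries
        if all(e.partition.get(k) == v for k, v in predicate.items())
    ]


# Hypothetical table mixing hashed and plain folder layouts.
manifest = [
    DataFileEntry("s3://bucket/tbl/data/a1b2c3d4/srce=web/datte=2024-01-01/f1.parquet",
                  {"srce": "web", "datte": "2024-01-01"}),
    DataFileEntry("s3://bucket/tbl/data/9f3c0e11/srce=app/datte=2024-01-01/f2.parquet",
                  {"srce": "app", "datte": "2024-01-01"}),
    DataFileEntry("s3://bucket/tbl/data/srce=web/datte=2024-01-02/f3.parquet",
                  {"srce": "web", "datte": "2024-01-02"}),
]

print(plan_files(manifest, srce="web"))  # -> the f1 and f3 paths, whatever their layout
```

Real Iceberg planning also uses partition transforms and column statistics; this sketch only matches identity partition values.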

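The write property write.object-storage.enabled is responsible for adding a hash to data file paths. See: https://iceberg.apache.org/docs/latest/configuration/#write-properties. This is actually helpful in S3, because S3 limits request rates per prefix, so spreading files across many hashed prefixes lets you sustain more requests per second. If you don't need the hashed prefixes, you can set it to false; only files written after the change use the default <table location>/data/<partition path>/ layout, and existing files keep their current paths. The default is false, but the cloud provider might be setting it to true.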
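A rough sketch of the two layouts, assuming identity partitions on two of the question's columns. The helper names, bucket, table, and file name are hypothetical, and the hash is plain SHA-256 chosen for illustration, not the hash or directory format Iceberg actually uses; the point is only where a hash sits relative to the partition path.

```python
import hashlib
from typing import Dict


def partition_path(partition: Dict[str, str]) -> str:
    # Hive-style "col=value" segments joined with "/"; no escaping (illustration only).
    return "/".join(f"{k}={v}" for k, v in partition.items())


def default_location(table_location: str, partition: Dict[str, str], filename: str) -> str:
    # Default layout: <table location>/data/<partition path>/<file name>
    return f"{table_location}/data/{partition_path(partition)}/{filename}"


def hashed_location(table_location: str, partition: Dict[str, str], filename: str,
                    hash_chars: int = 8) -> str:
    # ILLUSTRATIVE ONLY: prefix the partition path with a short hash of the file name
    # so files spread across many S3 prefixes. Iceberg's object-storage location
    # provider uses its own hash and directory format, which may differ between versions.
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()[:hash_chars]
    return f"{table_location}/data/{digest}/{partition_path(partition)}/{filename}"


part = {"srce": "web", "datte": "2024-01-01"}  # hypothetical partition values
print(default_location("s3://bucket/tbl", part, "00000-0-abc.parquet"))
# s3://bucket/tbl/data/srce=web/datte=2024-01-01/00000-0-abc.parquet
print(hashed_location("s3://bucket/tbl", part, "00000-0-abc.parquet"))

# To switch new writes back to the default layout (catalog/table names are hypothetical):
# spark.sql("ALTER TABLE glue_catalog.db.tbl "
#           "SET TBLPROPERTIES ('write.object-storage.enabled'='false')")
```

In this sketch the hash comes before the partition path, so files from a single partition end up under many different prefixes, which is what spreads the S3 request load.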

In your case, I see

.tableProperty("write.object-storage.path-style.enabled", "true")

which might be a provider-specific Iceberg config, as it is not a documented public Iceberg property.
