
I have a simple PySpark job on Databricks that reads data from a bucket, applies a few minor transformations, and writes to a Delta table.

I am currently following the steps below. I have hardcoded the catalog/schema/table names for now, but I need to replace them dynamically based on parameters. With PySpark we can easily substitute values, so is there any way to replace the values for <catalog_name>.<schema_name>.<table_name> used in the final SQL insert, based on input parameters?

  1. Read the input data and create a DataFrame
  2. Apply basic transformations
  3. Create a temp view from the DataFrame using createOrReplaceTempView()
  4. Write to the Delta table as below
df = spark.read.option("header", "true").schema(schema_name).csv(source_path)

df.createOrReplaceTempView(temp_table)

# how can we replace the names below based on an input parameter/variable?
INSERT INTO <name_of_catalog>.<name_of_schema>.<name_of_table>
SELECT * FROM temp_table;
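
For reference, here is a minimal sketch of the "replace the values in Python, then run the resulting string with spark.sql()" approach I am trying to improve on. The variables catalog_name, schema_name, table_name and temp_table are assumed to be ordinary Python variables supplied as job/notebook parameters (the names are hypothetical):

# Sketch of the string-substitution approach: build the fully qualified
# table name in Python and run the INSERT through spark.sql().
# catalog_name, schema_name, table_name and temp_table are assumed to be
# plain Python variables passed in as parameters.
target_table = f"{catalog_name}.{schema_name}.{table_name}"

spark.sql(f"""
    INSERT INTO {target_table}
    SELECT * FROM {temp_table}
""")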
  • Other than replacing the values in Python and then running the resulting string with spark.sql(""), do we have any other options? Commented Jan 11, 2024 at 5:21
  • You could try creating widgets and using them directly in Spark SQL. Refer to this link Commented Jan 11, 2024 at 7:29
  • This works for me: CREATE WIDGET TEXT database_name DEFAULT "default"; SHOW TABLES IN ${database_name}. You could try something similar. Commented Jan 11, 2024 at 7:40
  • If we need to use Spark without Databricks-specific features, is there any way we could achieve this? Commented Jan 12, 2024 at 14:51
  • Would you mind looking at this answer and seeing if it helps? I have not tried it, but it seems likely to work. Commented Jan 13, 2024 at 12:22
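
For the plain-Spark case raised in the comments, one option that avoids building SQL strings entirely is to write through the DataFrame API. A minimal sketch, assuming catalog_name, schema_name and table_name are ordinary Python variables (hypothetical names) and the target is a Delta table:

# Sketch only: write directly with the DataFrame writer instead of SQL.
# catalog_name, schema_name and table_name are assumed parameters here;
# they are not part of the original code.
target_table = f"{catalog_name}.{schema_name}.{table_name}"

(
    df.write
      .format("delta")
      .mode("append")          # or "overwrite", depending on the load pattern
      .saveAsTable(target_table)
)

On a Spark setup without Unity Catalog, the name would typically be two-level (schema.table) rather than three-level, but the same saveAsTable call applies.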
