1

I have a SQL Server table that has a different schema than my dataframe. I would like to select some columns from my dataframe and "insert into" the table the values I selected.

Basically something similar to the code below but in pyspark:

INSERT INTO Cust_Diff_Schema_tbl
(acct_num, name)
SELECT account_no, name
FROM customers
WHERE customer_id > 5000;

I can read the data using jdbc using spark.read. Just like below:

df_s3 = spark.read.format("jdbc")\
                .option("driver", db_driver_name)\
                .option("url", db_url+ ":1433;databaseName="+stage_db)\
                .option("dbtable", tbl_name)\
                .option("query", """(select * from customers)""")\
                .option("user", db_username)\
                .option("password", db_password)\
                .load()
    
    df_s3.printSchema()
    df_s3.show(20)

To write/append the data to the table with the selected values, I believe I can still use "df_s3.write" but I need an example on how to use the insert statement using ".option" function or another approach if this does not work.

Thanks in advance.

1 Answer 1

0
//create dataframe

val df = //fetch from  db,read file or other options

df.write.format("jdbc")
      .option("numPartitions", 20)
      .option("batchsize", 10000)
      .option("truncate", "true")
      .option("url", "jdbcURL")
      .option("driver", "Driver name")
      .option("dbtable", "tablename")
      .mode("append")
      .save()
Sign up to request clarification or add additional context in comments.

5 Comments

where is my insert into statement with the passed in values from my dataframe??
first you need to create df and whatever column you want to insert keep only that in df ,now whan you save save and mode append will add at the end of table if you say moe override then truncate delete and re create table for you ,inshort no instrt into sql needed
do you have an example you can share?
You do understand my table schema is different than my dataframe which is why I think I have to use insert into, to pick the columns.
Create new dataframe with existing df and keep.only column you want to insert ,will share example with you

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.