Difference between using Spark SQL and SQL

My main question is about performance. Consider the code below:

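# dbtable must be a table name or a parenthesized subquery with an alias.
# ORDER BY is left out of the subquery: SQL Server rejects it in a derived
# table, and a partitioned read would not preserve the order anyway.
query = """
    (SELECT Name, Id FROM Customers WHERE Id <> 1) AS customers
"""

df = spark.read.format("jdbc") \
     .option("url", "connectionString") \
     .option("user", user) \
     .option("password", password) \
     .option("numPartitions", 10) \
     .option("partitionColumn", "Id") \
     .option("lowerBound", lowerBound) \
     .option("upperBound", upperBound) \
     .option("dbtable", query) \
     .load() \
     .orderBy("Id")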

As far as I understand, this query is sent to the database, which processes it and returns the result to Spark.
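For context, here is a rough sketch of how the partitioning options in the snippet above (numPartitions, partitionColumn, lowerBound, upperBound) turn into one WHERE clause per partition, each sent to the database as its own query. It is a simplified approximation of Spark's behavior, not its actual implementation: the function name partition_predicates is hypothetical, and the sketch assumes non-negative integer bounds.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the per-partition WHERE clauses of a partitioned JDBC read.

    Simplified sketch: assumes non-negative integer bounds. The bounds only
    decide the stride; rows outside them still land in the first or last
    partition, so they are not filtered out.
    """
    span = upper_bound - lower_bound
    # Never use more partitions than there are values between the bounds.
    n = min(num_partitions, span) if span > 0 else 1
    if n <= 1:
        return [None]  # a single partition reads everything, no predicate

    stride = upper_bound // n - lower_bound // n
    predicates = []
    current = lower_bound
    for i in range(n):
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != n - 1 else None
        if lower is None:
            # First partition also picks up NULLs and values below lowerBound.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            # Last partition is open-ended and picks up values above upperBound.
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


if __name__ == "__main__":
    for clause in partition_predicates("Id", 0, 100, 4):
        print(clause)
```

With bounds 0 and 100 and four partitions, the sketch yields four ranges with a stride of 25, which is why a skewed Id column or badly chosen bounds can leave most of the rows in a single partition.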

Now considering the code below:

from pyspark.sql import functions as f

df = spark.read.jdbc(url = mssqlconnection,
                     table = "dbo.Customers",
                     properties = mssql_prop
).select(
     f.col("Id"),
     f.col("Name")
).where("Id <> 1").orderBy(f.col("Id"))

As far as I know, Spark will load the entire table into memory and then apply the filters to the DataFrame.

Finally, the last code snippet:

df = spark.read.jdbc(url = mssqlconnection,
                     table = "dbo.Customers",
                     properties = mssql_prop
)

# Register the DataFrame so the SQL below can refer to it by name.
df.createOrReplaceTempView("Customers")

final_df = spark.sql("""
            SELECT Name, Id FROM Customers WHERE Id <> 1 ORDER BY Id
            """)

I have 3 questions:

1. Of the three snippets, which one is the most correct? I always use the second approach; is that right?
2. What is the difference between using spark.sql and calling the DataFrame methods directly, as in the second snippet?
3. What is the ideal number of rows at which to start using Spark? Is it worth using for queries that return fewer than 1 million rows?

asked Jan 24, 2022 at 0:38

OdiumPura

Does this answer your question? Spark sql queries vs dataframe functions
– blackbishop, Jan 24, 2022 at 8:41
Questions 1 and 2 are answered at that link. Can you give me a tip about question 3?
– OdiumPura, Jan 24, 2022 at 10:26
