
My main question concerns performance. Consider the code below:

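# dbtable accepts anything valid in a FROM clause, so the query is wrapped
# as an aliased subquery. SQL Server rejects ORDER BY inside a derived table,
# and row order is not preserved across 10 partitions anyway, so sort in Spark.
query = "(SELECT Name, Id FROM Customers WHERE Id <> 1) AS customers_subset"

df = spark.read.format("jdbc") \
     .option("url", connectionString) \
     .option("user", user) \
     .option("password", password) \
     .option("numPartitions", 10) \
     .option("partitionColumn", "Id") \
     .option("lowerBound", lowerBound) \
     .option("upperBound", upperBound) \
     .option("dbtable", query) \
     .load() \
     .orderBy("Id")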

As far as I understand, this query will be sent to the database, which processes it and returns the result to Spark.
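One detail worth keeping in mind: with partitionColumn set, Spark does not send a single query. It splits the read into numPartitions queries, each with a range predicate on Id, and lowerBound/upperBound only decide the stride; they do not filter rows. The pure-Python sketch below approximates that splitting. It is a simplified reading of Spark's JDBC partitioning logic, assumes integer bounds with 0 <= lowerBound <= upperBound, and the function name partition_where_clauses is hypothetical.

```python
from typing import List, Optional


def partition_where_clauses(column: str, lower_bound: int, upper_bound: int,
                            num_partitions: int) -> List[Optional[str]]:
    """Approximate the WHERE clause Spark's JDBC source adds for each partition.

    Simplified sketch: assumes integer bounds with 0 <= lower_bound <= upper_bound.
    None means "no extra predicate" (the partition reads the whole subquery).
    """
    if num_partitions <= 1 or lower_bound == upper_bound:
        return [None]
    # If the range is narrower than the requested partition count, shrink it.
    if upper_bound - lower_bound < num_partitions:
        num_partitions = upper_bound - lower_bound
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    clauses: List[Optional[str]] = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit and the last has no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if upper is None:
            clauses.append(lower)
        elif lower is None:
            clauses.append(f"{upper} or {column} is null")
        else:
            clauses.append(f"{lower} AND {upper}")
    return clauses


for clause in partition_where_clauses("Id", 0, 100, 4):
    print(clause)
# Id < 25 or Id is null
# Id >= 25 AND Id < 50
# Id >= 50 AND Id < 75
# Id >= 75
```

Each clause is added to the query Spark builds around dbtable, so the database receives one query per partition. Because the bounds only set the stride, a poorly chosen range can push most rows into the first or last partition.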

Now consider the code below:

from pyspark.sql import functions as f

df = spark.read.jdbc(url=mssqlconnection,
                     table="dbo.Customers",
                     properties=mssql_prop  # dict with user, password, driver
).select(
     f.col("Id"),
     f.col("Name")
).where("Id <> 1").orderBy(f.col("Id"))

My understanding is that Spark will load the entire table into memory and only then apply the filters to the DataFrame.

Finally, the last code snippet:

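df = spark.read.jdbc(url=mssqlconnection,
                     table="dbo.Customers",
                     properties=mssql_prop
)

# Register the DataFrame so that spark.sql can refer to it as "Customers".
df.createOrReplaceTempView("Customers")

final_df = spark.sql("""
            SELECT Name, Id FROM Customers WHERE Id <> 1 ORDER BY Id
            """)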

I have three questions:

  1. Of the three snippets, which one is the most correct? I always use the second approach; is that right?
  2. What is the difference between using spark.sql and calling the DataFrame methods directly, as in the second snippet?
  3. At what number of rows does it start to make sense to use Spark? Is it worth using for queries that return fewer than 1 million rows?
  • Does this answer your question? Spark sql queries vs dataframe functions (commented Jan 24, 2022 at 8:41)
  • Questions 1 and 2 are answered at that link. Can you give me a tip about question 3? (commented Jan 24, 2022 at 10:26)
