13

I am very new to Apache Spark.
I have already configured Spark 2.0.2 on my local Windows machine and have run the "word count" example. Now I have a problem executing SQL queries. I have searched for this, but am not getting proper guidance.

5
  • So, what's your problem? You're getting some error? Commented Nov 28, 2016 at 10:32
  • error: not found: value sqlContext Commented Nov 29, 2016 at 7:07
  • I am getting the above exception while running the below command Commented Nov 29, 2016 at 7:08
  • val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/mydb").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "mydb").option("user", "root").option("password", "").load() Commented Nov 29, 2016 at 7:08
  • 1
    not sure why it's down voted. I find this question helpful! Commented Nov 5, 2017 at 20:11

5 Answers

16

So you need to do the following to get this done.

In Spark 2.0.2 we have SparkSession, which contains the SparkContext instance as well as the sqlContext instance.

Hence the steps would be:

Step 1: Create SparkSession

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()
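
(A side note: the SparkSession created above also exposes the older entry points, which can help when following pre-2.0 examples; this is just a small sketch using the spark value from Step 1.)

val sc = spark.sparkContext           // the underlying SparkContext
val sqlContext = spark.sqlContext     // the legacy SQLContext, kept for backwards compatibility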

Step 2: Load the data from the database, in your case MySQL.

val loadedData = spark
      .read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydatabase")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "mytable")
      .option("user", "root")
      .option("password", "toor")
      .load()

loadedData.createOrReplaceTempView("mytable")

Step 3: Now you can run your SQL query just as you would against a SQL database.

val dataFrame = spark.sql("Select * from mytable")
dataFrame.show()

P.S.: It would be better to use the DataFrame API, or even better the Dataset API, but for those you need to go through the documentation.

Link to Documentation: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.sql.Dataset
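
As a quick illustration, here is a rough sketch of the same query written with the DataFrame API instead of SQL; it assumes the loadedData DataFrame from Step 2, and the column name "id" is only a placeholder for whatever columns your table actually has:

import org.apache.spark.sql.functions.col

// Same result as spark.sql("Select * from mytable"), plus a filter;
// both forms compile down to the same plan.
val viaApi = loadedData
  .select("*")
  .filter(col("id") > 100)

viaApi.show()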


2 Comments

Could you add some arguments behind your suggestion to use the Dataset API? Otherwise the statement is just an opinion and should be ignored. Keep in mind that the SQL API is usually ahead (e.g. many higher order functions were introduced in Spark SQL 2.4, but they were not available to the Dataset API).
Assuming this way, the query will be fired from the driver, right? Also, the results will be loaded into the driver's memory, correct?
9

In Spark 2.x you no longer reference sqlContext, but rather spark, so you need to do:

spark
  .read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "mydb")
  .option("user", "root")
  .option("password", "")
  .load()

2 Comments

Assuming this way, the query will be fired from the driver, right? Also, the results will be loaded into the driver's memory, correct?
No, not correct. Data will be loaded by workers into worker memory as always. But I believe there are also settings for how many concurrent connections it should use.
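For reference, those settings are the standard JDBC partitioning options. A rough sketch, where the partition column "id" and the bounds are only placeholders for your own table:

// Spark opens up to numPartitions connections and splits the
// [lowerBound, upperBound] range of partitionColumn across them,
// so the read happens in parallel on the workers.
val partitionedRead = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "mydb")
  .option("user", "root")
  .option("password", "")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "100000")
  .option("numPartitions", "4")
  .load()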
2

You should already have your Spark DataFrame.
Create a temp view from the DataFrame:

df.createOrReplaceTempView("dftable")
dfsql = spark.sql("select * from dftable")

You can use long queries in statement format:

sql_statement = """
select sensorid, objecttemp_c,
year(DateTime) as year_value,
month(DateTime) as month_value,
day(DateTime) as day_value,
hour(DateTime) as hour_value
from dftable
order by 1 desc
"""

dfsql = spark.sql(sql_statement)

Comments

0

It's rather simple now in Spark to run SQL queries. You can run SQL on DataFrames as others have pointed out, but the question is really how to run SQL.

spark.sql("SHOW TABLES;")

That's it.

6 Comments

With spark in ASA (Azure Synapse Analytics), there is "lazy loading" ... so spark.sql creates the dataframe, but does not yet execute. Execution waits until there's an output action. Does spark.sql immediately execute, in other implementations of spark?
spark.sql executes SQL. I believe the intention is to use it instead of transformations in Python. If you do the transformations in Python on DataFrames, it's going to use lazy evaluation. If you use SQL to join a few tables, do some calls, and write to a table, that is also built lazily, but the write is an action, so it gets executed. SQL is declarative; Python is not.
Thanks for the reply. I guess you meant that doing it in SQL, it would have eager eval? I got the point in any case.
SQL will still be evaluated lazily; Spark always is. What you won't have is the chance to notice the lazy evaluation, since you are including an action. Full SQL is not procedural: if I write "create this table", then it does it.
Ok, thanks. By "full sql", are you including what runs in an azure synapse analytics notebook cell, that starts with %%sql? Because my experience is that those %%sql cells execute immediately -- eager eval -- even if the notebook as a whole, is set to pyspark.
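To illustrate the point, a small sketch (the view name mytable and column id are only placeholders): spark.sql builds a plan lazily, and nothing runs until an action is called.

val lazyDf = spark.sql("SELECT * FROM mytable WHERE id > 100")  // only builds a logical plan
lazyDf.show()                                                   // action: the query actually executes here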
0

Executing SQL queries using spark.sql() or the Dataset API compiles to exactly the same code via the Catalyst optimiser at compile time and AQE at runtime. You can choose whichever you are comfortable with.
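
One way to convince yourself is to compare the plans of the two formulations; a minimal sketch assuming a registered view mytable with a column id:

import org.apache.spark.sql.functions.col

// Both should print the same physical plan after Catalyst/AQE.
spark.sql("SELECT id FROM mytable WHERE id > 100").explain()
spark.table("mytable").select("id").filter(col("id") > 100).explain()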

Comments
