I am very new to Apache Spark.
I have already configured Spark 2.0.2 on my local Windows machine.
I have completed the "word count" example with Spark.
Now I am having trouble executing SQL queries.
I have searched for guidance on this, but haven't found a proper answer.
- So, what's your problem? Are you getting some error? (T. Gawęda, Nov 28, 2016)
- error: not found: value sqlContext (rajkumar chilukuri, Nov 29, 2016)
- I am getting the above error while running the command below (rajkumar chilukuri, Nov 29, 2016):
- val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/mydb").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "mydb").option("user", "root").option("password", "").load()
- Not sure why it's downvoted. I find this question helpful! (TriCore, Nov 5, 2017)
5 Answers
You need to do the following to get this done.
In Spark 2.0.2 we have SparkSession, which contains a SparkContext instance as well as a SQLContext instance.
Hence the steps are:
Step 1: Create SparkSession
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()
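Since SparkSession wraps both of the old entry points, the legacy handles remain reachable from it; a minimal sketch, useful when porting pre-2.0 code:

val sc = spark.sparkContext        // the underlying SparkContext
val sqlContext = spark.sqlContext  // the legacy SQLContext, kept for backward compatibility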
Step 2: Load data from the database, in your case MySQL.
val loadedData = spark
  .read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydatabase")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "mytable")  // the option key is "dbtable"; its value is the table to read
  .option("user", "root")
  .option("password", "toor")
  .load()
loadedData.createOrReplaceTempView("mytable")
Step 3: Now you can run your SQL query just as you would against a SQL database.

val dataFrame = spark.sql("SELECT * FROM mytable")
dataFrame.show()
P.S.: It would be better to use the DataFrame API, or better still the Dataset API (a short sketch follows), but for those you need to go through the documentation.
Link to documentation: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.sql.Dataset
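As a rough illustration of that suggestion, here is a minimal Dataset sketch; the case class Record and its fields are hypothetical and would need to match your table's actual columns:

// Hypothetical schema -- adjust the fields to your table's columns.
case class Record(id: Long, name: String)

import spark.implicits._  // brings in the Encoders needed for .as[Record]

val ds = spark.table("mytable").as[Record]  // typed Dataset[Record]
ds.filter(_.id > 0).show()                  // fields are checked at compile time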
In Spark 2.x you no longer reference sqlContext, but rather spark, so you need to do:
spark
.read
.format("jdbc")
.option("url", "jdbc:mysql://localhost:3306/mydb")
.option("driver", "com.mysql.jdbc.Driver")
.option("dbtable", "mydb")
.option("user", "root")
.option("password", "")
.load()
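To then run SQL the way the question intends, one sketch is to bind that result to a value and register it as a temp view (df and the view name are placeholders here, using the question's connection details):

// Assuming: val df = spark.read.format("jdbc")...load() as above
df.createOrReplaceTempView("mydb")
spark.sql("SELECT * FROM mydb").show()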
You should already have your Spark DataFrame. Create a temp view from the DataFrame:
df.createOrReplaceTempView("dftable")
dfsql = spark.sql("select * from dftable")  # use the SparkSession (spark), not the SparkContext (sc)
You can use long queries in statement format:
sql_statement = """
select sensorid, objecttemp_c,
year(DateTime) as year_value,
month(DateTime) as month_value,
day(DateTime) as day_value,
hour(DateTime) as hour_value
from dftable
order by 1 desc
"""
dfsql = spark.sql(sql_statement)
It's rather simple now in Spark to run SQL queries. You can run SQL on DataFrames, as others have pointed out, but the question is really how to run SQL itself.
spark.sql("SHOW TABLES")  // note: no trailing semicolon inside spark.sql()
That's it.
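Note that spark.sql returns a DataFrame like any other, so the result composes; a small sketch:

val tables = spark.sql("SHOW TABLES")
tables.show()         // an action triggers execution and prints the result
tables.printSchema()  // inspect the columns the command returns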
Comments:
- spark.sql creates the DataFrame but does not yet execute; execution waits until there's an output action.
- Does spark.sql immediately execute in other implementations of Spark, e.g. %%sql? Because my experience is that those %%sql cells execute immediately (eager evaluation), even if the notebook as a whole is set to PySpark.
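To illustrate that laziness point, a minimal sketch (the table name is a placeholder):

// No Spark job runs on this line: the query is only parsed and analyzed,
// and a lazy DataFrame describing it is returned.
val lazyDf = spark.sql("SELECT * FROM mytable")

// Only an action such as show(), count(), or collect() triggers execution.
lazyDf.show()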