spark-shell: basically opens a scala> prompt, where a query needs to be written in the following manner:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

spark-sql: seems to connect directly to the Hive metastore, so a query can be written in a similar way to Hive and run over the existing data in Hive.
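
For example, the equivalent at the spark-sql prompt (a minimal sketch; src is the same Hive table used in the snippet above):

$ spark-sql
spark-sql> SELECT key, value FROM src;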

I want to know the difference between these two. Also, is processing a query in spark-sql the same as in spark-shell? In other words, can we leverage the performance benefits of Spark in spark-sql?

This is on Spark 1.5.2.

  • Ehm, spark-shell is just a shell. spark-sql, on the other hand, is a library. Comparing them is like comparing apples with tomatoes. BTW, spark-shell automatically imports various Spark libraries and instantiates the sqlContext, so you don't need the line val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc). Commented May 1, 2017 at 9:35

1 Answer


spark-shell gives you a working Spark environment where Scala is the (programming) language.

spark-sql gives you a Spark SQL environment where SQL is the query language.

Note that spark-shell is for any APIs available in Spark while spark-sql is only for Spark SQL API (with Datasets and DataFrames).
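
For example, the core RDD API is available in spark-shell but not in spark-sql (a trivial sketch using the sc that spark-shell creates for you):

scala> sc.parallelize(1 to 10).map(_ * 2).sum()  // => 110.0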

They're simply different interfaces for users with different skills (spark-shell for Spark/Scala developers while spark-sql for SQL developers).

spark-sql "hides" the Spark infrastructure behind the SQL interface, which places it higher up the abstraction ladder (so less engineering skill is required), but it eventually uses all the optimizations available in Spark SQL (and Spark in general).

Performance-wise, spark-sql and spark-shell are alike.
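
One quick way to see this is to compare query plans; both shells feed the same Catalyst optimizer (a sketch, again using the src table from the question):

scala> sqlContext.sql("SELECT key, value FROM src").explain()

spark-sql> EXPLAIN SELECT key, value FROM src;

The two should print the same physical plan.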


4 Comments

Thanks... one more thing I noticed in spark-sql: if I try to join two 11-million-record tables, it quickly throws an OutOfMemoryError. Is there any way to control the memory options in spark-sql?
Sure. It's a Spark application after all, so the common properties apply here --> spark.apache.org/docs/latest/configuration.html
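
For instance, memory and shuffle settings can be passed on the command line, just as for any other Spark application (the values below are illustrative only, not recommendations):

$ spark-sql --driver-memory 4g \
            --executor-memory 4g \
            --conf spark.sql.shuffle.partitions=400
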
@JacekLaskowski can we connect spark-sql to pyspark/spark-shell and import the tables as DataFrames?
@etl_devs Not sure I understand. Could you ask a separate question on SO if you can't find any similar ones?
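
(For what it's worth, if both shells point at the same Hive metastore, any table visible to spark-sql can also be loaded as a DataFrame in spark-shell; a minimal sketch, reusing the src table:)

scala> val df = sqlContext.table("src")
scala> df.printSchema()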
