Can you please help me understand the difference between Spark SQL and Hive?
1 Answer
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage, which can be queried using SQL syntax.
Built on top of Apache Hadoop, Hive provides the following features:
- Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
- Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase.
- Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider.
- A mechanism to impose structure on a variety of data formats (see the sketch after this list).
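To make that last point concrete, here is a minimal sketch of a Hive table definition that imposes a schema on delimited files already sitting in HDFS. The table name, columns, and path are all hypothetical, and the DDL is issued through a Hive-enabled SparkSession only so every example in this answer stays in one language; in practice you would run the same HiveQL from the hive CLI or beeline.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: impose a schema on tab-delimited files in HDFS.
object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveTableSketch")
      .enableHiveSupport() // connect to the Hive metastore
      .getOrCreate()

    // The files themselves are untouched; Hive just records the structure.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  ip  STRING,
        |  ts  STRING,
        |  url STRING
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION 'hdfs:///data/web_logs'""".stripMargin)

    spark.stop()
  }
}
```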
Apache Spark, on the other hand, is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing.
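As a taste of that high-level API, here is a minimal word-count sketch in Scala (the input path is hypothetical); the engine turns these few transformations into a distributed execution graph behind the scenes.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of Spark's high-level Dataset API.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCountSketch").getOrCreate()
    import spark.implicits._ // encoders for String, tuples, etc.

    val counts = spark.read.textFile("hdfs:///data/sample.txt") // Dataset[String]
      .flatMap(_.split("\\s+"))   // split lines into words
      .groupByKey(identity)       // group identical words together
      .count()                    // Dataset[(String, Long)]

    counts.show()
    spark.stop()
  }
}
```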
Spark SQL is a Spark module for structured data processing, with in-memory processing at its core. Using Spark SQL, you can read data from any structured source, such as JSON, CSV, Parquet, Avro, sequence files, JDBC, Hive, etc.
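A minimal sketch of reading several such sources follows; the paths, JDBC URL, and table name are hypothetical. The point is that every source comes back as the same DataFrame abstraction.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: one read API across different structured sources.
object ReadSourcesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadSourcesSketch").getOrCreate()

    val fromJson    = spark.read.json("hdfs:///data/events.json")
    val fromCsv     = spark.read.option("header", "true").csv("hdfs:///data/users.csv")
    val fromParquet = spark.read.parquet("hdfs:///data/metrics.parquet")
    val fromJdbc    = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/shop")
      .option("dbtable", "orders")
      .load()

    // All four are DataFrames and support the same operations.
    fromJson.printSchema()
    spark.stop()
  }
}
```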
Spark SQL can also be used to read data from an existing Hive installation. Thus, Spark SQL is a generalized module that can process almost any structured data source.
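Here is a minimal sketch of that, reusing the hypothetical web_logs table from the Hive sketch above. Note that although the table lives in the Hive metastore, the query is planned and executed by Spark's engine, not by Hive.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: query an existing Hive table from Spark SQL.
object QueryHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("QueryHiveSketch")
      .enableHiveSupport() // reuse Hive's metastore as the catalog
      .getOrCreate()

    // Plain SQL against a Hive-managed table, executed by Spark.
    spark.sql("SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url")
      .show()

    spark.stop()
  }
}
```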