Below are the configurations:
- Hadoop 2.x (1 master, 2 slaves)
  - yarn.nodemanager.resource.memory-mb = 7096
  - yarn.scheduler.maximum-allocation-mb = 2560
- Spark 1.5.1
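For completeness, a sketch of how those two settings would appear in yarn-site.xml, assuming they refer to the standard YARN properties of those names (values are in MB):

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>7096</value>   <!-- memory a NodeManager can hand out to containers -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>2560</value>   <!-- largest single container the scheduler will grant -->
    </property>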
spark/conf settings on all three nodes:
spark.driver.memory 4g
spark.executor.memory 2g
spark.executor.instances 2

spark-sql> CREATE TABLE demo USING org.apache.spark.sql.json OPTIONS (path "...");
This path holds 32 GB of compressed data. Creating the demo table takes 25 minutes. Is there any way to optimize this and bring it down to a few minutes? Am I missing something here?
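One thing I suspect is schema inference: since the DDL above supplies no schema, the Spark 1.5 JSON data source scans the entire dataset once just to infer one before the table exists. If that is the bottleneck, declaring the schema in the DDL should skip that pass; a minimal sketch, with hypothetical column names standing in for my real fields:

    spark-sql> CREATE TABLE demo (id BIGINT, name STRING, created STRING)
             > USING org.apache.spark.sql.json
             > OPTIONS (path "...");

If the full schema is not known up front, the JSON source also accepts a samplingRatio option (e.g. OPTIONS (path "...", samplingRatio "0.1")), which, as I understand it, infers the schema from a fraction of the records instead of all of them.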