I am new to Spark, Hadoop, and Scala. I need to read a file from a local directory in Scala/Spark, and I am running into issues. I see that others have hit the same problem, but I have not found a solution.
I am using Spark 1.6.2
My code reads like this:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TestMyApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyAppName").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val resultDf = sqlContext.read.json("/opt/app/poc/myfile.json")
  }
}
I am getting the following error: Exception in thread "main" java.io.IOException: No input paths specified in job
Note: my application is installed in /opt/app/spark and I launch it with /usr/bin/spark-submit --class com.mycom.TestMyApp /opt/app/spark/App.jar. I cannot bundle the JSON file inside the project jar; the requirement is to read it from a local directory.
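To rule out a missing or unreadable file, here is a quick check that could go in the same main method before the Spark read (a minimal sketch; the path is the same one used above):

import java.io.File

val f = new File("/opt/app/poc/myfile.json")
// Should print exists=true readable=true if the driver process can see the file
println(s"exists=${f.exists} readable=${f.canRead}")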
I am not able to figure out where I am going wrong. Please help.
Here is part of the stacktrace:
Exception in thread "main" java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:202)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    ...
Update: one suggestion I received was to add file:// at the start of the URL.
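If that is the fix, the read would look like this (a minimal sketch, assuming the sqlContext from the code above; the explicit file:// scheme tells Hadoop to use the local filesystem rather than the configured default filesystem):

// file:// scheme followed by the absolute path /opt/app/poc/myfile.json
val resultDf = sqlContext.read.json("file:///opt/app/poc/myfile.json")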