
I am new to Spark, Hadoop, and Scala. I need to read a file from a local directory in Scala/Spark, and I am running into issues. I see that others have hit the same problem, but I haven't found a solution.

I am using Spark 1.6.2

My code reads like this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TestMyApp {

  def main(arg: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyAppName").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Read a JSON file from the local filesystem
    val resultDf = sqlContext.read.json("/opt/app/poc/myfile.json")
  }
}

I am getting the following error: Exception in thread "main" java.io.IOException: No input paths specified in job
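Could it be that, because the job goes through spark-submit, the path is being resolved against HDFS instead of the local filesystem? One variant I am considering (assuming the file:// URI scheme is the right way to force the local filesystem) is:

// Hypothetical variant: qualify the path with the local-file URI scheme
// so Hadoop's FileInputFormat does not look for it on HDFS
val resultDf = sqlContext.read.json("file:///opt/app/poc/myfile.json")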

Note: my application is installed in /opt/app/spark and I launch it with /usr/bin/spark-submit --class com.mycom.TestMyApp /opt/app/spark/App.jar. I cannot package the JSON file inside the project jar; the requirement is to read it from a local directory.
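As a fallback, would it be reasonable to read the file on the driver with plain Scala I/O and hand the lines to Spark? A rough sketch of what I have in mind (untested, and it assumes the file is line-delimited JSON, which Spark 1.6's json reader expects anyway):

import scala.io.Source

// Hypothetical fallback: read the JSON on the driver with plain Scala I/O,
// then let Spark parse it from an in-memory RDD of strings
val lines = Source.fromFile("/opt/app/poc/myfile.json").getLines().toList
val resultDf = sqlContext.read.json(sc.parallelize(lines))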

I am not able to figure out where I am going wrong. Please help.

Here is part of the stacktrace:

Exception in thread "main" java.io.IOException: No input paths specified in job
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:202)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        ...