I am trying to write a JSON file from a Spark/Scala program and then read it back into a DataFrame. This is my code:
    val analysisWriter = new BufferedWriter(new FileWriter("analysis.json"))
    for (i <- 0 to 10) {
      val obj = arr.get(i).asInstanceOf[JSONObject]
      currentAnalysis("" + obj.get("id"))
    }
    analysisWriter.close()

    val df = hiveContext.read.json("file:///data/home/test/analysis.json")
    df.show(10)
  }

  // analysisWriter is visible to this method as well in the full program
  def currentAnalysis(id: String): Unit = {
    val arrCurrentAnalysis: JSONObject = acc.getCurrentAnalysis("" + id)
    if (arrCurrentAnalysis != null) {
      analysisWriter.append(arrCurrentAnalysis.toString())
      analysisWriter.newLine()
    }
  }
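For what it's worth, the file this produces has one JSON object per line, which is the JSON Lines layout that read.json expects (the field values below are made-up examples of what the lines look like):

    {"id": "1", ...}
    {"id": "2", ...}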
I get the following error when I try to run this code:
java.io.FileNotFoundException: File file:/data/home/test/analysis.json does not exist
I can see the file being created in the same directory where the jar is present (I am running the jar using spark-submit). Why is the code not able to find the file?
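For reference, this is how I would check where the relative path actually resolves; a minimal sketch, assuming nothing beyond java.io:

    import java.io.File

    // A relative path resolves against the JVM's working directory, which,
    // when running spark-submit in client mode, is typically the directory
    // the command was launched from.
    println(new File("analysis.json").getAbsolutePath)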
Initially, I was getting java.io.IOException: No input paths specified in job.
As pointed out here: Spark SQL "No input paths specified in jobs" when create DataFrame based on JSON file,
and here: Spark java.io.IOException: No input paths specified in job,
I added file:// to the path I read the JSON file from, and now I get the FileNotFoundException instead.
I am running Spark 1.6 on a YARN cluster. Could it be that the file is not available to the executors, since it was created after the program was launched?
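If that turns out to be the cause, one workaround I am considering is writing the JSON lines to HDFS instead of the driver's local disk, so every executor can read the same path. A rough sketch, where the output path /tmp/analysis.json is a placeholder and sc/hiveContext are the existing context handles:

    import java.io.{BufferedWriter, OutputStreamWriter}
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Write one JSON object per line to HDFS, where all executors can read it.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val out = new BufferedWriter(
      new OutputStreamWriter(fs.create(new Path("/tmp/analysis.json"))))
    // ... append arrCurrentAnalysis.toString() plus a newline here, as before ...
    out.close()

    val df = hiveContext.read.json("hdfs:///tmp/analysis.json")

Alternatively, if the data fits in driver memory, I could skip the file entirely and build the DataFrame from the strings directly with hiveContext.read.json(sc.parallelize(jsonLines)), where jsonLines is a Seq[String] holding one JSON object per element.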