
I am trying to write a JSON file from a Spark/Scala program and then read it into a DataFrame. This is my code:

    val analysisWriter = new BufferedWriter(new FileWriter("analysis.json"))
    for (i <- 0 to 10) {
      val obj = arr.get(i).asInstanceOf[JSONObject]
      currentAnalysis("" + obj.get("id"))
    }
    analysisWriter.close()
    val df = hiveContext.read.json("file:///data/home/test/analysis.json")
    df.show(10)
  }

  def currentAnalysis(id: String): Unit = {
    val arrCurrentAnalysis: JSONObject = acc.getCurrentAnalysis("" + id)

    if (arrCurrentAnalysis != null) {
      analysisWriter.append(arrCurrentAnalysis.toString())
      analysisWriter.newLine()
    }
  }

I get the following error when I try to run this code:

java.io.FileNotFoundException: File file:/data/home/test/analysis.json does not exist

I can see the file being created in the same directory where the jar is present (I am running the jar using spark-submit). Why is the code not able to find the file?

Initially, I was getting java.io.IOException: No input paths specified in job.

As pointed out here: Spark SQL "No input paths specified in jobs" when create DataFrame based on JSON file

and here: Spark java.io.IOException: No input paths specified in job,

I added file:// to the path I read the JSON file from, and now I get the FileNotFoundException.

I am running Spark 1.6 on a YARN cluster. Could it be that the file is not available to the executors because it was created after the program was launched?

3 Answers


From what I understand, your application depends on a local file for some of its business logic.

You can read the file by referring to it as file:///, but for this to work a copy of the file must be present on every worker, or every worker must have access to a common shared drive, such as an NFS mount.

To solve this, spark-submit provides the --files flag, which uploads files to the executors' working directories. Use it if you have small files that do not change.

Alternatively, as others have suggested, put the file in HDFS.
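A rough sketch of the --files approach (the jar name, main class, and file names below are placeholders, not taken from the question):

```shell
# Ship analysis.json with the job; YARN places a copy in each
# executor's working directory, so it can be opened by its bare name.
spark-submit \
  --master yarn \
  --files analysis.json \
  --class com.example.Analysis \
  analysis-job.jar

# Inside the job, the shipped copy can also be located with
# SparkFiles.get("analysis.json").
```

Note that --files is only practical for small, static files; it re-uploads them on every submission.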



So I was right about the file not being available to all the executors. I was able to solve it by copying the file to a location in HDFS, and I don't see the error anymore. I added the following lines to the code:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new URI("hdfs://nameservice1"), sc.hadoopConfiguration)
    fs.copyFromLocalFile(new Path("local_path"), new Path("hdfs_path"))

and then passed the hdfs_path to hiveContext.read.json(). It now creates the DataFrame without any issues.
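Wherever the file ends up, read.json in Spark 1.6 expects newline-delimited JSON: one complete object per line, with no wrapping array. A minimal standard-library sketch of producing and reading back that format (the record contents and temp-file name are made up for illustration):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.collection.JavaConverters._

// Spark 1.6's read.json expects one complete JSON object per line
// (newline-delimited JSON), not a single JSON array.
val records = Seq(
  """{"id":"1","score":0.5}""",
  """{"id":"2","score":0.9}"""
)

// Write the records, one per line, to a temporary file.
val path = Files.createTempFile("analysis", ".json")
Files.write(path, records.asJava, StandardCharsets.UTF_8)

// Each line read back is an independent JSON document.
val lines = Files.readAllLines(path, StandardCharsets.UTF_8).asScala
```

Appending with BufferedWriter.newLine() after each object, as the question's code does, produces exactly this layout.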



You can also get this error message when there are white spaces in the file path or file names (i.e. /Folder1/My Images/...):

java.io.FileNotFoundException: File file:/.../314_100.jpg does not exist

In my case I was reading image files with Spark. Replacing "My Images" with "My_Images" fixed it.
