
I am a beginner in Spark and I am trying to create a DataFrame from the contents of a JSON file using PySpark, following this guide: http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#overview

However, whenever I execute this command (using either a relative or an absolute path)

df = sqlContext.read.json("examples/src/main/resources/people.json")

it always gives me the error

java.io.IOException: No input paths specified in job


What is the cause of this issue, or is there a Spark configuration that I have missed? I am using Spark 1.6.1 and Python 2.7.6.

1 Comment

  • What are you doing here? I mean, are you using local mode or YARN mode? Commented Jul 1, 2016 at 17:00

4 Answers


I ran into this problem too; adding "file://" or "hdfs://" works for me. Thanks to Jessika's answer!

In conclusion, if your JSON file is on your local file system, use

df = sqlContext.read.json("file:///user/ABC/examples/src/main/resources/people.json")

Otherwise, if your JSON file is in HDFS, use

df = sqlContext.read.json("hdfs://ip:port/user/ABC/examples/src/main/resources/people.json")
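For illustration, here is a small plain-Python sketch (no Spark needed) of how the two fully-qualified forms differ; the namenode host and port are placeholders, not values from the question:

```python
from pathlib import Path

# A local absolute path converted to the file:// URI form Spark accepts.
local_path = Path("/user/ABC/examples/src/main/resources/people.json")
local_uri = local_path.as_uri()

# For HDFS, the host and port come from your cluster configuration;
# "namenode" and 9000 here are placeholder assumptions.
hdfs_uri = "hdfs://namenode:9000" + str(local_path)

print(local_uri)   # file:///user/ABC/examples/src/main/resources/people.json
print(hdfs_uri)    # hdfs://namenode:9000/user/ABC/examples/src/main/resources/people.json
```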

1 Comment

I was using file:// in front of the file location and it prompts: File file:/home/ABC/data/airports-extended.dat does not exist. After I removed file:// from the location, it seems to be able to find the file; however, when I do df.show(10), I am still getting the error: No input paths specified in job

Try adding file:// at the beginning of your absolute path: df = sqlContext.read.json("file:///user/ABC/examples/src/main/resources/people.json")



If you are running your code in local mode, provide the complete path to your file.
Suppose your file location is "/user/ABC/examples/src/main/resources/people.json". Then your code should look like this:

df = sqlContext.read.json("/user/ABC/examples/src/main/resources/people.json")

If you are running your code in YARN mode, check that your file exists in HDFS and provide its complete location:

df = sqlContext.read.json("/user/ABC/examples/src/main/resources/people.json")

2 Comments

I am running in yarn-client mode. I copied people.json to HDFS by doing the following: hadoop fs -mkdir /user/hadoop/dir1 and then hadoop fs -put people.json /user/hadoop/dir1/people.json. But it still shows the same error. Did I miss anything?
Then your code should be df = sqlContext.read.json("/user/hadoop/dir1/people.json") and it should work.

You must specify the file system protocol:

  • hdfs: Hadoop Distributed File System (used by default)
  • file: local file system
  • s3a / s3n: AWS S3
  • swift: OpenStack Swift

Also, the path must exist where the Spark driver and worker(s) are executed.
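As a plain-Python illustration of how a scheme is read off a path string (the hdfs host and port below are placeholders), a path with no scheme falls back to Hadoop's configured default file system:

```python
from urllib.parse import urlparse

paths = [
    "examples/src/main/resources/people.json",                   # no scheme
    "file:///user/ABC/examples/src/main/resources/people.json",  # local file system
    "hdfs://namenode:9000/user/ABC/people.json",                 # HDFS (placeholder host/port)
]

for p in paths:
    scheme = urlparse(p).scheme or "(none: Hadoop default applies)"
    print(p, "->", scheme)
```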

