
I am a beginner in Spark and I am trying to create a DataFrame from the contents of a JSON file using PySpark, following this guide: http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#overview

However, whenever I execute this command (using either a relative or an absolute path)

df = sqlContext.read.json("examples/src/main/resources/people.json")

it always gives me the error

java.io.IOException: No input paths specified in job


What is the cause of this issue, or is there a Spark configuration that I have missed? I am using Spark 1.6.1 and Python 2.7.6.

1 Comment

  • What are you doing here? I mean, are you using local mode or YARN mode? Commented Jul 1, 2016 at 17:00

4 Answers


I ran into this problem too; adding "file://" or "hdfs://" works for me. Thanks to Jessika's answer!

In conclusion, if your JSON file is on your local file system, use

df = sqlContext.read.json("file:///user/ABC/examples/src/main/resources/people.json")

Otherwise, if your JSON file is in HDFS, use

df = sqlContext.read.json("hdfs://ip:port/user/ABC/examples/src/main/resources/people.json")
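For illustration, here is a small plain-Python sketch (no Spark needed) of how the two fully-qualified forms differ; the namenode host and port are placeholders, not values from the question:

```python
from pathlib import Path

# A local absolute path converted to the file:// URI form Spark accepts.
local_path = Path("/user/ABC/examples/src/main/resources/people.json")
local_uri = local_path.as_uri()

# For HDFS, the host and port come from your cluster configuration;
# "namenode" and 9000 here are placeholder assumptions.
hdfs_uri = "hdfs://namenode:9000" + str(local_path)

print(local_uri)   # file:///user/ABC/examples/src/main/resources/people.json
print(hdfs_uri)    # hdfs://namenode:9000/user/ABC/examples/src/main/resources/people.json
```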

1 Comment

I was using file:// in front of the file location and it prompts: File file:/home/ABC/data/airports-extended.dat does not exist. After I removed file:// from the location, it seems to be able to find the file; however, when I do df.show(10), I am still getting the error: No input paths specified in job

Try adding file:// at the beginning of your absolute path: df = sqlContext.read.json("file:///user/ABC/examples/src/main/resources/people.json")



If you are running your code in local mode, provide the complete path to your file.
Suppose your file location is "/user/ABC/examples/src/main/resources/people.json". Then your code should look like this:

df = sqlContext.read.json("/user/ABC/examples/src/main/resources/people.json")

If you are running your code in YARN mode, check that your file exists in HDFS and provide its complete location:

df = sqlContext.read.json("/user/ABC/examples/src/main/resources/people.json")

2 Comments

I am running in yarn-client mode. I copied people.json to HDFS by doing the following: hadoop fs -mkdir /user/hadoop/dir1 and then hadoop fs -put people.json /user/hadoop/dir1/people.json. But it still shows the same error. Did I miss anything?
Then your code should be df = sqlContext.read.json("/user/hadoop/dir1/people.json") and it should work.

You must specify the file system protocol:

  • hdfs: Hadoop Distributed File System (used by default)
  • file: local file system
  • s3a / s3n: AWS S3
  • swift: OpenStack Swift

Also, the path must exist where the Spark driver and worker(s) are executed.
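As a plain-Python illustration of how a scheme is read off a path string (the hdfs host and port below are placeholders), a path with no scheme falls back to Hadoop's configured default file system:

```python
from urllib.parse import urlparse

paths = [
    "examples/src/main/resources/people.json",                   # no scheme
    "file:///user/ABC/examples/src/main/resources/people.json",  # local file system
    "hdfs://namenode:9000/user/ABC/people.json",                 # HDFS (placeholder host/port)
]

for p in paths:
    scheme = urlparse(p).scheme or "(none: Hadoop default applies)"
    print(p, "->", scheme)
```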

