Unable to read a file from HDFS using spark shell in ubuntu

Question

I have installed Spark and Hadoop in standalone mode on an Ubuntu VirtualBox VM for learning. I can run normal Hadoop MapReduce jobs on HDFS without Spark. But when I run the code below in spark-shell,

scala> val file = sc.textFile("hdfs://localhost:9000/in/file")
scala> file.count()

I get an "Input path does not exist" error. The core-site.xml sets fs.defaultFS to hdfs://localhost:9000. If I use localhost without the port number, I get a "Connection refused" error, because the client then tries the default NameNode port 8020 instead of 9000. The hostname and localhost are mapped to the loopback addresses 127.0.0.1 and 127.0.1.1 in /etc/hosts. How can I resolve this? Thanks in advance!
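A minimal sketch of why the port matters, using only the JDK's java.net.URI (not Hadoop itself): the host and port in the URI are what the client uses to reach the NameNode. When the port is omitted, java.net.URI reports it as -1, and the HDFS client then falls back to its default NameNode RPC port, 8020, rather than the 9000 configured in core-site.xml here. The UriParts object is a hypothetical name for illustration.

```scala
import java.net.URI

// Hypothetical helper object, for illustration only.
object UriParts {
  // Returns (host, port, path) as parsed by java.net.URI.
  // A missing port is reported as -1.
  def parts(s: String): (String, Int, String) = {
    val u = new URI(s)
    (u.getHost, u.getPort, u.getPath)
  }

  def main(args: Array[String]): Unit = {
    println(parts("hdfs://localhost:9000/in/file")) // (localhost,9000,/in/file)
    println(parts("hdfs://localhost/in/file"))      // (localhost,-1,/in/file)
  }
}
```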

Try this in a terminal: hadoop fs -ls hdfs://localhost:9000/in/ . Is the file there? – WoodChopper, commented Jul 29, 2016 at 6:34

Swetha · Accepted Answer · 2016-07-29 14:16:48Z

1

I am able to read from and write to HDFS using paths of the form:

hdfs://localhost:9000/user/<user-name>/...

Thank you for your help.
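One likely reason this works: HDFS resolves relative paths against the user's home directory, /user/<user-name>, so a file uploaded with a relative destination (for example hadoop fs -put file in/) lands under /user/<user-name>/in/ rather than /in/. The sketch below is a simplified, hypothetical model of that resolution (the qualify helper is not a Hadoop API), assuming fs.defaultFS is hdfs://localhost:9000 and the user is hadoop.

```scala
// Hypothetical, simplified model of HDFS path qualification; not Hadoop code.
object HomeDir {
  // Absolute paths are appended to the default filesystem URI as-is;
  // relative paths are resolved against the home directory /user/<user>.
  def qualify(defaultFs: String, user: String, path: String): String =
    if (path.startsWith("/")) defaultFs + path
    else s"$defaultFs/user/$user/$path"

  def main(args: Array[String]): Unit = {
    val fs = "hdfs://localhost:9000" // assumed fs.defaultFS from core-site.xml
    println(qualify(fs, "hadoop", "/in/file")) // hdfs://localhost:9000/in/file
    println(qualify(fs, "hadoop", "in/file"))  // hdfs://localhost:9000/user/hadoop/in/file
  }
}
```

So if hdfs dfs -ls in/file succeeds but hdfs dfs -ls /in/file does not, the file lives under the home directory.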

answered Jul 29, 2016 at 14:16

Swetha


wmoco_6725 · Accepted Answer · 2016-07-29 03:58:35Z

0

Your configuration is probably fine; the file is more likely missing or in an unexpected location.

1) Try:

sc.textFile("hdfs:///in/file")
sc.textFile("hdfs:///user/<USERNAME>/in/file")

replacing <USERNAME> with hadoop or your own username.

2) On the command line (outside spark-shell), check that you can access the directory/file:

hdfs dfs -ls /in/file
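The number of slashes after hdfs: matters. In hdfs:///in/file the authority is empty, so the Hadoop client fills in the filesystem from fs.defaultFS; in hdfs://in/file, the segment in is parsed as a hostname and the path shrinks to /file. A small sketch using java.net.URI (the Slashes object is hypothetical) shows the difference:

```scala
import java.net.URI

// Hypothetical helper object, for illustration only.
object Slashes {
  // Returns (authority, path); None means the URI has no authority,
  // which the Hadoop client fills in from fs.defaultFS.
  def split(s: String): (Option[String], String) = {
    val u = new URI(s)
    (Option(u.getAuthority), u.getPath)
  }

  def main(args: Array[String]): Unit = {
    println(split("hdfs:///in/file")) // (None,/in/file)
    println(split("hdfs://in/file"))  // (Some(in),/file)
  }
}
```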

answered Jul 29, 2016 at 3:58

wmoco_6725

