I'm trying to save the contents of a text file in HDFS with Spark:

    import org.apache.spark.{SparkContext, SparkConf}

    object FormatTlfHdfs {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Clean data")
          .setMaster("local").setSparkHome("/usr/lib/spark")

        val sc = new SparkContext(conf)

        var vertices = sc.textFile("hdfs:///user/cloudera/dstlf.txt")
          .flatMap { line => line.split("\\s+") }
          .distinct()

I'm getting this error:

Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///user/cloudera/metadata-lookup-tlf

Running hdfs dfs -ls shows that the file is there:

[cloudera@quickstart grafoTelefonos]$ hdfs dfs -ls /user/cloudera
Found 6 items
drwx------   - cloudera cloudera          0 2016-02-04 18:37 /user/cloudera/.Trash
drwxr-xr-x   - cloudera cloudera          0 2016-05-02 13:38 /user/cloudera/.sparkStaging
-rw-r--r--   1 cloudera cloudera       1294 2016-05-02 13:34 /user/cloudera/dstlf.txt

1 Answer

The error seems obvious:

Incomplete HDFS URI, no host: hdfs:///user/cloudera/metadata-lookup-tlf

You didn't specify a host machine like

hdfs://quickstart:<hdfs_port>/user/cloudera...

You may not need the <hdfs_port> piece, but it doesn't hurt. I believe the correct port is 8020, so you'd have

hdfs://quickstart:8020/user/cloudera...
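
Applied to the code in the question, it would look roughly like this (a minimal sketch assuming the Cloudera QuickStart defaults, hostname quickstart and NameNode port 8020; the output path dstlf-clean is just a made-up name for illustration):

    import org.apache.spark.{SparkContext, SparkConf}

    object FormatTlfHdfs {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Clean data")
          .setMaster("local").setSparkHome("/usr/lib/spark")
        val sc = new SparkContext(conf)

        // Fully qualified URI: scheme, host, NameNode port, then the path
        val vertices = sc.textFile("hdfs://quickstart:8020/user/cloudera/dstlf.txt")
          .flatMap(line => line.split("\\s+"))
          .distinct()

        // Write the result back to HDFS, again with the full URI
        vertices.saveAsTextFile("hdfs://quickstart:8020/user/cloudera/dstlf-clean")

        sc.stop()
      }
    }

Alternatively, if fs.defaultFS in core-site.xml already points at hdfs://quickstart:8020, you can pass a plain path such as /user/cloudera/dstlf.txt and let Hadoop fill in the scheme and host for you.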