
I need to read a file stored in my project's resources; the path is src/main/resources/dataset/dataset.dat. I'm using the following Scala code to read it as a text file and parse it into a Spark RDD of dataset objects:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// init Spark context
val conf: SparkConf = new SparkConf().setAppName("mydataset").setMaster("local")
val sc: SparkContext = new SparkContext(conf)

// read the .dat file from the classpath resources
val resource = this.getClass.getClassLoader.getResource("dataset/dataset.dat")
val dsRdd: RDD[DatasetObject] = sc.textFile(resource.toString, 1).map(line => DatasetData.parse(line))

but the following error occurred:

class java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/grader/grader.jar!/dataset/dataset.dat
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/grader/grader.jar!/dataset/dataset.dat

I tried to read the file in another way but the error keeps occurring:

val dsRdd: RDD[DatasetObject] = sc.textFile("src/main/resources/dataset/dataset.dat").map(line => DatasetData.parse(line))

Important: the unit tests run successfully locally; the problem occurs only in the remote test environment.

  • Can you describe your remote test environment? Cloud? Remember that the workers try to load the file; is it available to them? Commented Dec 14, 2021 at 12:11
  • @jgp Sorry, but I don't have details about the remote environment because it is the Coursera online lab used for the assignments. Commented Dec 14, 2021 at 15:53
  • I think your issue is with the path nevertheless… is it an old course? RDDs are so 2018 :) Commented Dec 14, 2021 at 23:39
  • src/main does not exist in your JAR or after the code compiles. There is a class called SparkFiles, I believe, which you should be using here. Commented Dec 15, 2021 at 0:15
  • @OneCricketeer Thanks for your time, I found a solution and posted it below :) Commented Dec 15, 2021 at 8:36

1 Answer


The problem was the combination of getResource and textFile; I had to use getResourceAsStream and sc.parallelize instead, as follows:

import scala.io.Source

def lines: List[String] = {
  Option(getClass.getResourceAsStream("/dataset/dataset.dat")) match {
    case None => sys.error("Please download the dataset as explained in the assignment instructions")
    case Some(resource) => Source.fromInputStream(resource).getLines().toList
  }
}

and then parse the lines into a Spark RDD of dataset objects:

val dsRdd: RDD[DatasetObject] = sc.parallelize(lines).map(line => DatasetData.parse(line))
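
Note that sc.parallelize distributes a collection that already lives in driver memory, so the whole file is read on the driver first; this is fine for a small dataset bundled as a resource.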

1 Comment

Depending on the size of the file, it may be preferable to ship it with spark-submit ... --files rather than bundling it in the JAR
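
A minimal sketch of that --files approach, assuming the dataset file is shipped alongside the JAR at submit time (MyApp and myapp.jar are hypothetical names; DatasetObject and DatasetData.parse come from the question):

// Illustrative submit command; class, JAR, and file paths depend on your project:
//   spark-submit --class MyApp --files dataset/dataset.dat myapp.jar

import scala.io.Source
import org.apache.spark.SparkFiles
import org.apache.spark.rdd.RDD

// SparkFiles.get resolves the local copy of a file distributed with --files
// (or registered via sc.addFile); the driver reads the lines and parallelizes them.
val datasetLines: List[String] =
  Source.fromFile(SparkFiles.get("dataset.dat")).getLines().toList

val dsRdd: RDD[DatasetObject] = sc.parallelize(datasetLines).map(line => DatasetData.parse(line))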
