In a standalone application (Java 8, Windows 10, with spark-xxx_2.11:2.0.0 jars as dependencies), the following code throws an error:

/* this: */
Dataset<Row> logData = spark_session.createDataFrame(Arrays.asList(
    new LabeledPoint(1.0, Vectors.dense(4.9,3,1.4,0.2)),
    new LabeledPoint(1.0, Vectors.dense(4.7,3.2,1.3,0.2))
  ), LabeledPoint.class);

/* or this: */
/* logFile: "C:\files\project\file.csv", "C:\\files\\project\\file.csv",
            "C:/files/project/file.csv", "file:/C:/files/project/file.csv",
            "file:///C:/files/project/file.csv", "/file.csv" */
Dataset<Row> logData = spark_session.read().csv(logFile);

Exception:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/files/project/spark-warehouse
               at org.apache.hadoop.fs.Path.initialize(Path.java:206)
               at org.apache.hadoop.fs.Path.<init>(Path.java:172)
               at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
               at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
               at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
               at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
               at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
               at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
               at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
               at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
               at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
               at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
               at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
               at <call in my line of code>

How can I load a CSV file into a Dataset<Row> from Java code?

1 Answer

There is an issue with how the file system path for the warehouse directory is handled; see the JIRA ticket https://issues.apache.org/jira/browse/SPARK-15899. As a workaround, you can set "spark.sql.warehouse.dir" when building the SparkSession, like below.

SparkSession spark = SparkSession
  .builder()
  .appName("JavaALSExample")
  .config("spark.sql.warehouse.dir", "file:///C:/temp")
  .getOrCreate();
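
To see where the message in the stack trace comes from, here is a minimal sketch that uses only java.net.URI, with no Spark or Hadoop on the classpath. It assumes that Hadoop's Path ends up building a URI from a separate scheme and path, which is what the Path.initialize frame in the trace suggests; the class name WarehouseUriDemo and the helper build are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class WarehouseUriDemo {

    // Builds a URI from a separate scheme and path (hypothetical helper).
    // Returns the URI text on success, or "ERROR: <message>" if the
    // components are rejected.
    public static String build(String scheme, String path) {
        try {
            return new URI(scheme, null, path, null, null).toString();
        } catch (URISyntaxException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Drive-letter path without a leading slash: with a scheme present,
        // java.net.URI treats it as a relative path and rejects it.
        System.out.println(build("file", "C:/files/project/spark-warehouse"));

        // Same path with a leading slash: accepted as an absolute path.
        System.out.println(build("file", "/C:/files/project/spark-warehouse"));
    }
}
```

The first call fails with the same "Relative path in absolute URI" reason as in the question, while the second succeeds. Giving spark.sql.warehouse.dir an explicit value sidesteps the default, which on Windows is derived from the working directory.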
1 Comment

Hard to believe, but the issue's severity is 'Minor'. This workaround solves my error, thanks!