I am trying to read a JSON file using Spark SQL in Java. This is my code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
...
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(jsc);
DataFrame df = sqlContext.jsonFile("~/test.json");
df.printSchema();
df.registerTempTable("test");
...

To keep things simple, I created a minimal JSON file, "test.json":

{
   "name": "myname"
}

When I run the code, I get this error message:

efg
17/03/30 10:02:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.6.86.82:36824 with 1948.2 MB RAM, BlockManagerId(driver, 10.6.86.82, 36824)
17/03/30 10:02:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.6.86.82, 36824)
17/03/30 10:02:26 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.apache.spark.sql.sources.CaseInsensitiveMap.<init>(ddl.scala:344)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:572)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:553)
at sugi.kau.sparkonjava.SparkSQL.main(SparkSQL.java:32)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
17/03/30 10:02:26 INFO SparkContext: Invoking stop() from shutdown hook
...

thanks

1 Answer

The Spark docs describe jsonFile(String path) as follows: "Loads a JSON file (one object per line), returning the result as a DataFrame." (Note that jsonFile has been replaced by read().json().)

So you need one object per line, and your source file should look like this:

  {"name": "myname"}
  {"name": "myname2"}
  .....
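
To make the "one object per line" requirement concrete, here is a rough sanity check in plain Java. It is only a heuristic sketch, not a JSON parser, and the class and method names (JsonLinesCheck, looksLikeJsonLines) are made up for illustration. It just verifies that every non-blank line starts with `{` and ends with `}`:

```java
import java.util.Arrays;
import java.util.List;

final class JsonLinesCheck {

    // Heuristic only (assumption: each record is a JSON object, not an array or scalar).
    // Returns true when every non-blank line starts with '{' and ends with '}',
    // and at least one such line exists.
    static boolean looksLikeJsonLines(List<String> lines) {
        boolean sawObject = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            if (!(trimmed.startsWith("{") && trimmed.endsWith("}"))) {
                return false; // an object is spread over several lines
            }
            sawObject = true;
        }
        return sawObject;
    }

    public static void main(String[] args) {
        // The pretty-printed file from the question: one object over three lines.
        List<String> multiLine = Arrays.asList("{", "   \"name\": \"myname\"", "}");
        // The layout suggested in this answer: one object per line.
        List<String> onePerLine = Arrays.asList("{\"name\": \"myname\"}", "{\"name\": \"myname2\"}");

        System.out.println(looksLikeJsonLines(multiLine));  // false
        System.out.println(looksLikeJsonLines(onePerLine)); // true
    }
}
```

In practice you would feed it the lines of your own file, for example via Files.readAllLines.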

3 Comments

I changed my JSON file as you suggested, but the error message remains. I am using Spark 2.11, and there is no .read() in SQLContext, only .jsonFile.
Do you mean Spark 2.1.0? That version has the read method: spark.apache.org/docs/latest/…
Thanks for your response, I got the point. I was using the old way of getting a Spark connection, with SparkConf, JavaSparkContext, and SQLContext. With the new SparkSession, the read method is available. Reading the JSON file now works fine.
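
For reference, the SparkSession approach described in the last comment might look roughly like this in Java. This is a minimal sketch, assuming Spark 2.x with the spark-sql artifact on the classpath and a file with one JSON object per line. The class name ReadJsonExample, the app name, the local[*] master, and the SELECT query are illustrative assumptions, not taken from the original code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadJsonExample {
    public static void main(String[] args) {
        // SparkSession is the Spark 2.x entry point; it takes over the role of
        // the SparkConf + JavaSparkContext + SQLContext setup in the question.
        SparkSession spark = SparkSession.builder()
                .appName("ReadJsonExample") // hypothetical app name
                .master("local[*]")         // assumption: local run for testing
                .getOrCreate();

        // read().json() replaces the older jsonFile(); by default it expects
        // one JSON object per line.
        Dataset<Row> df = spark.read().json("test.json");
        df.printSchema();

        // Spark 2.x counterpart of registerTempTable("test").
        df.createOrReplaceTempView("test");
        spark.sql("SELECT name FROM test").show();

        spark.stop();
    }
}
```

Note that in the Java API for Spark 2.x the result type is Dataset<Row> rather than DataFrame.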
