I am trying to write a simple Spark SQL program in Java. In the program, I get data from a Cassandra table, convert the RDD into a Dataset, and display the data. When I run the spark-submit command, I get the error: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging.

My program is:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
        .set("spark.cassandra.connection.host", "abc")
        .set("spark.cassandra.auth.username", "def")
        .set("spark.cassandra.auth.password", "ghi");
SparkContext sparkContext = new SparkContext(sparkConf);
// Read the Cassandra table test.log, mapping each row to an EventLog bean
JavaRDD<EventLog> logsRDD = javaFunctions(sparkContext).cassandraTable("test", "log",
        mapRowTo(EventLog.class));
SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
// Convert the RDD of beans into a Dataset<Row> and display it
Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, EventLog.class);
logsDF.show();
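
For reference, both mapRowTo and createDataFrame(rdd, Class) map fields via JavaBean reflection, so the mapped class needs a public no-arg constructor plus getters and setters. A minimal sketch of what EventLog is assumed to look like (the fields here are hypothetical, since the table schema is not shown):

public class EventLog implements java.io.Serializable {
    // Hypothetical columns standing in for the real test.log schema
    private String id;
    private String message;

    public EventLog() { }  // no-arg constructor required for row mapping

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }
}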

My POM dependencies are:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>   
</dependencies>

My spark-submit command is:

/home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar

How do I solve this error? Downgrading to 1.5.2 does not work, as 1.5.2 has neither org.apache.spark.sql.Dataset nor org.apache.spark.sql.SparkSession.

Comments

  • @T.Gawęda The solution there does not work for me, because 1.5.2 does not have org.apache.spark.sql.Dataset and org.apache.spark.sql.SparkSession. Commented Dec 6, 2016 at 12:44
  • Please check connector version 2.0 - see github.com/datastax/spark-cassandra-connector Commented Dec 6, 2016 at 13:16
  • @T.Gawęda Connector 2.0 is still in beta. I used it and I get this error: NullPointerException at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465) at org.apache.spark.sql.SparkSession.getSchema(SparkSession.scala:673) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:340) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:359) at com.jtv.spark.dataframes.App.main(App.java:25) Commented Dec 6, 2016 at 13:26
  • But Connector 1.6 does not support Spark 2.x. This error means you've got the wrong Guava version. Run mvn dependency:tree and find where you've got conflicts. Commented Dec 6, 2016 at 13:27
  • @T.Gawęda I get this error when I use Connector 2.0.0-M3, not 1.6. I had used Connector 1.6 with Spark 2.0 in other programs. The problem starts when I use the Spark SQL packages. Commented Dec 6, 2016 at 13:30

5 Answers


This may be a problem with your IDE. Because some of these packages are written in Scala rather than Java, the IDE is sometimes unable to work out what is going on. I am using IntelliJ and it keeps displaying this message to me, but when I run mvn test or mvn package, everything is fine. Please check whether this is really a package error or just the IDE being lost.

Comments

Spark's Logging class is available in Spark version 1.5.2 and lower, but not in higher versions. So your dependencies in pom.xml should look like this:

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.2</version>
  </dependency>   
</dependencies>

Please let me know if it works or not.
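
Note that downgrading also implies code changes: Dataset<Row> and SparkSession only exist from Spark 2.0 onward, so under 1.5.2 the program would need the older entry points. A minimal sketch, assuming the pre-2.0 SQLContext/DataFrame API and reusing sparkContext and logsRDD from the question:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Spark 1.5.x: SQLContext is the SQL entry point and createDataFrame returns a DataFrame
SQLContext sqlContext = new SQLContext(sparkContext);
DataFrame logsDF = sqlContext.createDataFrame(logsRDD, EventLog.class);
logsDF.show();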

7 Comments

Tried it. Didn't work. 1.5.2 does not have org.apache.spark.sql.Dataset and org.apache.spark.sql.SparkSession.
Then you can use the updated version for those classes and the older version for the rest. Try it and let me know.
@Khateeb Hi, did you try the solution? What error is it showing now?
Getting error: [24,57] cannot access org.apache.spark.internal.Logging
@Khateeb I think there is some problem in the Spark configuration. Please read stackoverflow.com/questions/34108613/…

The dependency below worked fine in my case:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>

Comments

Pretty late to the party here, but I added

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.1</version>
  <scope>provided</scope>
</dependency>

to solve this issue. It seems to work in my case.

Comments

Make sure you have the correct Spark version in the pom.xml.

Previously, I had a different version of Spark locally, and that is why I was getting the "cannot access Spark Logging class" error in the IntelliJ IDE.

In my case, I changed it from 2.4.2 to 2.4.3, and that solved it.

You can get the Spark version and Scala version info from the spark-shell command.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
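
Alternatively, here is a minimal sketch (the class name is hypothetical) that prints the Spark version the application actually runs against, which can then be compared with the versions declared in the pom.xml:

import org.apache.spark.sql.SparkSession;

public class VersionCheck {
    public static void main(String[] args) {
        // SparkSession.version() reports the runtime Spark version (available since 2.0)
        SparkSession spark = SparkSession.builder()
                .appName("VersionCheck")
                .master("local[*]")
                .getOrCreate();
        System.out.println("Spark version: " + spark.version());
        spark.stop();
    }
}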

Comments
