I am using the JSON SerDe from https://github.com/rcongiu/Hive-JSON-Serde. When I add the SerDe JAR in the Hive console and run a query, I get the data back. I am trying to do the same thing from Java code, but it does not work.

hive> use oracle_json;
OK
Time taken: 0.858 seconds

hive> add jar json-serde-1.3.6-jar-with-dependencies.jar;

Added json-serde-1.3.6-jar-with-dependencies.jar to class path
Added resource: json-serde-1.3.6-jar-with-dependencies.jar

hive> select * from oracle_trading limit 1;
OK
[{"close_date":"2015-08-09 16:59:37.000000000","instrument_type":"Options","units":95000.0,"created_date":"2011-05-03 16:59:37.000000000","empid":10776,"instrument":"Instrument442","id":442,"open_date":null,"customer_id":870,"indexname":"FTSE","currency":null,"empsal":null}]

I am trying to write a program that fetches data from this Hive table, which is stored in JSON SerDe format. I get an exception while fetching the data. Specifically, I do not know how to deserialize the data coming from the HiveServer2 server, or how to use the JSON SerDe JAR from Java code. Can you please help me with this?

        package com.db.hive;
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.SQLException;
        import java.sql.Statement;
        import org.openx.data.jsonserde.JsonSerDe;
        /* I have added this JsonSerDe library to the POM file, but I do not know
           how to use it when calling the executeQuery() method. */
        public class HiveTableExample {

            private static String driverName = "org.apache.hive.jdbc.HiveDriver";
            final static String url = "jdbc:hive2://xxxx:10000/oracle_json";
            final static String user_name = "xxxx";
            final static String pwd = "xxxxx";
            private static JsonSerDe de = null;

            public static void main(String[] args) throws SQLException {
                try {
                    Class.forName(driverName);
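                } catch (ClassNotFoundException e) {
                    // Report why we are exiting instead of failing silently.
                    System.err.println("Hive JDBC driver not found: " + driverName);
                    System.exit(1);
                }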
                Connection con = DriverManager.getConnection(url, user_name, pwd);
                Statement stmt = con.createStatement();

                String sql = "select * from oracle_trading limit 10";
                System.out.println("Running: " + sql);
                ResultSet res = stmt.executeQuery(sql);              

                while (res.next()) {
                    System.out.println(String.valueOf(res.getString(1)) + "\t" + res.getString(2));
                }
            }
        }

I get the exception shown below. ...

Running: select * from oracle_trading limit 10
Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
    at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
    at com.db.hive.HiveTableExample.main(HiveTableExample.java:42)

My POM file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.db.hive</groupId>
    <artifactId>HiveQuery</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <build> 
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <mainClass>com.db.hive.HiveTableExample</mainClass>
                        </manifest>
                    </archive> 
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin> 

        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>org.openx.data</groupId> 
            <artifactId>json-serde</artifactId> 
            <version>1.3.6-SNAPSHOT-jar-with-dependencies</version> 
            <scope>system</scope>
            <systemPath>C:\Users\mahendra.pansare\Documents\NetBeansProjects\HiveQuery\src\main\resources\json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar</systemPath>
        </dependency> 
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.1.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.0</version>
        </dependency>



    </dependencies>


</project>
  • Are you using Maven? If yes, you have to include the dependency in your POM. Can you show your POM? If you don't use Maven, you have to include the .jar in your libraries folder. Commented Jan 4, 2016 at 8:00
  • Please have a look at the POM file attached above. I pointed it at the physically downloaded JAR, and I am even able to create an instance inside the HiveTableExample class. Commented Jan 4, 2016 at 8:20

1 Answer

To answer this, let me first explain how the serde works. A SerDe (serializer/deserializer) is a way of adding new functionality to Hive: it provides an extensible interface into which data formats, such as JSON, can be plugged.

As an extension of Hive, the code for the serde has to be available to all the nodes in the cluster. When using the hive shell, you do that either by putting the serde JAR into the EXTRA_LIBS directory, or by running ADD JAR serde.jar in your script. What the hive shell then does for you is take the serde JAR and send it to all the nodes every time you run a query.

Now, as for your problem. You're not using the shell but the JDBC API, which talks to the HiveServer2 process instead of the hive shell. The statement is compiled on the server, which is why the ClassNotFoundException is reported from there. You don't need to include the serde in your Maven project: the JDBC client does not distribute the JAR to the cluster the way the hive shell does, so having it on the client classpath has no effect. What you need to do is install the serde in the extra libs directory of the Hive server you're talking to.

So, it's a configuration issue rather than a problem with your code. Unrelated to this issue, it's good practice to close the connection in a try { ... } finally { ... } block.
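
The try/finally advice can be sketched with Java 7's try-with-resources, which closes the ResultSet, Statement, and Connection automatically even when the query throws. This is only a minimal sketch, not a definitive implementation: the class name HiveQuerySketch and the helpers formatRow and printQuery are hypothetical names introduced for illustration, and it assumes the hive-jdbc driver is on the client classpath and the serde is already installed on the server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Hive or the JSON SerDe.
class HiveQuerySketch {

    // Joins one row's column values with tabs; SQL NULLs are shown as "NULL".
    static String formatRow(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append('\t');
            }
            String value = values.get(i);
            sb.append(value == null ? "NULL" : value);
        }
        return sb.toString();
    }

    // Runs the query and prints every row. The three resources are closed
    // in reverse order of declaration, whether or not an exception occurs.
    static void printQuery(String url, String user, String password, String sql)
            throws SQLException, ClassNotFoundException {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery(sql)) {
            int columns = res.getMetaData().getColumnCount();
            while (res.next()) {
                List<String> row = new ArrayList<>();
                for (int c = 1; c <= columns; c++) {
                    row.add(res.getString(c));
                }
                System.out.println(formatRow(row));
            }
        }
    }
}
```

With the placeholders from the question, the body of main could then shrink to a single call such as HiveQuerySketch.printQuery("jdbc:hive2://xxxx:10000/oracle_json", "xxxx", "xxxxx", "select * from oracle_trading limit 10"). No JsonSerDe import is needed on the client side, since the rows arrive already deserialized by the server.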

3 Comments

Thanks a lot Roberto for your help. I will follow the practice :)
Roberto Congiu, I added this JAR to the /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib path, and some queries that depend on the json-serde then worked. But for queries that launch a MapReduce job, the json-serde JAR could not be detected, and an exception was thrown saying it was not found. Can you please tell me where in the MapReduce framework I need to add this JAR?
I added my JAR at /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce/lib/json-serde-1.3.6-jar-with-dependencies.jar and /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar on every node, and it worked for me.
