
Spark SQL nested JSON error:

{
  "xxxDetails":{  
      "yyyData":{  
         "0":{  
            "additionalData":{  

            },
            "quantity":80000,
            "www":12.6,
            "ddd":5.0,
            "eee":72000,
            "rrr":false
         },
         "130":{  
            "additionalData":{  
               "quantity":1
            },
            "quantity":0,
            "www":1.0,
            "ddd":0.0,
            "eee":0,
            "rrr":false
         },
         "yyy":{  
            "additionalData":{  
               "quantity":1
            },
            "quantity":0,
            "www":1.0,
            "ddd":0.0,
            "eee":0,
            "rrr":false
         }       
      }
   },
   "mmmDto":{  
      "id":0,
      "name":"",
      "data":null
   }
 }

Reading with spark.sql("select cast(xxxDetails.yyyData.yyy.additionalData.quantity as Long) as quantity from table") works, but spark.sql("select cast(xxxDetails.yyyData.130.additionalData.quantity as Long) as quantity from table") throws an exception:

org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input 'cast (xxxDetails.yyyData.130.

When I"m usning datafame API for myDF.select("xxxDetails.yyyData.130.additionalData.quantity") its work . Anyone with decent explanation :)

1 Answer


It's because SQL column names are expected to start with a letter or certain other characters such as _, @ or #, but not a digit. Consider this simple example:

Seq((1, 2)).toDF("x", "666").createOrReplaceTempView("test")

Calling spark.sql("SELECT x FROM test").show() would output

+---+
|  x|
+---+
|  1|
+---+

but calling spark.sql("SELECT 666 FROM test").show() instead outputs

+---+
|666|
+---+
|666|
+---+

because 666 is interpreted as a literal, not a column name. To fix this, the column name needs to be quoted with backticks:

spark.sql("SELECT `666` FROM test").show()
+---+
|666|
+---+
|  2|
+---+
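
Applied to the nested structure from the question, only the numeric path segment needs the backticks (a sketch, assuming the temp view is named table as in the question):

spark.sql("select cast(xxxDetails.yyyData.`130`.additionalData.quantity as Long) as quantity from table").show()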

5 Comments

Hi @ollik1, I apologize, I'm updating my example/question with more details. The error still occurs even when I use: spark.sql("select cast (xxxDetails.'130'.yyy.quantity as Long) as quantity. Again, sorry for the first incomplete example.
This works for me. @ArnonRodman remember it is not a single quote but a backtick, i.e. spark.sql("select cast (xxxDetails.yyyData.`130`.additionalData.quantity as Long) as quantity from table")
Edited the answer to emphasise using the correct quotation characters
Thanks @ollik1 and Richard Nemeth, it worked. Where can I find this in the documentation? And why is the spark.sql API different from the DataFrame API?
Not sure if there is any better documentation than this: issues.apache.org/jira/browse/SPARK-3483, which leads to github.com/apache/spark/pull/2804/files. It does not explain, though, why the backtick was chosen instead of the double quote, which would be standard SQL. The DataFrame API is different because it is explicit from the method signatures that the passed string refers to a column. A SQL string, however, needs to be parsed and analyzed according to certain rules. Note that identifiers starting with a number would also fail in Java, Scala and Python.
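
To illustrate the point about method signatures, a short sketch against the toy table from the answer: select(String) can only ever mean a column, while a SQL string has to survive the lexer first, where a leading digit makes a token a number.

// The DataFrame API resolves the raw string as a column name, no backticks needed
Seq((1, 2)).toDF("x", "666").select("666").show()  // prints 2

// In a SQL string, 666 is tokenized as an integer literal before name
// resolution ever happens, hence the backticks
spark.sql("SELECT `666` FROM test").show()         // prints 2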
