
I am loading an XML file using com.databricks.spark.xml and I want to read a tag attribute using the SQL context.

XML:

<Receipt>
  <Sale>
    <DepartmentID>PR</DepartmentID>
    <Tax TaxExempt="false" TaxRate="10.25"/>
  </Sale>
</Receipt>

I loaded the file with:

val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag","Receipt").load("/home/user/sale.xml");
df.registerTempTable("SPtable");

Printing the schema gives:

root
 |-- Sale: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- DepartmentID: long (nullable = true)
 |    |    |-- Tax: string (nullable = true)

Now I want to extract the tag attribute TaxExempt from Tax. I tried the following code, but it gives me an error.

val tax = sqlContext.sql("select Sale.Tax.TaxExempt from SPtable");

Error:

org.apache.spark.sql.AnalysisException: cannot resolve 'Sale.Tax[TaxExempt]' due to data type mismatch: argument 2 requires integral type, however, 'TaxExempt' is of string type.; line 1 pos 7

Any help is highly appreciated.

1 Answer


First, print the schema of the DataFrame. In my case, with spark-xml version 0.3.3, it prints as follows:

|-- Sale: struct (nullable = true)
|    |-- DepartmentID: string (nullable = true)
|    |-- Tax: struct (nullable = true)
|    |    |-- #VALUE: string (nullable = true)
|    |    |-- @TaxExempt: boolean (nullable = true)
|    |    |-- @TaxRate: double (nullable = true)

Then, after registering the temp table, use the query below to select XML attributes:

sqlContext.sql("select Sale.Tax['@TaxRate'] as TaxRate from temptable").show();

Below is the result:

+-------+
|TaxRate|
+-------+
|  10.25|
+-------+
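
The same attribute can also be read without SQL. This is only a minimal sketch using the DataFrame API, not something from the original answer. It assumes the schema printed above (attribute fields prefixed with @) and a DataFrame loaded as in the question; the object and method names are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object TaxRateColumnExample {
  // Assumes `df` has the schema shown above: Sale is a struct, Sale.Tax is a
  // struct, and attribute fields carry the "@" prefix.
  def taxRate(df: DataFrame): DataFrame =
    df.select(
      col("Sale")
        .getField("Tax")       // nested element
        .getField("@TaxRate")  // attribute, addressed by its prefixed field name
        .as("TaxRate")
    )
}
```

Calling TaxRateColumnExample.taxRate(df).show() should print the same single-column result as the SQL query, provided the schema matches.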

Starting from 0.4.1, I think attribute names are prefixed with an underscore (_) by default. In that case, just use _ instead of @ when querying attributes.
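
Because the default prefix differs between versions, one way to avoid guessing is to set it explicitly when reading. The following is a rough sketch, assuming the attributePrefix option of spark-xml and the same Receipt file as in the question; the object, the helper taxRateQuery, and the path parameter are hypothetical names introduced here.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object AttributePrefixExample {
  // Prefix chosen explicitly so the query does not depend on the version default.
  val attributePrefix = "_"

  // Builds the bracket-style query from the answer for whichever prefix is in use.
  def taxRateQuery(table: String, prefix: String): String =
    s"select Sale.Tax['${prefix}TaxRate'] as TaxRate from $table"

  def readTaxRate(sqlContext: SQLContext, path: String): DataFrame = {
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "Receipt")
      .option("attributePrefix", attributePrefix) // pin the prefix at read time
      .load(path)
    df.registerTempTable("temptable")
    sqlContext.sql(taxRateQuery("temptable", attributePrefix))
  }
}
```

With the prefix pinned, the same query text should work regardless of which default the installed version uses.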


3 Comments

Thank you. I figured out the version problem and was able to print the schema as you showed here. Your select Sale.Tax['@TaxRate'] helped me solve my problem. Thanks a lot :)
How do I fetch the same if it comes under the 'root'?
When reading the XML, set the attribute prefix to a fixed value using option("attributePrefix", "_"). Then you can select a root attribute directly, like any other element, for example: select _TaxRate from temptable
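
To illustrate the root-attribute case, here is a sketch under assumptions: the row element carries a hypothetical attribute, for example <Receipt StoreID="12">, which does not appear in the question's XML. With rowTag set to Receipt, that attribute should show up as a top-level column named with the chosen prefix.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object RootAttributeExample {
  // Hypothetical input: <Receipt StoreID="12"> ... </Receipt>
  // StoreID is an illustrative attribute, not part of the original file.
  def readRootAttribute(sqlContext: SQLContext, path: String): DataFrame = {
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "Receipt")
      .option("attributePrefix", "_") // fixed prefix, as suggested in the comment
      .load(path)
    df.registerTempTable("temptable")
    // An attribute on the row tag is a top-level column, so no nesting is needed.
    sqlContext.sql("select _StoreID from temptable")
  }
}
```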
