I need help processing XML with XPath in Spark SQL. We have XML stored in a database column, and we need to parse it and store the values in a relational table.

Below is a sample query that reads the XML, but it does not produce the desired result. I need all the nodes under <b>, so it should return 3 rows, and I am not sure how to get that. I would appreciate any help.

SELECT xpath_string('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','/a/b');

1 Answer

This is the correct behaviour according to the documentation: http://spark.apache.org/docs/latest/api/sql/index.html#xpath_string

xpath_string(xml, xpath) returns the text contents of the first node that matches the XPath expression, so with three matching <b> nodes only the first one comes back.

To get all matching values as an array, use the xpath function instead. Make sure the expression ends with text(), or you'll get a beautiful stack trace.

%sql
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','/a/b/text()');

Returns

["b1","b2","b3"]
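
If you need one row per value rather than a single array, one option is to wrap the xpath call in explode. This is a minimal sketch, not tested against your data: the first query uses the literal from the question, and the second assumes a hypothetical table source_table with columns id and xml_col, which you would replace with your own names.

```sql
-- One row per <b> text node, using the sample XML from the question.
SELECT explode(
  xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', '/a/b/text()')
) AS b_value;

-- Same idea against a stored column.
-- source_table, id and xml_col are placeholder names (assumptions).
SELECT s.id, t.b_value
FROM source_table s
LATERAL VIEW explode(xpath(s.xml_col, '/a/b/text()')) t AS b_value;
```

The LATERAL VIEW form keeps the other columns of each source row next to the exploded values, which is usually what you want when loading a relational table. Note that rows whose XML has no matching <b> nodes are dropped; LATERAL VIEW OUTER keeps them with a NULL b_value.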