
I have the nested schema below:

root
|-- fields: struct (nullable = true)
|    |-- configdata: struct (containsNull = true)
|    |    |-- field: string (nullable = true)
|    |    |-- type: string (nullable = true)
|    |    |-- value: string (nullable = true)
|    |-- configdata: struct (containsNull = true)
|    |    |-- field1: string (nullable = true)
|    |    |-- type1: string (nullable = true)
|    |    |-- value1: string (nullable = true)
|-- id: string (nullable = true)
|-- score: double (nullable = true)
|-- siteId: string (nullable = true)

I need to read both configdata structs from this JSON. But when I try:

newDf = dataframe.select(sf.array(sf.expr("configdata")))

it fails with the exception:

Ambiguous reference to fields StructField(configdata)

As you can see, configdata is of struct type, and I need to read both configdata structs from this JSON. The code is written in PySpark using the Spark DataFrame API. Can someone please help?

1 Answer

There are two structs, and no array is evident in the schema. The reference is ambiguous because both structs have the same name at the same level, so they cannot be selected as they stand. Give the second one a new name, such as 'configdata1', at the source.
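If changing the producer is not an option, one possible way to apply the same rename yourself is to rewrite the raw JSON text before Spark infers a schema from it. The sketch below is only an illustration under several assumptions: the input is JSON Lines (one object per line), the helper names rename_duplicate_keys and dedupe_json_line are made up for this example, the sample record is invented to match the schema in the question, and the Spark calls shown in the trailing comments use a placeholder path. It relies on the standard library's json.loads with object_pairs_hook, which receives every key/value pair of an object, duplicates included.

```python
import json


def rename_duplicate_keys(pairs):
    """object_pairs_hook that keeps repeated keys by suffixing them with 1, 2, ...

    Note: this does not guard against a clash with a key that is already
    literally named e.g. 'configdata1' in the same object.
    """
    result = {}
    seen = {}
    for key, value in pairs:
        count = seen.get(key, 0)
        seen[key] = count + 1
        new_key = key if count == 0 else f"{key}{count}"
        result[new_key] = value
    return result


def dedupe_json_line(line):
    """Parse one JSON document and re-serialize it with unique key names."""
    return json.dumps(json.loads(line, object_pairs_hook=rename_duplicate_keys))


# Invented sample record shaped like the schema in the question.
sample = (
    '{"fields": {'
    '"configdata": {"field": "a", "type": "t", "value": "v"}, '
    '"configdata": {"field1": "b", "type1": "t1", "value1": "v1"}}, '
    '"id": "1", "score": 0.5, "siteId": "s1"}'
)

cleaned = dedupe_json_line(sample)
print(cleaned)

# Possible use with Spark (assumed setup, placeholder path):
#   raw = spark.read.text("path/to/input.json")          # one JSON object per line
#   fixed = raw.rdd.map(lambda row: dedupe_json_line(row.value))
#   df = spark.read.json(fixed)
#   df.select("fields.configdata", "fields.configdata1")
```

After this step the second struct arrives in Spark under its own name, so each one can be selected without the ambiguity error. Whether the extra pass over the raw text is acceptable depends on the data volume; fixing the name where the data is produced remains the cleaner option.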


2 Comments

Since I am new to PySpark, could you please help me rename the identically named fields at the same level while parsing, in PySpark only?
You need to edit the source, or ask whoever produces the data to supply a different name for the second struct.
