
I am using the below:

from pyspark.sql.functions import *
from pyspark.sql.types import *
import json

def test(test1,test2):
    d = [{'amount': a, 'discount': t} for a, t in zip(test1, test2)]
    return d

arrayToMapUDF = udf(test, 
    ArrayType(
        StructType([
            StructField('amount', StringType()), 
            StructField('discount', StringType())
        ])
    )
)

df2 = df.withColumn("jsonarraycolumn", arrayToMapUDF(col("amount"), col("discount")))

df2.show(truncate=False)

But I'm getting this error:

raise ValueError("Unexpected tuple %r with StructType" % obj)
ValueError: Unexpected tuple '[' with StructType
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:540)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)

When I do df.printSchema(), the schema displays fine. I am using Spark version 2.4.5. The data looks like this:

+--------+-----------+-----------------------------+------------------------------+
|Name    |eligibility|amount                       |discount                      |
+--------+-----------+-----------------------------+------------------------------+
|product1|Yes        |[100, 1500, 2000, 3000, 3001]|[0.01, 0.02, 0.03, 0.04, 0.05]|
|Product2|Yes        |[800, 3001,,,]               |[0.01, 0.02,,,]               |
+--------+-----------+-----------------------------+------------------------------+
  • Could you please use df.show(truncate=False)? The dataframe was truncated and it's impossible to see the data completely. Commented Dec 31, 2020 at 13:11
  • @mike sorry, I misunderstood your comment. however, per request, I posted in the question section. Commented Dec 31, 2020 at 13:13
  • Could you try replacing line 2 in the UDF with d = [[a,t] for a, t in zip(test1, test2)]? Commented Dec 31, 2020 at 13:15
  • @mike I tried this option as well, d = [[a,t] for a, t in zip(test1, test2)], but I'm getting the same error. Commented Dec 31, 2020 at 13:23
  • I am confused. I cannot reproduce this error at all on Spark 2.4.5. Commented Dec 31, 2020 at 13:24

1 Answer


You don't need a UDF for this since you're using Spark 2.4+. You can simply use the arrays_zip function like this:

from pyspark.sql.functions import col, arrays_zip
from pyspark.sql.types import StructType, StructField, StringType, ArrayType


data = [
    ("product1", "Yes", [10000, 250000], [0.01, 0.02]),
    ("product2", "Yes", [80000, 300001], [0.01, 0.02])
]

df = spark.createDataFrame(data, ["Name", "eligibility", "amount", "discount"])

schema = ArrayType(
    StructType([
        StructField('amount', StringType()),
        StructField('discount', StringType())
    ])
)
df = df.withColumn("jsonarraycolumn", arrays_zip(col("amount"), col("discount")).cast(schema))

df.show(truncate=False)

+--------+-----------+---------------+------------+-------------------------------+
|Name    |eligibility|amount         |discount    |jsonarraycolumn                |
+--------+-----------+---------------+------------+-------------------------------+
|product1|Yes        |[10000, 250000]|[0.01, 0.02]|[[10000, 0.01], [250000, 0.02]]|
|product2|Yes        |[80000, 300001]|[0.01, 0.02]|[[80000, 0.01], [300001, 0.02]]|
+--------+-----------+---------------+------------+-------------------------------+
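For intuition, arrays_zip behaves per row much like Python's built-in zip: it pairs the i-th elements of each array into a struct. A plain-Python sketch of the same per-row transformation (the field names amount and discount mirror the schema above; this is an illustration, not Spark code):

```python
import json

# One row's arrays, taken from the sample data above.
amount = [10000, 250000]
discount = [0.01, 0.02]

# arrays_zip pairs up elements by position; each pair becomes one
# struct with fields "amount" and "discount".
zipped = [{"amount": a, "discount": d} for a, d in zip(amount, discount)]

print(json.dumps(zipped))
# [{"amount": 10000, "discount": 0.01}, {"amount": 250000, "discount": 0.02}]
```

If you need an actual JSON string column rather than an array of structs, you could additionally apply pyspark.sql.functions.to_json to the zipped column.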

1 Comment

Your code works perfectly. Thank you so much for the valuable inputs and your efforts.
