I am using the code below:
from pyspark.sql.functions import *
from pyspark.sql.types import *
import json

def test(test1, test2):
    # pair up the two input arrays into a list of dicts
    d = [{'amount': a, 'discount': t} for a, t in zip(test1, test2)]
    return d

arrayToMapUDF = udf(test,
                    ArrayType(
                        StructType([
                            StructField('amount', StringType()),
                            StructField('discount', StringType())
                        ])
                    ))
df2 = df.withColumn("jsonarraycolumn", arrayToMapUDF(col("amount"), col("discount")))
df2.show(truncate=False)
But I'm getting this error:
raise ValueError("Unexpected tuple %r with StructType" % obj)
ValueError: Unexpected tuple '[' with StructType
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:540)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
When I run df.printSchema(), the schema displays fine. I am using Spark version 2.4.5. Here is df.show():
+--------+-----------+-----------------------------+------------------------------+
|Name    |eligibility|amount                       |discount                      |
+--------+-----------+-----------------------------+------------------------------+
|product1|Yes        |[100, 1500, 2000, 3000, 3001]|[0.01, 0.02, 0.03, 0.04, 0.05]|
|Product2|Yes        |[800, 3001,,,]               |[0.01, 0.02,,,]               |
+--------+-----------+-----------------------------+------------------------------+
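For reference, a minimal DataFrame with the same shape can be built like this (the element types here are only an assumption based on the show() output; the real columns may be typed differently):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data shaped like the output above; array element types are assumed
df = spark.createDataFrame(
    [
        ("product1", "Yes",
         ["100", "1500", "2000", "3000", "3001"],
         ["0.01", "0.02", "0.03", "0.04", "0.05"]),
        ("Product2", "Yes",
         ["800", "3001", None, None, None],
         ["0.01", "0.02", None, None, None]),
    ],
    ["Name", "eligibility", "amount", "discount"],
)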
Even with df.show(truncate=False), the output is truncated and it's impossible to see the data completely. I have also tried d = [[a, t] for a, t in zip(test1, test2)] (sketched below).
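In full, that variant only swaps the list comprehension in the UDF body (a sketch, keeping the same udf registration as above):

def test(test1, test2):
    # each pair as a two-element list instead of a dict
    return [[a, t] for a, t in zip(test1, test2)]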