I'd like to write an any_lambda function that checks if any of the elements in an ArrayType column meet a condition specified by a lambda function.
Here's the code I have that's not working:
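from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

def any_lambda(f, l):
    return any(list(map(f, l)))

spark.udf.register("any_lambda", any_lambda)
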
source_df = spark.createDataFrame(
    [
        ("jose", [1, 2, 3]),
        ("li", [4, 5, 6]),
        ("luisa", [10, 11, 12]),
    ],
    StructType([
        StructField("name", StringType(), True),
        # nums holds integers, so the element type is IntegerType
        StructField("nums", ArrayType(IntegerType(), True), True),
    ])
)

actual_df = source_df.withColumn(
    "any_num_greater_than_5",
    any_lambda(lambda n: n > 5, col("nums"))
)

This code raises TypeError: Column is not iterable.
How can I create an any_lambda function that works?
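
One possible direction, offered as a sketch rather than a definitive answer: the error appears to come from calling the plain Python function directly on a Column, so `map` tries to iterate over the Column object itself instead of over each row's array. Wrapping the predicate in a UDF lets Spark pass each row's array to it as a Python list. The helper name `make_any_predicate` and the changed one-argument signature of `any_lambda` are assumptions made for illustration, not part of the original code.

```python
def make_any_predicate(f):
    """Return a plain Python function that is True if f holds for any element.

    Hypothetical helper: it contains no Spark code, so it can be tested alone.
    """
    def _any(values):
        # A null array column value arrives as None inside a UDF
        return values is not None and any(f(v) for v in values)
    return _any


def any_lambda(f):
    """Wrap the predicate in a BooleanType UDF (pyspark is imported lazily)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType
    return udf(make_any_predicate(f), BooleanType())


# Usage, assuming source_df from the question with an integer nums column:
# actual_df = source_df.withColumn(
#     "any_num_greater_than_5",
#     any_lambda(lambda n: n > 5)(col("nums")),
# )
```

Note that the lambda is passed when the UDF is built, and the column is passed when the UDF is applied. On Spark 3.1 or later, the built-in `pyspark.sql.functions.exists` accepts a column and a lambda over Column expressions, which may make the UDF unnecessary.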