Similar to this question (Scala), but I need it in PySpark: all pair combinations (2-element combinations) of the elements in an array column.
Example input:
df = spark.createDataFrame(
    [([0, 1],),
     ([2, 3, 4],),
     ([5, 6, 7, 8],)],
    ['array_col'])
Expected output:
+------------+------------------------------------------------+
|array_col   |out                                             |
+------------+------------------------------------------------+
|[0, 1]      |[[0, 1]]                                        |
|[2, 3, 4]   |[[2, 3], [2, 4], [3, 4]]                        |
|[5, 6, 7, 8]|[[5, 6], [5, 7], [5, 8], [6, 7], [6, 8], [7, 8]]|
+------------+------------------------------------------------+
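To pin down the semantics of the `out` column, here is a minimal plain-Python sketch of the per-row logic I'm after. The helper name `pair_combinations` is only illustrative and not part of any PySpark API; it relies on `itertools.combinations`, which keeps the elements in their original order.

```python
from itertools import combinations


def pair_combinations(arr):
    """Return every 2-element combination of arr as a list of lists.

    Pairs come out in the same order as the elements of arr, so
    [2, 3, 4] yields [[2, 3], [2, 4], [3, 4]].
    """
    return [list(pair) for pair in combinations(arr, 2)]


# Same rows as in the example input above.
rows = [[0, 1], [2, 3, 4], [5, 6, 7, 8]]
for row in rows:
    print(row, pair_combinations(row))
```

One possible (untested here) way to apply this in PySpark would be to wrap it with `pyspark.sql.functions.udf` and a return type of `ArrayType(ArrayType(LongType()))`, though a Python UDF is usually slower than built-in column functions, so a native solution would be preferable.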