I am very new to PySpark; I would appreciate your help. I build a DataFrame as follows:
test["1"]={"vars":["x1","x2"]}
test["2"]={"vars":["x2"]}
test["3"]={"vars":["x3"]}
test["4"]={"vars":["x2","x3"]}
pdDF = pd.DataFrame(test).transpose()
sparkDF=spark.createDataFrame(pdDF)
+--------+
| vars|
+--------+
|[x1, x2]|
| [x2]|
| [x3]|
|[x2, x3]|
+--------+
I am looking for a way to count how often each value occurs across the lists in the "vars" column. The result I am looking for is:
+-----+---+
|count|var|
+-----+---+
| 1| x1|
| 3| x2|
| 2| x3|
+-----+---+
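From reading the documentation, I suspect that explode combined with groupBy might be the right direction, but I am not sure this is correct (and the column order differs from my desired output):

from pyspark.sql import functions as F

# Give each array element its own row, then count occurrences of each value
result = (sparkDF
    .select(F.explode("vars").alias("var"))
    .groupBy("var")
    .count()
    .orderBy("var"))
result.show()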
Can somebody advise how to achieve this?
Thanks in advance!