How do I split a string column into an array of characters?
Input:
from pyspark.sql import functions as F
df = spark.createDataFrame([('Vilnius',), ('Riga',), ('Tallinn',), ('New York',)], ['col_cities'])
df.show()
# +----------+
# |col_cities|
# +----------+
# | Vilnius|
# | Riga|
# | Tallinn|
# | New York|
# +----------+
Desired output:
# +----------+------------------------+
# |col_cities|split |
# +----------+------------------------+
# |Vilnius |[V, i, l, n, i, u, s] |
# |Riga |[R, i, g, a] |
# |Tallinn |[T, a, l, l, i, n, n] |
# |New York |[N, e, w, , Y, o, r, k]|
# +----------+------------------------+