I am trying to create a Spark DataFrame in which each list becomes a column.
Code:
import random
import string

def create_id(n):
    # Note: n is unused; every ID is 50 characters long
    return ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(50))

list_a = [create_id(25) for x in range(100)]
list_b = [create_id(25) for x in range(100)]
df = sc.parallelize([["a", list_a], ["b", list_b]]).toDF()
This results in two rows, each holding a whole list in a single array column:
_1 _2
0 a [dv2vtdl3sobadlw1svs39emp2n9ogwzzek8b6gvug7xkp...
1 b [kdv6b9ehqx1t8kbxd77ha8435bhduyxp0ilv6e09wpejx..
Passing the two lists directly instead creates 100 columns, not 100 rows:
df = sc.parallelize([list_a, list_b]).toDF()
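The reason is that `toDF()` treats each element of the parallelized collection as one row and each item inside that element as one field, so `[list_a, list_b]` is two rows of 100 fields each. The pure-Python sketch here (no Spark needed; the short stand-in lists are an assumption in place of the random IDs) shows that shape and how `zip()` transposes it into 100 rows of two fields.

```python
# Stand-in data (assumption): 100 short labelled strings per list instead of random IDs
list_a = [f"a{i}" for i in range(100)]
list_b = [f"b{i}" for i in range(100)]

# Shape of the data handed to toDF() by sc.parallelize([list_a, list_b]):
# 2 records, each 100 fields wide -> 2 rows x 100 columns
records_as_given = [list_a, list_b]

# zip() transposes it: 100 records of (a, b) -> 100 rows x 2 columns
records_transposed = list(zip(list_a, list_b))

print(len(records_as_given), len(records_as_given[0]))      # 2 100
print(len(records_transposed), len(records_transposed[0]))  # 100 2
print(records_transposed[0])                                # ('a0', 'b0')
```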
Does anyone know how I can create a DataFrame with two columns and 100 rows?
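A minimal sketch of one possible approach, assuming the intended layout is column `a` holding `list_a` and column `b` holding `list_b` (row i pairs the i-th item of each list). It zips the lists into row tuples before handing them to Spark. The helper names `to_rows` and `make_dataframe` are hypothetical; `SparkSession.createDataFrame(data, schema)` and `RDD.toDF(schema)` are standard PySpark calls. Unlike the original `create_id`, this version actually uses `n` as the ID length.

```python
import random
import string

def create_id(n):
    # Random string of n lowercase letters/digits
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(n))

def to_rows(col_a, col_b):
    # Pair the i-th items of both lists into one (a, b) row tuple
    if len(col_a) != len(col_b):
        raise ValueError("both lists must have the same length")
    return list(zip(col_a, col_b))

def make_dataframe(spark, col_a, col_b):
    # Hypothetical helper: needs an active SparkSession passed in as `spark`.
    # createDataFrame accepts a list of tuples plus a list of column names.
    return spark.createDataFrame(to_rows(col_a, col_b), ["a", "b"])

list_a = [create_id(25) for _ in range(100)]
list_b = [create_id(25) for _ in range(100)]
rows = to_rows(list_a, list_b)

# With Spark available (not run here):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = make_dataframe(spark, list_a, list_b)   # 100 rows, columns a and b
#   # or, with the existing SparkContext: sc.parallelize(rows).toDF(["a", "b"])
```

Zipping on the driver is fine for lists of this size; for data already distributed across the cluster, a join or `explode` on the Spark side would avoid collecting everything locally.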