SOLUTION 1
If you simply want a column with a variable number of values, you can use an ArrayType of StructType. In your case, you defined an empty StructType, hence the result you get.
You can define a dataframe like this:
df1 = spark.createDataFrame([(1, [('name1', 'val1'), ('name2', 'val2')]),
                             (2, [('name3', 'val3')])],
                            ['Id', 'Variable_Column'])
df1.show(truncate=False)
which corresponds to the example you provided:
+---+----------------------------+
|Id |Variable_Column |
+---+----------------------------+
|1 |[[name1,val1], [name2,val2]]|
|2 |[[name3,val3]] |
+---+----------------------------+
Note that you don't need to explicitly define the schema in that case, but if you want to, it would look like this (you can call df1.schema to print it, by the way):
from pyspark.sql.types import (StructType, StructField,
                               ArrayType, LongType, StringType)

schema = StructType([
    StructField('Id', LongType()),
    StructField('Variable_Column', ArrayType(StructType([
        StructField('name', StringType()),
        StructField('value', StringType())
    ])))
])
SOLUTION 2
Very similarly, you could use the MapType type like this:
df2 = spark.createDataFrame([(1, dict([('name1', 'val1'), ('name2', 'val2')])),
                             (2, dict([('name3', 'val3')]))],
                            ['Id', 'Variable_Column'])
df2.show(truncate=False)
+---+---------------------------------+
|Id |Variable_Column |
+---+---------------------------------+
|1 |Map(name2 -> val2, name1 -> val1)|
|2 |Map(name3 -> val3) |
+---+---------------------------------+
SOLUTION 3
In a comment, you say that you would also want variable types. That's not possible with dataframes. If that's truly what you want, you may not be using the right tool. But if it is just a corner case, you could keep record of the type of the data in a string like this:
df3 = spark.createDataFrame([(1, [('name1', 'val1', 'string'),
                                  ('name2', '0.6', 'double')]),
                             (2, [('name3', '3', 'integer')])],
                            ['Id', 'Variable_Column'])
df3.show(truncate=False)
+---+-----------------------------------------+
|Id |Variable_Column |
+---+-----------------------------------------+
|1 |[[name1,val1,string], [name2,0.6,double]]|
|2 |[[name3,3,integer]] |
+---+-----------------------------------------+
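Since every value in the array is stored as a string, consumers of this dataframe have to cast values back using the stored type tag. The casting logic itself is plain Python (the helper name cast_value and the set of supported tags are illustrative; it could be wrapped in a UDF):

```python
def cast_value(value, type_name):
    """Cast a string value back to its declared type.

    The type tags ('string', 'double', 'integer') match the ones
    stored alongside the values in Variable_Column above.
    """
    casters = {'string': str, 'double': float, 'integer': int}
    return casters[type_name](value)

print(cast_value('0.6', 'double'))   # 0.6
print(cast_value('3', 'integer'))    # 3
print(cast_value('val1', 'string'))  # val1
```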
To sum up: each dataframe column has a single DataType, and all the values in that column must have this type. Variable length is achievable with arrays or maps, but that's all you can do as far as I know; there are workarounds, but nothing in plain pyspark.