I'm trying to convert an int ID and an array of 3 ints into a DataFrame with 2 columns, so that I can union it with another DataFrame in PySpark.
However, I keep getting one schema-related error after another, and nothing seems to work. I'm not sure why.
from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType

emp_rdd = spark.sparkContext.emptyRDD()
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("data", ArrayType(IntegerType()), True),
])
df = spark.createDataFrame(data=emp_rdd, schema=schema)
columns = ['id', 'data']
for i in range(10):
    # id is the int ID, data1 is the array of 3 ints
    data = [id, data1]
    newRows = spark.createDataFrame(data, columns)
    df = df.union(newRows)
This gives me the following error:
Can not infer schema for type: <class 'int'>
Any help would be appreciated.
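
For reference, a minimal sketch of what appears to trigger the error and one possible way around it. `createDataFrame` expects an iterable of rows, so a flat list like `[id, data1]` is read as two separate rows, and the first one is a bare int whose schema cannot be inferred. Wrapping each record in a tuple, as in `[(id, data1)]`, gives it one row with two fields. The helper names `build_rows` and `rows_to_df`, the placeholder values, and `other_df` are assumptions for illustration and not part of the original code. The sketch also builds all rows first and creates the DataFrame once with the explicit schema, instead of calling `union` inside the loop.

```python
def build_rows(n):
    """Return one (id, data) tuple per row; the values are placeholders."""
    return [(i, [i, i + 1, i + 2]) for i in range(n)]


def rows_to_df(spark, rows):
    """Create a two-column DataFrame from (id, data) tuples with an explicit schema."""
    # Imported here so build_rows can be used without Spark installed.
    from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType

    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("data", ArrayType(IntegerType()), True),
    ])
    return spark.createDataFrame(rows, schema)


rows = build_rows(10)
# With an existing SparkSession named `spark`:
# df = rows_to_df(spark, rows)
# df = df.union(other_df)  # other_df is hypothetical, with the same two columns
```

Passing the schema explicitly keeps the column types as `int` and `array<int>`, so they line up with the other DataFrame in the union rather than relying on inference.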