I am trying to generalize the schema used to create empty tables in PySpark. My list holds a column name and a data type separated by a space.
Below is my code.
I can generalize the column name, but the data type is never resolved: it stays a plain string instead of becoming a DataType instance.
from pyspark.sql.types import *

tblColumns = ['emp_name StringType()',
              'confidence DoubleType()',
              'addressType StringType()',
              'reg StringType()',
              'inpindex IntegerType()']

def createEmptyTable(tblColumns):
    # colName.split(' ')[1] is still the raw string, e.g. 'StringType()',
    # not the DataType instance that StructField expects
    structCols = [StructField(colName.split(' ')[0], colName.split(' ')[1], True)
                  for colName in tblColumns]
    print('Returning cols', structCols)
    return structCols

createEmptyTable(tblColumns)
This gives the error below:
AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'>
Is there a way to make the data type generic?
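For example, would a lookup table like the following sketch work? (This is just a minimal sketch; typeMap is an illustrative name, and it only covers the type-name strings that appear in my list.)

from pyspark.sql.types import StructType, StructField, StringType, DoubleType, IntegerType

# Illustrative mapping from the type-name strings in tblColumns
# to actual DataType instances
typeMap = {'StringType()': StringType(),
           'DoubleType()': DoubleType(),
           'IntegerType()': IntegerType()}

def createEmptyTable(tblColumns):
    # Look up the DataType instance for each type-name string
    structCols = [StructField(col.split(' ')[0], typeMap[col.split(' ')[1]], True)
                  for col in tblColumns]
    return StructType(structCols)

Or is there a more general approach that avoids hand-maintaining such a mapping?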