
I am trying to generalize a schema for creating empty tables in PySpark. My list holds the column name and datatype separated by a space.

Below is my code.

I could generalize the column name, but it is not able to cast the type.

from pyspark.sql.types import *
tblColumns = [  'emp_name StringType()'
              , 'confidence DoubleType()'
              , 'addressType StringType()'
              , 'reg StringType()'
              , 'inpindex IntegerType()'
              ]

def createEmptyTable(tblColumns):
  structCols = [StructField(colName.split(' ')[0], (colName.split(' ')[1]), True)
    for colName in tblColumns]
  print('Returning cols', structCols)
  return(structCols)
createEmptyTable(tblColumns)

It gives the error below.

AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'>

Is there a way to make the datatype generic?

1 Answer

Yes, it's throwing an error because colName.split(' ')[1] is just the string 'StringType()', not a DataType instance. You need to map that string to the actual type class, for example with a mapping table:

from pyspark.sql.types import *

# Map each type-name string to the actual type class.
datatype = {
    'StringType': StringType,
    'DoubleType': DoubleType,
    'IntegerType': IntegerType,
    # ...and so on for every type you use
}

def createEmptyTable(tblColumns):
    # Strip the trailing '()' so 'StringType()' matches the 'StringType' key,
    # then instantiate the mapped class.
    structCols = [StructField(colName.split(' ')[0],
                              datatype[colName.split(' ')[1].rstrip('()')](),
                              True)
                  for colName in tblColumns]
    return structCols

This should work; be aware that you will have to declare a mapping entry for every type you use.
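
For completeness, here is a minimal sketch of how this fits into actually creating the empty table, assuming a SparkSession named spark and the datatype mapping above:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

# Build the schema from the mapped columns and create an empty DataFrame.
schema = StructType(createEmptyTable(tblColumns))
emptyDf = spark.createDataFrame([], schema)
emptyDf.printSchema()

If you would rather not maintain the dict by hand, you could also resolve the class with getattr(pyspark.sql.types, name), although an explicit mapping makes the set of accepted names obvious.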
