I would like to specify a schema for Spark DataFrames in Python. After I load the data once, I can print the schema; I might see something like:
df = spark.read.json(datapath)
df.schema
StructType(List(StructField(fldname,StringType,true)))
Having created this Python object (df.schema) by reading the data, I can now use it to read more data. However, I'd rather not have to read the data first just to get the schema; I'd like to persist the schema, or even just type it out in my script. For typing it in, I've tried:
from pyspark.sql.types import StructType, StructField, StringType
schema = StructType([ StructField('fldname', StringType, True)])
but I get the message:
AssertionError: dataType should be DataType
This is Spark 2.0.2.
Instead of StringType, use StringType(): StructField expects a DataType instance, not the class itself.
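A minimal sketch of the corrected definition, assuming the same spark session and datapath as in the question; passing the schema to the reader skips the inference scan, and schema.json() together with StructType.fromJson() can round-trip a schema captured from an earlier read if persisting it is preferred over typing it out:

import json
from pyspark.sql.types import StructType, StructField, StringType

# StructField expects a DataType instance, hence StringType() rather than StringType
schema = StructType([StructField('fldname', StringType(), True)])

# Supplying the schema up front means Spark does not have to read the data to infer it
df = spark.read.schema(schema).json(datapath)

# Alternatively, persist a schema obtained from an initial read and rebuild it later
schema_str = df.schema.json()
restored = StructType.fromJson(json.loads(schema_str))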