I have a CSV file with 300 columns. Of these 300 columns, I only need 3, so I defined a schema for them. But when I apply the schema to the DataFrame, it shows only 3 columns yet incorrectly maps my schema onto the first 3 columns of the file. It is not matching the CSV column names against my schema's StructFields. Please advise.
from pyspark.sql.types import *

dfschema = StructType([
    StructField("Call Number", IntegerType(), True),
    StructField("Incident Number", IntegerType(), True),
    StructField("Entry DtTm", DateType(), True)
])
df = spark.read.format("csv")\
    .option("header", "true")\
    .schema(dfschema)\
    .load("/FileStore/*/*")
df.show(5)
df.printSchema()
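
For reference, a workaround sketch I'm considering: read the full file with its header, then select the three columns by name and cast them. The column names come from my schema above; the cast types are my assumption of the intended ones. Is there a better way to make the schema match by name?

from pyspark.sql.functions import col

# Read every column (as strings by default), then pick the three
# columns by their header names and cast to the target types.
df2 = (spark.read.format("csv")
       .option("header", "true")
       .load("/FileStore/*/*")
       .select(
           col("Call Number").cast("int"),
           col("Incident Number").cast("int"),
           col("Entry DtTm").cast("date"),
       ))
df2.show(5)
df2.printSchema()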