I had some PySpark code that worked with a sample CSV blob, and then I decided to point it at a bigger dataset. This line:
df = df.withColumn("TransactionDate", df["TransactionDate"].cast(TimestampType()))
is now throwing this error:
AnalysisException: u'Cannot resolve column name "TransactionDate" among ("TransactionDate","Country ...
TransactionDate clearly appears in the list of columns in the error message, so why can it suddenly not be resolved?
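One thing that may be worth checking, though the truncated error message alone does not confirm it, is whether the header of the bigger file contains characters that do not show up when printed, such as a byte order mark or trailing whitespace. The sketch below is plain Python and only inspects a list of names; the helper name suspicious_column_names and the sample names are hypothetical, and it assumes the names would come from df.columns.

```python
import unicodedata


def suspicious_column_names(columns):
    """Return (name, reasons) pairs for names that contain hidden characters.

    Hypothetical helper for illustration; it does not touch Spark itself.
    """
    flagged = []
    for name in columns:
        reasons = []
        # Whitespace at either end is easy to miss in printed output.
        if name != name.strip():
            reasons.append("leading/trailing whitespace")
        # Unicode categories starting with "C" cover control and format
        # characters, which includes the byte order mark U+FEFF.
        hidden = [ch for ch in name if unicodedata.category(ch).startswith("C")]
        if hidden:
            codes = ", ".join("U+%04X" % ord(ch) for ch in hidden)
            reasons.append("invisible characters: " + codes)
        if reasons:
            flagged.append((name, reasons))
    return flagged


# Made-up sample names; with a real DataFrame this would be
# suspicious_column_names(df.columns).
sample = [u"\ufeffTransactionDate", u"Country ", u"Amount"]
for name, reasons in suspicious_column_names(sample):
    print(repr(name), reasons)
```

If a name is flagged, printing it with repr() shows the exact characters, which can then be compared with the string passed to df["..."].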