PySpark: Cannot resolve column name when column does exist

Question

I had some PySpark code that was working with a sample CSV blob, and then I decided to point it at a bigger dataset. This line:

df = df.withColumn("TransactionDate", df["TransactionDate"].cast(TimestampType()))

is now throwing this error:

AnalysisException: u'Cannot resolve column name "TransactionDate" among ("TransactionDate","Country ...

Clearly TransactionDate exists as a column in the dataset, so why is it suddenly not working?

Reddspark · Accepted Answer · 2018-12-31 19:35:27Z


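Ah, OK, I figured it out. If you get this issue, check your delimiter. In my new dataset it was "," whereas in my smaller sample it was "|". With the wrong delimiter, the header row is presumably read as a single column whose name is the entire header line, which would explain why the error message appears to list TransactionDate even though no column with exactly that name exists.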

df = spark.read.format(file_type).options(header='true', quote='"', delimiter=",", ignoreLeadingWhiteSpace='true', inferSchema='true').load(file_location)
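
To illustrate the symptom outside Spark, here is a minimal sketch that uses only Python's standard csv module. The header line and the helper names split_header and guess_delimiter are hypothetical and not part of the original code. It shows how splitting a comma-separated header on "|" produces one long column name that merely contains "TransactionDate", and how a rough heuristic could pick the likely delimiter before the file is loaded.

```python
import csv
import io


def split_header(header_line, delimiter):
    """Split one header line into column names using the given delimiter."""
    return next(csv.reader(io.StringIO(header_line), delimiter=delimiter))


def guess_delimiter(header_line, candidates=(",", "|", ";", "\t")):
    """Pick the candidate that yields the most columns (a rough heuristic only)."""
    return max(candidates, key=lambda d: len(split_header(header_line, d)))


# Hypothetical header line, assumed to resemble the larger dataset.
header = "TransactionDate,Country,Amount"

wrong = split_header(header, "|")  # delimiter left over from the small sample
right = split_header(header, ",")  # delimiter the file actually uses

print(wrong)                    # a single column holding the whole header line
print(right)                    # three separate column names
print(guess_delimiter(header))  # the candidate that splits the header best
```

If the guessed delimiter differs from the one passed to the reader, that mismatch is a likely cause of the error. The heuristic can be fooled by delimiters inside quoted fields, so treat its result as a hint rather than a guarantee.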

