
I created a DataFrame using sqlContext, and my problem is that the datetime column is inferred as a string.

df2 = sqlContext.createDataFrame(i[1])
df2.show()
df2.printSchema()

Result:

2016-07-05T17:42:55.238544+0900
2016-07-05T17:17:38.842567+0900
2016-06-16T19:54:09.546626+0900
2016-07-05T17:27:29.227750+0900
2016-07-05T18:44:12.319332+0900

string (nullable = true)

Since the column's type in the schema is string, I want to convert it to a datetime type as follows:

df3 =  df2.withColumn('_1', df2['_1'].cast(datetime()))

Here I got an error: TypeError: Required argument 'year' (pos 1) not found

What should I do to solve this problem?

1 Answer


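cast expects a Spark SQL type, not Python's datetime class. Calling datetime() with no arguments is what raises the TypeError. Try this:

from pyspark.sql.types import DateType
ndf = df2.withColumn('_1', df2['_1'].cast(DateType()))

Note that DateType keeps only the date part. If you also need the time of day, TimestampType from the same module is the type to try instead.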
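To see where the error in the question comes from, here is a minimal plain-Python sketch (no Spark involved). It shows that calling datetime() without arguments fails on its own, and how one of the sample strings could be parsed with the standard library. The format pattern and the variable names are my own assumptions, chosen to match the sample values in the question; they are not something Spark uses internally.

```python
from datetime import datetime

# Sample value copied from the question's output.
raw = "2016-07-05T17:42:55.238544+0900"

# datetime() is a constructor that needs at least year, month and day,
# so calling it with no arguments fails before Spark ever sees it.
try:
    datetime()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Assumed pattern for the sample strings: ISO date and time with
# microseconds, followed by a UTC offset written without a colon.
ISO_WITH_OFFSET = "%Y-%m-%dT%H:%M:%S.%f%z"
parsed = datetime.strptime(raw, ISO_WITH_OFFSET)

print(error_message)       # the exact wording depends on the Python version
print(parsed.isoformat())  # 2016-07-05T17:42:55.238544+09:00
```

The parsed value keeps both the microseconds and the +09:00 offset, which is the information a date-only cast would discard.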