Here is the code:

from py4j.protocol import Py4JJavaError
def parse_clf_time(s):
    try:
    #return "{0:04d}-{1:02d}-{2:02d} {3:02d}:{4:02d}:{5:02d}".format(int(s[7:11]),month_map[s[3:6]],int(s[0:2]),int(s[12:14]),int(s[15:17]),int(s[18:20]))
        return "{0:04d}-{1:02d}-{2:02d} {3:02d}:{4:02d}:{5:02d}".format(
            int(s[7:11]),
            month_map[s[3:6]],
            int(s[0:2]),
            int(s[12:14]),
            int(s[15:17]),
            int(s[18:20])
            )
    except Py4JJavaError as e:
        return "2016-08-11 00:00:01".format(
            int(s[7:11]),
            month_map[s[3:6]],
            int(s[0:2]),
            int(s[12:14]),
            int(s[15:17]),
            int(s[18:20])

u_parse_time = udf(parse_clf_time)

final_df = cleaned_df.select('*', u_parse_time(cleaned_df['timestamp']).cast('timestamp').alias('time')).drop('timestamp')
total_log_entries = final_df.count()

The df may contain bad data, so I wrapped the parsing in a simple try/except to handle it. Please let me know what the best practice is for excluding bad data.

For some unknown reason, I get an error:

[Image: screenshot of the error message]

So what's wrong with the code? It works in another project in the same environment, so I am pretty sure the error does not come from the code itself.

Thank you very much, any clue is appreciated.

2 Answers

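You are missing the closing ) for return "2016-08-11 00:00:01".format( in the except branch. That string has no {} placeholders, so the .format(...) arguments do nothing and the call can be dropped entirely.

Also, you didn't import udf:

from pyspark.sql.functions import udf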
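
On the question of excluding bad data, here is a minimal sketch. It assumes the timestamps follow the Common Log Format (for example 01/Aug/1995:00:00:01 -0400) and that month_map, which the post does not show, maps three-letter month abbreviations to month numbers. Errors raised inside a Python UDF are ordinary Python exceptions such as ValueError or KeyError, so catching Py4JJavaError inside the function is unlikely to catch a malformed row. One option is to return None for bad input and filter out the resulting nulls afterwards:

```python
# Assumed lookup table: the original post uses month_map without defining it.
month_map = {
    'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12,
}


def parse_clf_time(s):
    """Convert a Common Log Format timestamp to 'YYYY-MM-DD hh:mm:ss'.

    Returns None when the input is malformed, so bad rows end up as nulls.
    """
    try:
        return "{0:04d}-{1:02d}-{2:02d} {3:02d}:{4:02d}:{5:02d}".format(
            int(s[7:11]),       # year
            month_map[s[3:6]],  # month abbreviation -> number
            int(s[0:2]),        # day
            int(s[12:14]),      # hour
            int(s[15:17]),      # minute
            int(s[18:20]),      # second
        )
    except (ValueError, KeyError, TypeError):
        # ValueError: a field is not numeric (or the string is too short)
        # KeyError:   unknown month abbreviation
        # TypeError:  s is None
        return None


# Wiring into Spark, using the DataFrame names from the question
# (left as comments so the function can be tried without a cluster):
#
# from pyspark.sql.functions import udf
# u_parse_time = udf(parse_clf_time)
# final_df = cleaned_df.select(
#     '*', u_parse_time(cleaned_df['timestamp']).cast('timestamp').alias('time')
# ).drop('timestamp')
# good_df = final_df.filter(final_df['time'].isNotNull())
```

Returning None rather than a placeholder date such as 2016-08-11 00:00:01 keeps bad rows distinguishable from real ones, so they can be dropped or counted separately.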

Missing parentheses or brackets are indeed very common. I would suggest using a text editor to double-check in cases like this. I use UltraEdit, which works well for me.
