For certain strings, unix_timestamp returns null:
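from pyspark.sql import functions as F
date_format = "yyyy-MM-dd'T'HH:mm:ss'Z'"  # same pattern as shown in the column header below
raw_data.select(F.unix_timestamp(F.lit("2019-03-10T02:56:36Z"), format=date_format)).show(1)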
+--------------------------------------------------------------+
|unix_timestamp(2019-03-10T02:56:36Z, yyyy-MM-dd'T'HH:mm:ss'Z')|
+--------------------------------------------------------------+
| null|
+--------------------------------------------------------------+
only showing top 1 row
But for almost the same string, I do get an answer:
+---------------------------------------------------------------------------------------------+
|unix_timestamp(to_utc_timestamp(2019-03-10T02:56:36Z, America/New_York), yyyy-MM-dd HH:mm:ss)|
+---------------------------------------------------------------------------------------------+
| 1552204596|
+---------------------------------------------------------------------------------------------+
only showing top 1 row
When I first convert the problematic string to a UTC timestamp, it works:
raw_data.select(F.unix_timestamp(F.to_utc_timestamp(F.lit("2019-03-10T02:56:36Z"), "America/New_York"))).show(1)
+---------------------------------------------------------------------------------------------+
|unix_timestamp(to_utc_timestamp(2019-03-10T02:56:36Z, America/New_York), yyyy-MM-dd HH:mm:ss)|
+---------------------------------------------------------------------------------------------+
| 1552204596|
+---------------------------------------------------------------------------------------------+
only showing top 1 row
Is this a problem with how unix_timestamp converts strings? How can I avoid converting to UTC first?
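
One thing that may be relevant: on 2019-03-10, clocks in America/New_York jumped from 02:00 to 03:00, so 02:56:36 does not exist as a local time there. The 'Z' in the pattern is quoted, so it is matched as a literal character rather than as a UTC marker. If the Spark session time zone is America/New_York (an assumption, suggested only by the zone used in the to_utc_timestamp call), the string may be parsed as a nonexistent local time, which could explain the null. A minimal plain-Python sketch to check whether a wall-clock time falls in such a gap, where `is_in_dst_gap` is a hypothetical helper and not part of Spark:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+


def is_in_dst_gap(naive: datetime, tz_name: str) -> bool:
    """Return True if the naive wall-clock time does not exist in tz_name.

    A time skipped by a daylight-saving jump does not survive a round trip
    through UTC: it comes back shifted to the other side of the jump.
    """
    tz = ZoneInfo(tz_name)
    local = naive.replace(tzinfo=tz)
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) != naive


problem = datetime(2019, 3, 10, 2, 56, 36)
print(is_in_dst_gap(problem, "America/New_York"))  # inside the skipped hour
print(is_in_dst_gap(problem, "UTC"))               # UTC has no DST jumps
```

If this turns out to be the cause, setting the session time zone to UTC, for example with spark.conf.set("spark.sql.session.timeZone", "UTC"), might avoid the extra conversion step; this is untested against the data in the question.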