I'm working on a window function that looks at 24-hour periods, calculates the min and max of a column within each period, and finds the largest difference for any 24-hour period. My timestamps are in the form MM/dd/yyyy HH:mm a. When I try to convert them to Unix time, only some of the values are converted correctly. As the Current Output below shows, some of the 24-hour start/end times are incorrect:

For example, the row with a start of 01/01/2000 1:53 PM says that 24 hours later is 01/02/2000 01:53 AM. When I put the Unix time 946691580 into a date converter, it comes back as 01/01/2000 1:53 AM, not PM. So my issue lies somewhere in converting my Date field to Unix time.

Once my data is formatted, I plan to create a view on top of the DataFrame and use Spark SQL to calculate the maximum difference between the two columns.

Any suggestions on what I'm doing wrong?

Sample Input (where I think my Unix time is incorrect):

+------------+------------------+---------+-------+-------+
|TemperatureF|              Date|timestamp|MinTemp|MaxTemp|
+------------+------------------+---------+-------+-------+
|        35.1|01/01/2000 1:53 AM|946691580|   28.0|   36.0|
|        34.0|01/01/2000 1:53 PM|946691580|   28.0|   36.0|
|        35.1|01/01/2000 2:53 AM|946695180|   28.0|   36.0|
|        33.1|01/01/2000 2:53 PM|946695180|   28.0|   36.0|
|        34.0|01/01/2000 3:53 AM|946698780|   28.0|   36.0|
|        32.0|01/01/2000 3:53 PM|946698780|   28.0|   36.0|
|        32.0|01/01/2000 4:53 AM|946702380|   28.0|   37.4|
|        32.0|01/01/2000 4:53 PM|946702380|   28.0|   37.4|
|        30.9|01/01/2000 5:53 AM|946705980|   28.0|   37.4|
+------------+------------------+---------+-------+-------+

Current Output

+-------------------+---------+-------------------+-------+-------+
|              Start|timestamp|                end|MinTemp|MaxTemp|
+-------------------+---------+-------------------+-------+-------+
| 01/01/2000 1:53 AM|946691580|01/02/2000 01:53 AM|   28.0|   36.0|
| 01/01/2000 1:53 PM|946691580|01/02/2000 01:53 AM|   28.0|   36.0|
| 01/01/2000 2:53 AM|946695180|01/02/2000 02:53 AM|   28.0|   36.0|
| 01/01/2000 2:53 PM|946695180|01/02/2000 02:53 AM|   28.0|   36.0|
| 01/01/2000 3:53 AM|946698780|01/02/2000 03:53 AM|   28.0|   36.0|
| 01/01/2000 3:53 PM|946698780|01/02/2000 03:53 AM|   28.0|   36.0|
| 01/01/2000 4:53 AM|946702380|01/02/2000 04:53 AM|   28.0|   37.4|
| 01/01/2000 4:53 PM|946702380|01/02/2000 04:53 AM|   28.0|   37.4|
| 01/01/2000 5:53 AM|946705980|01/02/2000 05:53 AM|   28.0|   37.4|
| 01/01/2000 5:53 PM|946705980|01/02/2000 05:53 AM|   28.0|   37.4|
| 01/01/2000 6:37 PM|946708620|01/02/2000 06:37 AM|   28.0|   37.4|
| 01/01/2000 6:53 AM|946709580|01/02/2000 06:53 AM|   28.0|   37.4|
+-------------------+---------+-------------------+-------+-------+

Current Code:

val data = osh
  .select(
    col("TemperatureF"),
    concat(
      format_string("%02d", col("Month")), lit("/"),
      format_string("%02d", col("Day")), lit("/"),
      col("Year"), lit(" "), col("TimeCST")
    ).as("Date")
  )
  .filter(col("TemperatureF") > -9999)

val oshdata = data.withColumn("timestamp",unix_timestamp(to_timestamp(col("Date"),"MM/dd/yyyy HH:mm")))

import org.apache.spark.sql.expressions._
val myWindow = Window.orderBy("timestamp").rangeBetween(Window.currentRow, 86400)
val myData = oshdata.withColumn("MinTemp", min(col("TemperatureF")).over(myWindow))
  .withColumn("MaxTemp",max(col("TemperatureF")).over(myWindow))

myData.show()

myData.createOrReplaceTempView("oshView")

spark.sqlContext.sql("Select Date as Start,timestamp, from_unixtime(timestamp+86400,'MM/dd/yyyy HH:mm a') as end,MinTemp,MaxTemp from oshView").show(25)

Thanks.

3 Answers

You are parsing the Date column with "MM/dd/yyyy HH:mm", which has no "a" field, so the AM/PM marker is ignored and 1:53 PM is read as 01:53. Use MM/dd/yyyy hh:mm a instead. Note that it is hh (12-hour clock) and not HH (24-hour clock).
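
To see the difference outside of Spark, here is a minimal sketch using plain java.text.SimpleDateFormat. It rests on several assumptions: that your Spark version parses these pattern letters the same way, that the time zone is UTC (which is what the timestamps in the question suggest), and that the locale is US so "AM"/"PM" are recognised. The names HourPatternDemo and toEpochSeconds are made up for illustration.

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

object HourPatternDemo {
  // Parse `text` with `pattern` and return seconds since the Unix epoch.
  // UTC and Locale.US are pinned here (assumptions) so the result does not
  // depend on the machine's default time zone or language.
  def toEpochSeconds(text: String, pattern: String): Long = {
    val fmt = new SimpleDateFormat(pattern, Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.parse(text).getTime / 1000L
  }

  def main(args: Array[String]): Unit = {
    val sample = "01/01/2000 1:53 PM"

    // HH reads the hour on a 24-hour clock. The pattern has no 'a',
    // so the trailing " PM" is never read and the hour stays 01.
    println(toEpochSeconds(sample, "MM/dd/yyyy HH:mm"))   // 946691580

    // hh reads the hour on a 12-hour clock and 'a' applies the AM/PM marker.
    println(toEpochSeconds(sample, "MM/dd/yyyy hh:mm a")) // 946734780
  }
}
```

The first value is the same 946691580 that appears for both the 1:53 AM and 1:53 PM rows in the question. The second is exactly 12 hours (43200 seconds) later.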

Figured it out. I needed to change MM/dd/yyyy HH:mm a to MM/dd/yyyy hh:mm a.

Output now:

+-------------------+---------+-------------------+-------+-------+
|              Start|timestamp|                end|MinTemp|MaxTemp|
+-------------------+---------+-------------------+-------+-------+
|01/01/2000 12:53 AM|946687980|01/02/2000 12:53 AM|   28.0|   36.0|
| 01/01/2000 1:53 AM|946691580|01/02/2000 01:53 AM|   28.0|   36.0|
| 01/01/2000 2:53 AM|946695180|01/02/2000 02:53 AM|   28.0|   36.0|
| 01/01/2000 3:53 AM|946698780|01/02/2000 03:53 AM|   28.0|   36.0|
| 01/01/2000 4:53 AM|946702380|01/02/2000 04:53 AM|   28.0|   37.4|
| 01/01/2000 5:53 AM|946705980|01/02/2000 05:53 AM|   28.0|   37.4|
| 01/01/2000 6:53 AM|946709580|01/02/2000 06:53 AM|   28.0|   37.4|
| 01/01/2000 7:53 AM|946713180|01/02/2000 07:53 AM|   28.0|   37.4|
| 01/01/2000 8:53 AM|946716780|01/02/2000 08:53 AM|   28.0|   37.4|
+-------------------+---------+-------------------+-------+-------+

This probably boils down to your time parsing being incorrect:

unix_timestamp(to_timestamp($"Date", "MM/dd/yyyy hh:mm a"))
