
I have the script below.

I am a bit stuck with this specific piece:

    datex = datetime.datetime.strptime(df1.start_time,'%Y-%m-%d %H:%M:%S')

I can't figure out how to extract the actual value from the start_time field and store it in the datex variable.

Can anyone help me please?

# iters and time_to_check are initialised earlier in the script (not shown here)
while iters < 10:

    time_to_add = iters * 900
    time_to_checkx = time_to_check + datetime.timedelta(seconds=time_to_add)

    iters = iters + 1
    session = 0

    for row in df1.rdd.collect():
        datex = datetime.datetime.strptime(df1.start_time,'%Y-%m-%d %H:%M:%S')
        print(datex)
        filterx = df1.filter(datex < time_to_checkx)
        session = session + filterx.count()
        print('current session value: ' + str(session))

print(session)
  • Any specific reason why you're looping over an RDD? Your for loop can easily be converted to pyspark-sql code, which will be more efficient. Commented Oct 18, 2019 at 15:08
  • I wasn't sure how to achieve the same without looping :( Commented Oct 18, 2019 at 15:18
  • Let me help you with that. What is iters exactly? Commented Oct 18, 2019 at 15:19
  • So during the day there are 96 blocks of 15 minutes. So iters would be set to 96, and each iteration I increment the time by 900 seconds, which is 15 minutes. So for each 15-minute block I get the total number of active sessions. Thank you for your help, by the way. Commented Oct 18, 2019 at 15:27
  • Why don't you use a 15 minute window? Commented Oct 18, 2019 at 15:52

1 Answer


Check this out. I have converted your for loop into plain DataFrame operations; if you can give me more information about the iters variable, or an explanation of how you want it to work, I can refine this further.
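As an aside on the line you're stuck on: df1.start_time is a Column object, not a string, so strptime can't parse it. Inside a loop over collected rows you need the row's own value, roughly like this (a sketch, assuming start_time is stored as a string in that format):

    for row in df1.rdd.collect():
        # row is a pyspark.sql.Row; look the field up by name to get its value
        datex = datetime.datetime.strptime(row['start_time'], '%Y-%m-%d %H:%M:%S')

Collecting and looping like that is slow, though; the converted version below does the same filtering on the executors instead: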

    import datetime
    import pyspark.sql.functions as F

    # Spark timestamp patterns follow Java's SimpleDateFormat:
    # yyyy = year, MM = month, dd = day, HH = 24-hour clock
    spark_date_format = "yyyy-MM-dd HH:mm:ss"

    # time_to_check and time_to_add are defined earlier, as in your script
    time_to_checkx = time_to_check + datetime.timedelta(seconds=time_to_add)

    # parse the string column into a real timestamp column once, then
    # filter and count on the executors instead of collecting every row
    df1 = df1.withColumn('start_time', F.to_timestamp(F.col('start_time'), spark_date_format))
    filterx = df1.filter(df1.start_time < time_to_checkx)
    session = filterx.count()
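If iters is meant to step through the 15-minute blocks of the day, as the comments describe, you can also compute every block's total in one pass with a window instead of a Python loop. This is only a rough sketch, assuming start_time has already been parsed to a timestamp as above:

    from pyspark.sql import Window

    # bucket each row into the 15-minute block its start_time falls in
    per_block = (df1.groupBy(F.window('start_time', '15 minutes'))
                    .count()
                    .orderBy('window'))

    # running total across the blocks, matching the cumulative `session`
    # counter in the original loop (all rows end up in one partition here,
    # which is fine for a day's worth of blocks)
    cumulative = per_block.withColumn(
        'sessions_so_far', F.sum('count').over(Window.orderBy('window')))

    cumulative.show(truncate=False)

Note that groupBy only produces rows for blocks that contain at least one session; empty 15-minute blocks simply will not appear in the output.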

2 Comments

This is so useful, thanks! Do you know of any good websites or books where I can learn more about Spark DataFrame operations?
@kikee1222 I just follow the documentation. You should also check out the blogs written by Databricks.
