
I have the script below.

I am a bit stuck with this specific piece:

    datex = datetime.datetime.strptime(df1.start_time,'%Y-%m-%d %H:%M:%S')

I can't figure out how to extract the actual value from the start_time field and store it in the datex variable.

Can anyone help me please?

# iters and time_to_check are initialised earlier in the script (not shown here)
while iters < 10:

    time_to_add = iters * 900
    time_to_checkx = time_to_check + datetime.timedelta(seconds=time_to_add)

    iters = iters + 1
    session = 0

    for row in df1.rdd.collect():
        datex = datetime.datetime.strptime(df1.start_time,'%Y-%m-%d %H:%M:%S')
        print(datex)
        filterx = df1.filter(datex < time_to_checkx)
        session = session + filterx.count()
        print('current session value: ' + str(session))

print(session)
  • Any specific reason why you're looping over an RDD? Your for loop can easily be converted to pyspark-sql code, which will be more efficient. Commented Oct 18, 2019 at 15:08
  • I wasn't sure how to achieve the same without looping :( Commented Oct 18, 2019 at 15:18
  • Let me help you with that. What is iters exactly? Commented Oct 18, 2019 at 15:19
  • So during the day there are 96 blocks of 15 minutes. So iters would be set to 96, and each iteration I increment the time by 900 seconds, which is 15 minutes. So for each 15-minute block I get the total number of active sessions. Thank you for your help, by the way. Commented Oct 18, 2019 at 15:27
  • Why don't you use a 15 minute window? Commented Oct 18, 2019 at 15:52

1 Answer


Check this out. I have converted your for loop into plain DataFrame operations; if you can give me more information about the iters variable, or an explanation of how you want it to work, I can refine this further.
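As an aside on the line you're stuck on: df1.start_time is a Column object, not a string, so strptime can't parse it. Inside a loop over collected rows you need the row's own value, roughly like this (a sketch, assuming start_time is stored as a string in that format):

    for row in df1.rdd.collect():
        # row is a pyspark.sql.Row; look the field up by name to get its value
        datex = datetime.datetime.strptime(row['start_time'], '%Y-%m-%d %H:%M:%S')

Collecting and looping like that is slow, though; the converted version below does the same filtering on the executors instead: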

    import datetime
    import pyspark.sql.functions as F

    # Spark timestamp patterns follow Java's SimpleDateFormat:
    # yyyy = year, MM = month, dd = day, HH = 24-hour clock
    spark_date_format = "yyyy-MM-dd HH:mm:ss"

    # time_to_check and time_to_add are defined earlier, as in your script
    time_to_checkx = time_to_check + datetime.timedelta(seconds=time_to_add)

    # parse the string column into a real timestamp column once, then
    # filter and count on the executors instead of collecting every row
    df1 = df1.withColumn('start_time', F.to_timestamp(F.col('start_time'), spark_date_format))
    filterx = df1.filter(df1.start_time < time_to_checkx)
    session = filterx.count()
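If iters is meant to step through the 15-minute blocks of the day, as the comments describe, you can also compute every block's total in one pass with a window instead of a Python loop. This is only a rough sketch, assuming start_time has already been parsed to a timestamp as above:

    from pyspark.sql import Window

    # bucket each row into the 15-minute block its start_time falls in
    per_block = (df1.groupBy(F.window('start_time', '15 minutes'))
                    .count()
                    .orderBy('window'))

    # running total across the blocks, matching the cumulative `session`
    # counter in the original loop (all rows end up in one partition here,
    # which is fine for a day's worth of blocks)
    cumulative = per_block.withColumn(
        'sessions_so_far', F.sum('count').over(Window.orderBy('window')))

    cumulative.show(truncate=False)

Note that groupBy only produces rows for blocks that contain at least one session; empty 15-minute blocks simply will not appear in the output.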

2 Comments

This is so useful, thanks! Do you know of any good websites or books where I can learn more about Spark DataFrame operations?
@kikee1222 I just follow the documentation. You should also check out the blogs written by Databricks.
