I have a dataset looking like this:

item_nbr | date
123      | 2016-09-23
123      | 2016-10-23
112      | 2016-08-15
112      | 2016-09-15
I use groupByKey to make it look like this:

'123', ['2016-09-23', '2016-10-23']
'112', ['2016-08-15', '2016-09-15']

Now I want to calculate the difference between consecutive dates for each item. I have a function that looks like this:
def ipi_generate(x):
    # x is (item_nbr, [date strings]) as produced by groupByKey
    member_ipi_list = []
    master_ans = []
    for j in range(1, len(x[1])):
        # meant to be the gap between consecutive dates, but the values are strings
        ans = x[1][j] - x[1][j-1]
        master_ans.append(ans)
    member_ipi_list.append(x[0])
    member_ipi_list.append(master_ans)
    return [member_ipi_list]
This treats the dates as if they were strings. How do I convert my string dates into date (or int) values in PySpark so I can subtract them? Thanks.
You can parse each date string into a datetime object inside the loop, for example:

datetime.strptime(x[1][j], '%Y-%m-%d')
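Putting it together, here is a minimal sketch of how the parsing and day differences could look. The sample data, the SparkContext setup, and the simplified return value are illustrative assumptions, not your actual pipeline:

from datetime import datetime
from pyspark import SparkContext

sc = SparkContext(appName="date-diff-sketch")  # assumed setup, adjust to your environment

# (item_nbr, date-string) pairs mirroring the sample data above
rows = [("123", "2016-09-23"), ("123", "2016-10-23"),
        ("112", "2016-08-15"), ("112", "2016-09-15")]

grouped = sc.parallelize(rows).groupByKey().mapValues(list)

def ipi_generate(x):
    # Parse each 'YYYY-MM-DD' string so the values can be subtracted
    dates = [datetime.strptime(d, '%Y-%m-%d') for d in x[1]]
    # Consecutive differences in whole days (assumes the list is already in date order)
    diffs = [(dates[j] - dates[j - 1]).days for j in range(1, len(dates))]
    return (x[0], diffs)

print(grouped.map(ipi_generate).collect())
# e.g. [('112', [31]), ('123', [30])] -- key order may vary

Subtracting two datetime objects gives a timedelta, so .days yields the gap as an int. If you need your original nested-list return shape for a later flatMap, keep return [member_ipi_list] and just parse the strings before subtracting.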