1

Suppose we want to track the hops made by an package from warehouse to the customer. We have a table which store the data but the data is in a column SAY Route The package starts at the Warehouse – YYY,TTT,MMM The hops end when the package is delivered to the CUSTOMER The values in the Route column are separated by space

ID  Route   
1   TTT A B X Y Z CUSTOMER
2   YYY E Y F G I P B X Q CUSTOMER
3   MMM R T K L CUSTOMER

Expected Output

ID START    END
1   TTT     A
1   A       B
1   B       X
.
.
.
1   Z       CUSTOMER
2   YYY     E
2   E       Y
2   Y       F
.
.
2   Q       CUSTOMER
3   MMM     R
.
.
3   L       CUSTOMER

Is there anyway to achieve this in pyspark

1
  • what's your spark version? Commented Dec 17, 2020 at 13:43

1 Answer 1

1

Add an index to the split route using posexplode, and get the location at the next index for each starting location using lead. If you want to remove the index simply add .drop('index') at the end.

import pyspark.sql.functions as F
from pyspark.sql.window import Window

df2 = df.select(
    'ID',
    F.posexplode(F.split('Route', ' ')).alias('index', 'start')
).withColumn(
    'end', 
    F.lead('start').over(Window.partitionBy('ID').orderBy('index'))
).orderBy('ID', 'index').dropna()

df2.show(99,0)
+---+-----+-----+--------+
|ID |index|start|end     |
+---+-----+-----+--------+
|1  |0    |TTT  |A       |
|1  |1    |A    |B       |
|1  |2    |B    |X       |
|1  |3    |X    |Y       |
|1  |4    |Y    |Z       |
|1  |5    |Z    |CUSTOMER|
|2  |0    |YYY  |E       |
|2  |1    |E    |Y       |
|2  |2    |Y    |F       |
|2  |3    |F    |G       |
|2  |4    |G    |I       |
|2  |5    |I    |P       |
|2  |6    |P    |B       |
|2  |7    |B    |X       |
|2  |8    |X    |Q       |
|2  |9    |Q    |CUSTOMER|
|3  |0    |MMM  |R       |
|3  |1    |R    |T       |
|3  |2    |T    |K       |
|3  |3    |K    |L       |
|3  |4    |L    |CUSTOMER|
+---+-----+-----+--------+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.