Suppose we want to track the hops a package makes from the warehouse to the customer. We have a table that stores this data in a single column, say Route. The package starts at one of the warehouses (YYY, TTT or MMM), and the hops end when the package is delivered to the CUSTOMER. The values in the Route column are separated by spaces:
ID | Route
---|---
1 | TTT A B X Y Z CUSTOMER
2 | YYY E Y F G I P B X Q CUSTOMER
3 | MMM R T K L CUSTOMER
Expected output:

ID | START | END
---|---|---
1 | TTT | A
1 | A | B
1 | B | X
... | ... | ...
1 | Z | CUSTOMER
2 | YYY | E
2 | E | Y
2 | Y | F
... | ... | ...
2 | Q | CUSTOMER
3 | MMM | R
... | ... | ...
3 | L | CUSTOMER
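Put differently, a route with n stops yields n - 1 hops, one for each pair of consecutive stops. A plain-Python sketch of that pairing for a single row makes the target shape explicit. The helper name `route_to_hops` is hypothetical and only illustrates the logic, not a Spark solution:

```python
def route_to_hops(route):
    """Turn a space-separated route string into (START, END) pairs."""
    stops = route.split()
    # Pair each stop with the one after it; the last stop only ever appears as an END.
    return list(zip(stops, stops[1:]))


routes = {
    1: "TTT A B X Y Z CUSTOMER",
    2: "YYY E Y F G I P B X Q CUSTOMER",
    3: "MMM R T K L CUSTOMER",
}

for route_id, route in routes.items():
    for start, end in route_to_hops(route):
        print(route_id, start, end)
```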
Is there any way to achieve this in PySpark?