I have a PySpark DataFrame with a column that contains comma-separated values. The number of values in the column has a fixed upper bound (say 4 after the leading ID). Example:

+------------------------+
|col1                    |
+------------------------+
|1,val1, val4            |
|2,val1                  |
|3,val1, val2, val3      |
|4,val1, val2, val3, val4|
+------------------------+

I want to split it into two columns, like this:

+----+------------------------+
|col1|col2                    |
+----+------------------------+
|   1|[val1, val4]            |
|   2|[val1]                  |
|   3|[val1, val2, val3]      |
|   4|[val1, val2, val3, val4]|
+----+------------------------+

How can this be done?

1 Answer

You can implement this using slice and split:

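from pyspark.sql.functions import col, split, slice

# slice(x, start, length): start is 1-based and the third argument is a length.
array_len = 4  # maximum number of values after the leading ID
result = (
    df.withColumn("ar", split(col("col1"), ","))  # "1,val1, val4" -> ["1", "val1", " val4"]
      .select(
          col("ar")[0].alias("col1"),                    # first item of the array
          slice(col("ar"), 2, array_len).alias("col2"),  # up to array_len items from position 2
      )
)
result.show(truncate=False)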

# +----+---------------------------+
# |col1|col2                       |
# +----+---------------------------+
# |1   |[val1,  val4]              |
# |2   |[val1]                     |
# |3   |[val1,  val2,  val3]       |
# |4   |[val1,  val2,  val3,  val4]|
# +----+---------------------------+

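First we split the string and store the resulting array in ar. Then select retrieves the first item with col("ar")[0] and the remaining items with slice(col("ar"), 2, array_len). slice uses 1-based positions and its third argument is a length, so this call returns up to array_len items starting at the second one. That is every item except the first, as long as no row holds more than array_len values after the ID.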
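To make the slice arguments concrete, here is a plain-Python sketch of the same logic for a single row. The helper spark_like_slice is hypothetical: it only mimics slice(x, start, length) for positive start values and is not part of PySpark. Trimming the values with strip() is also an added assumption, chosen to match the output requested in the question.

```python
def spark_like_slice(items, start, length):
    """Plain-Python model of slice(x, start, length) for start >= 1 (hypothetical helper)."""
    if start < 1 or length < 0:
        raise ValueError("this sketch only models start >= 1 and length >= 0")
    begin = start - 1  # convert the 1-based start to a 0-based index
    return items[begin:begin + length]


row = "4,val1, val2, val3, val4"
parts = [p.strip() for p in row.split(",")]  # ["4", "val1", "val2", "val3", "val4"]

head = parts[0]                        # plays the role of col("ar")[0]
tail = spark_like_slice(parts, 2, 4)   # plays the role of slice(col("ar"), 2, array_len)

print(head)  # 4
print(tail)  # ['val1', 'val2', 'val3', 'val4']
```

For a shorter row such as "2,val1", the same call returns ['val1'], because the length is only an upper limit on how many items are taken.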

4 Comments

What does the 2 in slice indicate?
It is the start position. The slice function counts from 1, so 2 skips the first item; the third argument is the number of items to keep.
Okay, understood, but why did you define array_len manually? Is there a way to specify an open-ended range, as with pandas iloc[10:], which means the row at position 10 and onwards?
No @Sid_K, slice only accepts integers (not even a Column such as size(col("ar"))).
