
I need to left-pad an array column in a PySpark DataFrame with zeros, without using a pandas UDF.

Input DataFrame:
|lags|
|----|
|[0]|
|[0,1,2]|
|[0,1]|

Output DataFrame:
|lags|
|----|
|[0,0,0]|
|[0,1,2]|
|[0,0,1]|

2 Answers


You can use array_repeat to build an array of zeros of the required length, then concat it in front of the original array.

Use the max_size computation from @ARCrow's answer to find the largest array size; the value is hard-coded below for this example.

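import pyspark.sql.functions as F

max_arr_size = 3  # length of the longest array in 'lags'

# Build (max_arr_size - size) zeros per row, then prepend them to 'lags'.
df = (df.withColumn('pad', F.array_repeat(F.lit(0), max_arr_size - F.size('lags')))
      .withColumn('padded', F.concat('pad', 'lags')))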

1 Comment

In addition to array_repeat, I used flatten to merge the result into a single array.
0

This is how I did it:

import pyspark.sql.functions as f

df = spark.createDataFrame([
    ([0],),
    ([0,1,2],),
    ([0,1],),
    (None,)
], ['lags'])

max_size = (df
            .withColumn('array_size', f.size(f.col('lags')))
            .groupBy()
            .agg(f.max(f.col('array_size')).alias('max_size'))
            .collect()[0].max_size
           )
df = (df
      .withColumn('lags', f.when(f.col('lags').isNull(), f.array(*[])).otherwise(f.col('lags')))  # replace null arrays with empty ones
      .withColumn('pre_zeros', f.sequence(f.lit(0), f.lit(max_size) - f.size(f.col('lags'))))
      .withColumn('zeros', f.expr('transform(slice(pre_zeros, 1, size(pre_zeros) - 1), element -> 0)'))
      .withColumn('final_lags', f.concat(f.col('zeros'), f.col('lags')))
     )

df.show()

And the output is:

+---------+------------+---------+----------+
|     lags|   pre_zeros|    zeros|final_lags|
+---------+------------+---------+----------+
|      [0]|   [0, 1, 2]|   [0, 0]| [0, 0, 0]|
|[0, 1, 2]|         [0]|       []| [0, 1, 2]|
|   [0, 1]|      [0, 1]|      [0]| [0, 0, 1]|
|       []|[0, 1, 2, 3]|[0, 0, 0]| [0, 0, 0]|
+---------+------------+---------+----------+
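
To see why pre_zeros always carries one extra element, it can help to model the per-row logic in plain Python. This is only a sketch for reasoning about the column expressions, not Spark code: the helper names pad_with_array_repeat and pad_with_sequence are hypothetical, and treating a null array as empty in both helpers is an assumption that mirrors the null handling in the code above.

```python
def pad_with_array_repeat(lags, max_size):
    """Plain-Python model of concat(array_repeat(0, max_size - size(lags)), lags)."""
    lags = lags or []                         # assumption: a null array is treated as empty
    zeros = [0] * (max_size - len(lags))      # array_repeat(lit(0), n)
    return zeros + lags                       # concat(pad, lags)


def pad_with_sequence(lags, max_size):
    """Plain-Python model of the sequence / slice / transform variant."""
    lags = lags or []                         # assumption: a null array is treated as empty
    n = max_size - len(lags)
    pre_zeros = list(range(0, n + 1))         # sequence(0, n) includes both ends: n + 1 items
    trimmed = pre_zeros[:len(pre_zeros) - 1]  # slice(pre_zeros, 1, size(pre_zeros) - 1)
    zeros = [0 for _ in trimmed]              # transform(..., element -> 0)
    return zeros + lags                       # concat(zeros, lags)


rows = [[0], [0, 1, 2], [0, 1], None]
max_size = max(len(r) for r in rows if r is not None)

for row in rows:
    print(row, "->", pad_with_sequence(row, max_size))
```

Because sequence includes both endpoints, it yields n + 1 elements, so the slice has to drop one before the transform; repeating the zero n times directly avoids that extra step.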

1 Comment

@Emma's answer is more efficient. I didn't think of the array_repeat function.
