
I have a DataFrame with one column of type integer.

I want to create a new column containing an array with n elements, where n is the number from the first column.

For example:

from pyspark.sql.types import StructType, StructField, IntegerType

x = spark.createDataFrame(
    [(1,), (2,), (3,)],
    StructType([StructField("myInt", IntegerType(), True)]),
)

+-----+
|myInt|
+-----+
|    1|
|    2|
|    3|
+-----+

I need the resulting data frame to look like this:

+-----+---------+
|myInt|    myArr|
+-----+---------+
|    1|      [1]|
|    2|   [2, 2]|
|    3|[3, 3, 3]|
+-----+---------+

Note: it doesn't actually matter what the values inside the arrays are; only the count matters.

It'd be fine if the resulting data frame looked like this:

+-----+------------------+
|myInt|             myArr|
+-----+------------------+
|    1|            [item]|
|    2|      [item, item]|
|    3|[item, item, item]|
+-----+------------------+

2 Answers


It is preferable to avoid UDFs where possible, because they are less efficient than built-in functions. You can use array_repeat instead.

import pyspark.sql.functions as F

x.withColumn('myArr', F.array_repeat(F.col('myInt'), F.col('myInt'))).show()

+-----+---------+
|myInt|    myArr|
+-----+---------+
|    1|      [1]|
|    2|   [2, 2]|
|    3|[3, 3, 3]|
+-----+---------+
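Per row, array_repeat builds an array holding the element value repeated count times. A pure-Python sketch of that per-row semantics (the function name here is illustrative, not the Spark API; my assumption is that a non-positive count yields an empty array, as Spark's built-in does):

```python
def array_repeat(value, count):
    # Pure-Python model of Spark's array_repeat: a list holding
    # `value` repeated `count` times (empty when count <= 0).
    return [value] * max(count, 0)

print(array_repeat(2, 2))  # [2, 2]
```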

1 Comment

Note that I had some issues with it in Spark 2.4.4, but it works fine in Spark 3.0.1.

Use udf:

from pyspark.sql.functions import udf

@udf("array<int>")
def rep_(x):
    # Build a list containing x, x times.
    return [x for _ in range(x)]

x.withColumn("myArr", rep_("myInt")).show()
# +-----+---------+
# |myInt|    myArr|
# +-----+---------+
# |    1|      [1]|
# |    2|   [2, 2]|
# |    3|[3, 3, 3]|
# +-----+---------+
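The function the UDF wraps is plain Python, so its per-row logic can be checked without a Spark session (a quick sketch; rep_ here is the undecorated function):

```python
def rep_(x):
    # Same body the UDF wraps: build a list containing x, x times.
    return [x for _ in range(x)]

print(rep_(3))  # [3, 3, 3]
```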
