Add new columns and rows

Question

I have a PySpark DataFrame:

cust |  prob
-------------------
A    |  0.1
B    |  0.7
C    |  0.4

I would like to add another column, amount, and repeat each customer's row once per amount value. My expected result would be:

cust |  prob  |  amount
------------------------
A    |  0.1   |  1000
A    |  0.1   |  2000
A    |  0.1   |  3000
A    |  0.1   |  4000
A    |  0.1   |  5000
B    |  0.7   |  1000
B    |  0.7   |  2000
B    |  0.7   |  3000
B    |  0.7   |  4000
B    |  0.7   |  5000
C    |  0.4   |  1000
C    |  0.4   |  2000
C    |  0.4   |  3000
C    |  0.4   |  4000
C    |  0.4   |  5000

I need help creating this new column and the extra rows. My real data consists of many columns, so the solution should duplicate all of the original columns in the dataset.

mck · Accepted Answer · 2021-02-01 13:29:43Z


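You can add a column holding an exploded array. explode() returns one row per array element and repeats the values of every other column, so this works regardless of how many columns the DataFrame has: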

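import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Build the sample DataFrame from the question
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('A', 0.1), ('B', 0.7), ('C', 0.4)], ['cust', 'prob'])

# One output row per (original row, amount) pair
df2 = df.withColumn(
    'amount',
    F.explode(
        F.array(*[F.lit(i) for i in [1000, 2000, 3000, 4000, 5000]])
    )
)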

Or explode a generated sequence (F.sequence is available in Spark 2.4 and later):

# sequence(start, stop, step) is inclusive of stop: [1000, 2000, 3000, 4000, 5000]
df2 = df.withColumn(
    'amount',
    F.explode(
        F.sequence(F.lit(1000), F.lit(5000), F.lit(1000))
    )
)
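To make the row-multiplying effect concrete without starting Spark, here is a plain-Python sketch of what withColumn plus explode does to the data. It is not Spark code: rows are modelled as dicts, and the helper explode_rows is hypothetical, used only for illustration. Each input row, whatever columns it has, is copied once per amount with the new column added.

```python
# Plain-Python model of df.withColumn('amount', F.explode(...)); NOT Spark code.
# Rows are dicts here purely for illustration (an assumption, not Spark's Row type).
from typing import Any, Dict, Iterable, Iterator, List


def explode_rows(rows: Iterable[Dict[str, Any]], col: str,
                 values: List[Any]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield one copy of each row per value, with `col` set."""
    for row in rows:
        for v in values:
            # {**row, ...} copies every existing column, however many there are
            yield {**row, col: v}


# Same sample data as the question
df = [{'cust': 'A', 'prob': 0.1},
      {'cust': 'B', 'prob': 0.7},
      {'cust': 'C', 'prob': 0.4}]

# range(1000, 5001, 1000) mirrors F.sequence(1000, 5000, 1000): stop is included
amounts = list(range(1000, 5001, 1000))

df2 = list(explode_rows(df, 'amount', amounts))
for r in df2[:3]:
    print(r)
```

In real Spark the output order across partitions is not guaranteed; adding .orderBy('cust', 'amount') to df2 should give the ordering shown in the expected table.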
