Add identical rows to a Spark Dataframe using an integer

Question

Assuming the following Dataframe df1 :

df1 :
+---------+--------+-------+
|A        |B       |C      |
+---------+--------+-------+
|toto     |tata    |titi   |
+---------+--------+-------+

I have an integer N = 3, and I want to use it to create a Dataframe df2 that contains 3 copies of the row in df1 :

df2 :
+---------+--------+-------+
|A        |B       |C      |
+---------+--------+-------+
|toto     |tata    |titi   |
|toto     |tata    |titi   |
|toto     |tata    |titi   |
+---------+--------+-------+

Any ideas?

Does this answer your question? Replicate Spark Row N-times

– Duelist, Commented May 6, 2020 at 16:24

notNull · Accepted Answer · 2020-05-06 17:01:01Z

1

On Spark 2.4+ you can combine the arrays_zip, array_repeat and explode functions for this case.

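import org.apache.spark.sql.functions._
// assumes a SparkSession named spark is in scope, as in spark-shell
import spark.implicits._

val df=Seq(("toto","tata","titi")).toDF("A","B","C")

// build a one-element array of structs, repeat it 3 times, then explode it:
// each array element yields one copy of the original row
df.withColumn("arr",explode(array_repeat(arrays_zip(array("A"),array("B"),array("C")),3))).
drop("arr").
show(false)

//or dynamic way, for any set of columns
val cols=df.columns.map(x => col(x))
df.withColumn("arr",explode(array_repeat(arrays_zip(array(cols:_*)),3))).
drop("arr").
show(false)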

//+----+----+----+
//|A   |B   |C   |
//+----+----+----+
//|toto|tata|titi|
//|toto|tata|titi|
//|toto|tata|titi|
//+----+----+----+
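
As a side note, the zipped array is dropped right after the explode, so its contents do not matter. A minimal sketch of the same idea that repeats a literal instead, assuming Spark 2.4+ and a local SparkSession; `replicateRows` and the `_dup` column name are hypothetical names chosen for illustration, and `_dup` is assumed not to clash with an existing column:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_repeat, explode, lit}

// Hypothetical helper (not from the answer): repeats every row of df n times.
def replicateRows(df: DataFrame, n: Int): DataFrame =
  // array_repeat builds an array with n elements; explode emits one row per element
  df.withColumn("_dup", explode(array_repeat(lit(1), n)))
    // the helper column only drives the duplication, so remove it again
    .drop("_dup")

// Assumption: a local session, for illustration only
val spark = SparkSession.builder().master("local[*]").appName("replicate-rows").getOrCreate()
import spark.implicits._

val df1 = Seq(("toto", "tata", "titi")).toDF("A", "B", "C")
val df2 = replicateRows(df1, 3)
df2.show(false)
```

Because N is a plain Int argument here, it can come from a variable rather than being hard-coded in the expression.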

edited May 6, 2020 at 17:01

answered May 6, 2020 at 16:51

notNull


QuickSilver · Accepted Answer · 2020-05-06 17:27:21Z

1

You can use foldLeft along with the Dataframe's union:

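import org.apache.spark.sql.DataFrame

object JoinDataFrames {

  def main(args: Array[String]): Unit = {
    // Constant.getSparkSess is a helper that is not shown here;
    // it presumably returns a SparkSession
    val spark = Constant.getSparkSess
    import spark.implicits._
    val df = List(("toto","tata","titi")).toDF("A","B","C")

    val N = 3

    // start from df and union one more copy per step;
    // 1 until N runs N - 1 times, so the result holds N copies
    val resultDf = (1 until N).foldLeft(df)((dfInner : DataFrame, _ : Int) => {
      dfInner.union(df)
    })

    resultDf.show()

  }

}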
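
To see why the range is 1 until N rather than 1 to N, here is a small sketch of the same fold on a plain Scala Seq, with ++ standing in for union. The names `rows`, `n` and `result` are illustrative only and no Spark is involved:

```scala
// Plain-Scala stand-in for the Dataframe: a Seq holding the single row
val rows = Seq(("toto", "tata", "titi"))
val n = 3

// 1 until n yields n - 1 steps (1 and 2 for n = 3);
// each step appends one more copy to the accumulator, which starts with one copy
val result = (1 until n).foldLeft(rows)((acc, _) => acc ++ rows)

println(result.size) // 3
```

With Dataframes, each step adds another union to the query plan, so for a large N the explode-based approach is probably the lighter option.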

answered May 6, 2020 at 17:27

QuickSilver
