Given the following DataFrame df1:

df1 :
+---------+--------+-------+
|A        |B       |C      |
+---------+--------+-------+
|toto     |tata    |titi   |
+---------+--------+-------+

I have an integer N = 3, and I want to use it to build a DataFrame df2 that contains each row of df1 N times (here, 3 copies):

df2 :
+---------+--------+-------+
|A        |B       |C      |
+---------+--------+-------+
|toto     |tata    |titi   |
|toto     |tata    |titi   |
|toto     |tata    |titi   |
+---------+--------+-------+

Any ideas?

2 Answers

From Spark 2.4 onward, you can combine the arrays_zip, array_repeat and explode functions for this case.

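import org.apache.spark.sql.functions._

val df=Seq(("toto","tata","titi")).toDF("A","B","C")
// build an array holding 3 copies of the row, explode it into 3 rows, then drop the helper column
df.withColumn("arr",explode(array_repeat(arrays_zip(array("A"),array("B"),array("C")),3))).
drop("arr").
show(false)

//or dynamic way: take the column list from the DataFrame itself
val cols=df.columns.map(x => col(x))
df.withColumn("arr",explode(array_repeat(arrays_zip(array(cols:_*)),3))).
drop("arr").
show(false)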

//+----+----+----+
//|A   |B   |C   |
//+----+----+----+
//|toto|tata|titi|
//|toto|tata|titi|
//|toto|tata|titi|
//+----+----+----+
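
The helper column is dropped right after the explode, so its contents do not matter. A minimal sketch of a shorter variant, which repeats a literal instead of zipping the row's columns, could look like the following. The object name ReplicateRows, the helper replicateRows and the column name _dup are hypothetical names chosen for illustration; the sketch assumes Spark 2.4+, n >= 1, and that df has no column called _dup already.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_repeat, explode, lit}

object ReplicateRows {

  // Hypothetical helper: returns df with every row repeated n times (assumes n >= 1).
  def replicateRows(df: DataFrame, n: Int): DataFrame =
    df.withColumn("_dup", explode(array_repeat(lit(0), n))) // one output row per array element
      .drop("_dup")                                          // the helper column is no longer needed

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell a session named spark already exists.
    val spark = SparkSession.builder().master("local[1]").appName("replicate-rows").getOrCreate()
    import spark.implicits._

    val df1 = Seq(("toto", "tata", "titi")).toDF("A", "B", "C")
    val df2 = replicateRows(df1, 3)
    df2.show(false)

    spark.stop()
  }
}
```

Because no column is packed into an array here, this variant does not depend on the column types of df.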

You can use foldLeft together with DataFrame's union method:

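import org.apache.spark.sql.DataFrame

object JoinDataFrames {

  def main(args: Array[String]): Unit = {
    // Constant.getSparkSess is my own helper (not shown here) that returns a SparkSession
    val spark = Constant.getSparkSess
    import spark.implicits._
    val df = List(("toto","tata","titi")).toDF("A","B","C")

    val N = 3

    // the fold starts from one copy of df and appends one more per step,
    // so (1 until N) performs N - 1 unions and yields N copies in total
    val resultDf = (1 until N).foldLeft(df)((dfInner: DataFrame, _: Int) => {
      df.union(dfInner)
    })

    resultDf.show()

  }

}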
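
To see why the range is 1 until N rather than 1 to N, the same fold can be run on a plain Scala List, with ++ standing in for union. This is only a sketch of the counting logic, not Spark code; the names FoldCopies and replicate are hypothetical.

```scala
object FoldCopies {

  // Hypothetical stand-in for the DataFrame fold: ++ plays the role of union.
  // The accumulator starts with one copy of rows and each step adds one more.
  def replicate[A](rows: List[A], n: Int): List[A] =
    (1 until n).foldLeft(rows)((acc, _) => rows ++ acc)

  def main(args: Array[String]): Unit = {
    val rows = List(("toto", "tata", "titi"))
    // (1 until 3) has two elements, so two appends follow the initial copy
    replicate(rows, 3).foreach(println)
  }
}
```

Note that for N <= 1 the range is empty and the fold returns the original rows unchanged, so this approach never produces fewer than one copy.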
