I need to add 100 columns to an existing DataFrame.

The existing DF is like this

c1 c2 c3 c4

The 100 columns should be added after c2, so the output looks like this

c1 c2 c5 c6 c7 c8 c9 ...... c100 c3 c4

I used .withColumn to add the columns and then .select to arrange them in order.

Is there a better way to do this?

TIA

  • What do you mean by better? More readable? More performant? Commented Jul 27, 2021 at 18:28
  • There is not enough information in the question to answer it. Can you show us what you want to achieve, what these columns contain, and what you have tried so far? Commented Jul 27, 2021 at 18:59

1 Answer

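Use select. Each withColumn call adds another projection to the query plan, so chaining many of them produces a large plan and gets less efficient as the number of columns grows.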

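import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
val midCols: Seq[Column] = ...
df.select(Seq(col("c1"), col("c2")) ++ midCols ++ Seq(col("c3"), col("c4")): _*)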

That way the new version of the Dataset is created in a single transformation.
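To make the pattern concrete, here is a minimal sketch of how the select list could be built programmatically. The helper `insertAfter`, the column names `new_1` to `new_100`, and the placeholder `lit` values are all hypothetical; in practice `midCols` would hold whatever expressions you actually need.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object InsertColumnsExample {
  // Hypothetical helper: existing columns up to and including `anchor`,
  // then the new columns, then the remaining existing columns.
  def insertAfter(existing: Seq[String], anchor: String, newCols: Seq[Column]): Seq[Column] = {
    require(existing.contains(anchor), s"column $anchor not found")
    val (head, tail) = existing.splitAt(existing.indexOf(anchor) + 1)
    head.map(c => col(c)) ++ newCols ++ tail.map(c => col(c))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("insert-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3, 4)).toDF("c1", "c2", "c3", "c4")

    // Placeholder columns for illustration only: new_1 .. new_100.
    val midCols: Seq[Column] = (1 to 100).map(i => lit(i).as(s"new_$i"))

    // One select, so all 100 columns are added in a single projection.
    val result = df.select(insertAfter(df.columns.toSeq, "c2", midCols): _*)

    // Column order: c1, c2, new_1 .. new_100, c3, c4
    println(result.columns.mkString(" "))

    spark.stop()
  }
}
```

Deriving the order from `df.columns` rather than hard-coding c1 to c4 keeps the sketch usable when the existing DataFrame has more columns than in the question.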

It is also worth noting for other readers that the same pattern lets you construct and add complex column expressions.

// split needs a pattern argument; "," is an assumed delimiter
val colsToAdd: Seq[Column] = Seq(col("a") * 2 as "next_a", split(col("b"), ",") as "b_arr")

Because the alias is already part of each definition, the selected columns come out with the intended names. That way you don't have to deal with a rename afterwards.
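As a rough illustration of the aliasing point, the sketch below appends two aliased expressions to a small DataFrame and prints the resulting column names. The sample data and the "," delimiter passed to `split` are assumptions made for this example.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object AliasedColumnsExample {
  // Each expression carries its own alias, so no rename is needed after select.
  // The "," delimiter is an assumption made for this sketch.
  val colsToAdd: Seq[Column] = Seq(
    (col("a") * 2).as("next_a"),
    split(col("b"), ",").as("b_arr")
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("aliased-columns")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with an integer column a and a string column b.
    val df = Seq((1, "x,y"), (2, "z")).toDF("a", "b")

    // Keep the existing columns and append the aliased expressions in one select.
    val result = df.select(df.columns.map(c => col(c)).toSeq ++ colsToAdd: _*)

    println(result.columns.mkString(" ")) // a b next_a b_arr

    spark.stop()
  }
}
```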
