5

I have a dataframe with multiple columns:

| a | b | c | d |
-----------------
| 0 | 4 | 3 | 6 |
| 1 | 7 | 0 | 4 |
| 2 | 4 | 3 | 6 |
| 3 | 9 | 5 | 9 |

I would now like to combine columns [b, c, d] into a single column. However, I do not know in advance how many columns the list will contain; otherwise I could just use a UDF3 to combine the three.

So the desired outcome is:

| a | combined  |
-----------------
| 0 | [4, 3, 6] |
| 1 | [7, 0, 4] |
| 2 | [4, 3, 6] |
| 3 | [9, 5, 9] |

How can I achieve this?

Non-working pseudo-code:

public static Dataset<Row> mergeColumns(Dataset<Row> ds, List<String> columns) {
   return ds.withColumn("combined", collectAsList(columns))
}

The worst-case workaround would be a switch statement on the number of input columns, with a separate UDF written for each case (e.g. 2 to 20 input columns), throwing an error if more input columns are supplied.

Comment: use the inbuilt array function. (Commented Jul 5, 2018 at 7:09)

2 Answers

5

As Ramesh mentioned in his comment, you can use the array function. You only need to convert your list of column names to a Column array.

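// Requires org.apache.spark.sql.Column, Dataset, Row, functions and java.util.List.
public static Dataset<Row> mergeColumns(Dataset<Row> ds, List<String> columns) {
    // Map each column name to a Column, then pack them all into one array column.
    Column[] cols = columns.stream().map(functions::col).toArray(Column[]::new);
    return ds.withColumn("combined", functions.array(cols));
}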
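
The reason this works for any number of columns is that a Java varargs parameter also accepts a plain array, so the list length only has to be known at runtime. The following is a minimal sketch of that mechanism without a Spark dependency. `FakeColumn`, `VarargsSketch`, `array` and `combine` are hypothetical stand-ins invented for illustration and are not part of Spark's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Spark's Column class; it only remembers a name.
final class FakeColumn {
    final String name;
    FakeColumn(String name) { this.name = name; }
}

class VarargsSketch {
    // Hypothetical stand-in for a varargs API shaped like functions.array(Column... cols).
    // It returns the names it received so the result is easy to inspect.
    static List<String> array(FakeColumn... cols) {
        return Arrays.stream(cols).map(c -> c.name).collect(Collectors.toList());
    }

    // Same shape as mergeColumns above: the list length is only known at runtime.
    static List<String> combine(List<String> columns) {
        FakeColumn[] cols = columns.stream().map(FakeColumn::new).toArray(FakeColumn[]::new);
        return array(cols); // an array is accepted wherever varargs are declared
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList("b", "c", "d")));           // prints [b, c, d]
        System.out.println(combine(Arrays.asList("b", "c", "d", "e", "f"))); // prints [b, c, d, e, f]
    }
}
```

Note that withColumn only adds the new column. To match the desired output in the question, the source columns would presumably still need to be dropped afterwards, for example with ds.drop(columns.toArray(new String[0])).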

0
import org.apache.spark.sql.functions.{col, concat_ws}

// Columns to combine: every column except "a"
val newDF = df.drop("a")

// Concatenate those columns into a single comma-separated string column
val df2 = df.withColumn("combined", concat_ws(",", newDF.columns.map(c => col(c)): _*))

// Keep only the first column and the combined column
val columnNames = Seq("a", "combined")
df2.select(columnNames.head, columnNames.tail: _*).show()

1 Comment

That would result in a column of StringType rather than an array, wouldn't it?
