0

This is how I do it for 2 specific columns:

dataSet.withColumn("colName", concat(dataSet.col("col1"), lit(","), dataSet.col("col2")));

but dataSet.columns() returns a String array, not a Column array. How should I create a List<Column>?

Thanks!

3 Answers

5

Simple way: instead of using df.columns, pass * to concat_ws inside expr, which expands to every column. See the code below.

df.withColumn("colName",expr("concat_ws(',',*)")).show(false)
+---+--------+---+-------------+
|id |name    |age|colName      |
+---+--------+---+-------------+
|1  |Srinivas|29 |1,Srinivas,29|
|2  |Ravi    |30 |2,Ravi,30    |
+---+--------+---+-------------+
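For Java callers, the expr approach above only needs a string, so the String[] returned by dataSet.columns() can be used directly when only some columns should be concatenated. The following is a minimal sketch, not part of the original answer: the class name ConcatExprBuilder and the method concatWsExpr are hypothetical, and it assumes the separator contains no single quote or backslash. It wraps each column name in backticks so that names with spaces or dots are still parsed as identifiers.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper (not from the original answer): builds the SQL text
// for a concat_ws call from an array of column names.
class ConcatExprBuilder {

    // Returns e.g. concat_ws(',', `id`, `name`) for the given names.
    // Assumption: separator contains no single quote or backslash.
    public static String concatWsExpr(String separator, String[] columnNames) {
        if (columnNames.length == 0) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String cols = Arrays.stream(columnNames)
                // Quote each name with backticks; double any embedded backtick.
                .map(name -> "`" + name.replace("`", "``") + "`")
                .collect(Collectors.joining(", "));
        return "concat_ws('" + separator + "', " + cols + ")";
    }

    public static void main(String[] args) {
        String[] columns = {"id", "name", "age"};
        // prints: concat_ws(',', `id`, `name`, `age`)
        System.out.println(concatWsExpr(",", columns));
    }
}
```

With Spark on the classpath, the resulting string would be passed to expr, for example dataSet.withColumn("colName", functions.expr(ConcatExprBuilder.concatWsExpr(",", dataSet.columns()))). Filtering the array before the call is one way to leave out columns that should not appear in the result.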

0

Java's syntax is more verbose. Try this:

df.withColumn("colName", concat_ws(",", toScalaSeq(
        Arrays.stream(df.columns())
              .map(functions::col)
              .collect(Collectors.toList()))));

Use the utility below to convert a Java List to a Scala Seq:

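// Wraps a java.util.List as a scala.collection.mutable.Buffer, which is a Scala Seq.
// Note: JavaConversions is deprecated as of Scala 2.12 and removed in 2.13.
<T> Buffer<T> toScalaSeq(List<T> list) {
    return JavaConversions.asScalaBuffer(list);
}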

0

If someone is looking for a way to concatenate all the columns of a DataFrame in Scala, this is what worked for me:

val df_new = df.withColumn(new_column_name, concat_ws("-", df.columns.map(col): _*))
