I have a PySpark dataframe that has a couple of fields, e.g.:

Id  Name  Surname
1   John  Johnson
2   Anna  Maria

I want to create a new column that combines the values of other columns into a new string. The desired output is:

Id  Name  Surname  New
1   John  Johnson  Hey there John Johnson!
2   Anna  Maria    Hey there Anna Maria!

I'm trying to do (pseudocode):

df = df.withColumn("New", "Hey there " + Name + " " + Surname + "!")

How can this be achieved?

  • Wrap the literal values in lit() and the column names in col(). Concatenation can be done using concat(). See the functions documentation for more details. Commented Aug 3, 2022 at 16:51

1 Answer

You can use either the concat function or format_string. With format_string:

from pyspark.sql import functions as F

df = df.withColumn(
    "New", 
    F.format_string("Hey there %s %s!", "Name", "Surname")
)

df.show(truncate=False)
# +---+----+-------+-----------------------+
# |Id |Name|Surname|New                    |
# +---+----+-------+-----------------------+
# |1  |John|Johnson|Hey there John Johnson!|
# |2  |Anna|Maria  |Hey there Anna Maria!  |
# +---+----+-------+-----------------------+

If you prefer using concat:

# Literals go through F.lit(), column references through F.col()
df = df.withColumn("New", F.concat(F.lit("Hey there "), F.col("Name"), F.lit(" "), F.col("Surname"), F.lit("!")))