Rename Elements In Pyspark Column

Question

I'm trying to replace the values in one column of my PySpark DataFrame. The DataFrame df looks like this:

+--------+------+------+
|   hello|  this|column|
+--------+------+------+
|     132|   234|   abc|
|34563465|134134|   def|
|      12|    34|   ghi|
|     132|   234|   jkl|
|34563465|134134|   mno|
|      12|    34|   pqr|
|     132|   234|   stu|
|34563465|134134|   ghi|
|      12|    34|   pqr|
+--------+------+------+

What I want is to replace every value in the 'column' column, along the lines of this pandas-style mapping:

df['column'] = df['column'].map({'abc': 'cba',
                                 'def': 'fed',
                                 'ghi': 'ihg',
                                 'jkl': 'lkj',
                                 'mno': 'onm',
                                 'pqr': 'rqp',
                                 'stu': 'uts'})

So that the dataframe will then look like this:

+--------+------+------+
|   hello|  this|column|
+--------+------+------+
|     132|   234|   cba|
|34563465|134134|   fed|
|      12|    34|   ihg|
|     132|   234|   lkj|
|34563465|134134|   onm|
|      12|    34|   rqp|
|     132|   234|   uts|
|34563465|134134|   ihg|
|      12|    34|   rqp|
+--------+------+------+

How can I make this change in PySpark?

villoro · Accepted Answer · 2020-01-31 20:00:10Z


You can do it with the DataFrame's replace method, passing the mapping as a dict and restricting it to the one column with subset:

mapping = {
    'abc': 'cba',
    'def': 'fed',
    'ghi': 'ihg',
    'jkl': 'lkj',
    'mno': 'onm',
    'pqr': 'rqp',
    'stu': 'uts'
}
df = df.replace(to_replace=mapping, subset=['column'])
