0

How we can split sparkDataframe column based on some separator like '/' using pyspark2.4

My Column contains :

+-------------------+
|           timezone|
+-------------------+
|   America/New_York|
|  Africa/Casablanca|
|      Europe/Madrid|
|      Europe/Madrid|
|                   |
|   Null            |

Thanks

1
  • What output do you expect? Two separate columns if the separator exist? Commented Feb 21, 2019 at 13:47

1 Answer 1

2
# Creating a dataframe
values = [('America/New_York',),('Africa/Casablanca',),('Europe/Madrid',),('Europe/Madrid',),('Germany',),('',),(None,)]
df = sqlContext.createDataFrame(values,['timezone',])
df.show(truncate=False)
+-----------------+
|timezone         |
+-----------------+
|America/New_York |
|Africa/Casablanca|
|Europe/Madrid    |
|Europe/Madrid    |
|Germany          |
|                 |
|null             |
+-----------------+

from pyspark.sql.functions import instr, split
df = df.withColumn('separator_if_exists',(instr(col('timezone'),'/') > 0) & instr(col('timezone'),'/').isNotNull())
df = df.withColumn('col1',when(col('separator_if_exists') == True,split(col('timezone'),'/')[0]).otherwise(None))
df = df.withColumn('col2',when(col('separator_if_exists') == True,split(col('timezone'),'/')[1]).otherwise(None)).drop('separator_if_exists')
df.show(truncate=False)

+-----------------+-------+----------+
|timezone         |col1   |col2      |
+-----------------+-------+----------+
|America/New_York |America|New_York  |
|Africa/Casablanca|Africa |Casablanca|
|Europe/Madrid    |Europe |Madrid    |
|Europe/Madrid    |Europe |Madrid    |
|Germany          |null   |null      |
|                 |null   |null      |
|null             |null   |null      |
+-----------------+-------+----------+

Documentation: split() and instr(). Note that instr() is 1 based indexing. If the substring to be searched is not found, 0 is returned.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.