I would like to split a column of a PySpark DataFrame into two columns.

The DataFrame:

   price
   90.16|USD  

I need:

  dollar_price currency
  9016          USD

PySpark code:

  new_col = F.when(F.col("price").isNull() == False, F.substring(F.col('price'), 1, F.instr(F.col('retail_value'), '|')-1)).otherwise(null)
  new_df = df.withColumn('dollar_price', new_col)

  new_col = F.when(F.col("price").isNull() == False, F.substring(F.col('price'), F.instr(F.col('retail_value'), '|')+1, 3)).otherwise(null)
  new_df_1 = new_df.withColumn('currency', new_col)

I get this error:

  TypeError: Column is not iterable

Could you please tell me what I missed?

I have tried the approach from "Split a dataframe column's list into two dataframe columns", but it does not work.

Thanks.

  • Why can't you use df.selectExpr("split(price, '[|]')[0] as dollar_price", "split(price, '[|]')[1] as currency")? Commented Jul 29, 2020 at 6:36

1 Answer

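Try expr, since the position and length are computed from the instr function. F.substring expects plain Python integers for its position and length arguments (at least in the PySpark version that raises this error), so passing it the Column returned by F.instr fails with "TypeError: Column is not iterable". Inside expr, the SQL substring function accepts column expressions for those arguments.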

Example:

df.show()
#+---------+
#|    price|
#+---------+
#|90.16|USD|
#+---------+

from pyspark.sql.functions import col, expr, when

# expr() parses SQL, where substring() accepts per-row expressions such as instr(...)
has_price = col("price").isNotNull()
(
    df.withColumn("dollar_price", when(has_price, expr("substring(price, 1, instr(price, '|') - 1)")))
    .withColumn("currency", when(has_price, expr("substring(price, instr(price, '|') + 1, 3)")))
    .show()
)

#+---------+------------+--------+
#|    price|dollar_price|currency|
#+---------+------------+--------+
#|90.16|USD|       90.16|     USD|
#+---------+------------+--------+
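
To see why `instr(price, '|') - 1` is the right length and `instr(price, '|') + 1` the right start, here is a plain-Python model of the 1-based index arithmetic. The helpers `instr` and `substring` below are hypothetical stand-ins written for this sketch, not Spark functions, and they only model the case where the start position is at least 1.

```python
def instr(s: str, sub: str) -> int:
    """1-based position of the first match, 0 if absent (mirrors SQL instr)."""
    return s.find(sub) + 1


def substring(s: str, pos: int, length: int) -> str:
    """Take `length` characters starting at 1-based position `pos` (pos >= 1 only)."""
    start = pos - 1
    return s[start:start + length]


price = "90.16|USD"
sep = instr(price, "|")                    # the pipe is the 6th character
dollar_price = substring(price, 1, sep - 1)  # the 5 characters before the pipe
currency = substring(price, sep + 1, 3)      # the 3 characters after the pipe

print(sep, dollar_price, currency)  # 6 90.16 USD
```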