0

I'd like to convert a float to a currency using Babel and PySpark

sample data:

amount       currency
2129.9       RON
1700         EUR
1268         GBP
741.2        USD
142.08091153 EUR
4.7E7        USD
0            GBP

I tried:

df = df.withColumn(F.col('amount'), format_currency(F.col('amount'), F.col('currency'),locale='be_BE'))

or

df = df.withColumn(F.col('amount'), format_currency(F.col('amount'), 'EUR',locale='be_BE'))

They both give me an error:
enter image description here

6
  • Can you provide sample data of amount column and also since the error is because of the decimal conversion so you can check check if there is any / in the entire column of amount. Commented Feb 2, 2021 at 6:59
  • So you are saying that there is no other characters in the entire amount column apart from numeric? As there is no other reason why it will have a decimal conversion error Commented Feb 2, 2021 at 7:36
  • I checked the table end there's an E in the column e.g. 4.7E7 Commented Feb 2, 2021 at 7:39
  • remove those characters and try again. Also to check whether the rows are numeric or integer or not, you can use df.select("amount",F.col("amount").cast("int").isNotNull().alias("Truth")).show() Commented Feb 2, 2021 at 7:50
  • What's the best way to remove those characters regexp_extract ? Commented Feb 2, 2021 at 7:58

2 Answers 2

1

To use Python libraries with Spark dataframes, you need to use an UDF:

from babel.numbers import format_currency
import pyspark.sql.functions as F

format_currency_udf = F.udf(lambda a, c: format_currency(a, c))

df2 = df.withColumn(
    'amount',
    format_currency_udf('amount', 'currency')
)

df2.show()
+----------------+--------+
|          amount|currency|
+----------------+--------+
|     RON2,129.90|     RON|
|       €1,700.00|     EUR|
|       £1,268.00|     GBP|
|       US$741.20|     USD|
|         €142.08|     EUR|
|US$47,000,000.00|     USD|
+----------------+--------+
Sign up to request clarification or add additional context in comments.

2 Comments

Works for most values but still having InvalidOperation: [<class 'decimalConversionSyntax'>] error. I checked all values but i can't find the cause of this. How can I best find the value causing the error? Some values are 0 can this cause the problem ?
@JohnDoe it will fail for nulls. you can add a check in the udf to catch this.
1

There seems a problem in pre-processing the amount column of your dataframe. From the error it is evident that value after converting to string is not just numeric which it has to be according to this tableand has has some additional characters as well. You can check on this column to find that and remove unnecessary character to fix this. As as example:

>>> import decimal
>>> value = '10.0'
>>> value = decimal.Decimal(str(value))
>>> value
Decimal('10.0')
>>> value = '10.0e'
>>> value = decimal.Decimal(str(value))
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    value = decimal.Decimal(str(value))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]   # as '10.0e' is not just numeric

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.