I am new to PySpark in Databricks, which is why I'm struggling with the following: I have a DataFrame whose columns have datatype string. One example is the column netto_resultaat.
The requirement is to change this column and other financial columns from string to decimal, because I want to export the schema and data to an Azure SQL database.
I tried the following:
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

newouterjoindffinal = newouterjoindf.withColumn(
    "netto_resultaat", col("netto_resultaat").cast(DecimalType())
)
newouterjoindffinal.printSchema()
The result is that all the numbers in the netto_resultaat column are converted to null. The records contain values like -100,880, 35,312,000, and 118,579,525.
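I suspect the thousands separators are what breaks the cast. Here is a minimal sketch of what I'm seeing, plus a workaround I'm considering (the DecimalType(18, 2) precision/scale and the regexp_replace approach are my own assumptions, not part of my actual pipeline):

from pyspark.sql.functions import col, regexp_replace
from pyspark.sql.types import DecimalType

# Tiny DataFrame to reproduce the problem (spark is the session
# Databricks provides in a notebook)
df = spark.createDataFrame(
    [("-100,880",), ("35,312,000",), ("118,579,525",)],
    ["netto_resultaat"],
)

# Direct cast: every value becomes null, because strings containing
# thousands separators are not valid decimal literals
df.withColumn("direct", col("netto_resultaat").cast(DecimalType(18, 2))).show()

# Stripping the commas first does produce numbers, but I don't know
# whether this is the idiomatic way to handle it
df.withColumn(
    "stripped",
    regexp_replace(col("netto_resultaat"), ",", "").cast(DecimalType(18, 2)),
).show()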
Any suggestions?
