I'm dealing with an encoding problem that I have almost resolved by applying encode/decode to the affected column of a DataFrame, as in the following example:

    from pyspark.sql.functions import decode, encode
    df.withColumn("column1", decode(encode("column1", "windows-1252"), "UTF8"))

This fixes most values, for example turning this [image] into this [image].

However, for some characters, such as "Á" or "Í", I don't get the expected result:

From this [image] to this [image]
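
A likely reason only some characters fail: the UTF-8 encodings of "Á" and "Í" are C3 81 and C3 8D, and the bytes 0x81 and 0x8D have no character assigned in windows-1252, so the round trip through that charset cannot reproduce them. The sketch below shows this in plain Python, without Spark. The helper names are made up for illustration, and it assumes the charset Spark uses treats those bytes much as Python's cp1252 codec does, which I have not verified.

```python
# Plain-Python illustration (no Spark) of why "é" survives a windows-1252
# round trip while "Á" and "Í" do not. Helper names are hypothetical.

def cp1252_can_decode(data: bytes) -> bool:
    """Return True if every byte in data has a character in windows-1252."""
    try:
        data.decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False

def repair(mojibake: str, wrong_charset: str) -> str:
    """Undo a wrong decode: get the original bytes back, then read them as UTF-8."""
    return mojibake.encode(wrong_charset).decode("utf-8")

for ch in "éÁÍ":
    raw = ch.encode("utf-8")
    # Show the character, its UTF-8 bytes in hex, and whether cp1252 covers them.
    print(ch, raw.hex(), cp1252_can_decode(raw))
```

For "é" the bytes are C3 A9, both of which windows-1252 defines, so `repair("Ã©", "cp1252")` gives back "é". For "Á" and "Í" the second byte is one of the unassigned positions, so the same repair has nothing to work from.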

Has anyone dealt with the same problem and had good results with another solution?

Thanks in advance!

1 Answer

I resolved this problem by changing the encoding to iso-8859-15, and by also loading the data with that encoding, as in the example below:

    df = (
        spark.read.format("com.databricks.spark.xml")
        .option("encoding", "UTF-8")
        .option("charset", "iso-8859-15")
        .option("rowTag", "Header")
        .load(folder_path)
    )
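
As a rough check of why iso-8859-15 helps where windows-1252 did not, the plain-Python sketch below (no Spark; `round_trips` is a hypothetical helper) tests whether a charset maps every byte value to a character and back. It assumes Python's codecs are a fair stand-in for the charsets Spark uses, which may not hold in every detail.

```python
# Check whether a single-byte charset can decode and re-encode all 256 byte
# values without loss. A charset with unassigned bytes cannot.

def round_trips(charset: str) -> bool:
    """True if every byte 0x00-0xFF decodes and re-encodes unchanged."""
    all_bytes = bytes(range(256))
    try:
        return all_bytes.decode(charset).encode(charset) == all_bytes
    except UnicodeError:
        return False

print("iso-8859-15:", round_trips("iso-8859-15"))
print("windows-1252:", round_trips("cp1252"))

# UTF-8 bytes of "Á" misread as iso-8859-15, then repaired by reversing the step.
mojibake = "Á".encode("utf-8").decode("iso-8859-15")
print(mojibake.encode("iso-8859-15").decode("utf-8"))
```

Because iso-8859-15 assigns a character to every byte, text that was wrongly decoded with it can always be turned back into the original bytes and re-read as UTF-8. In the question's `withColumn` call this would correspond to replacing "windows-1252" with "iso-8859-15".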