I have just started learning Spark. I am trying to one-hot encode a single column of my DataFrame, but I cannot import OneHotEncoderEstimator from pyspark. I tried importing OneHotEncoder instead (deprecated in 3.0.0); Spark can import it, but the object lacks a transform function. Here is the output from my code below. If anyone has encountered a similar problem, please help. Thank you so much for your time!

[Screenshot of the code output]

  • Please avoid screenshots on SO. Commented Jan 27, 2020 at 7:56
  • What @cronoik said Commented May 11, 2021 at 4:59

2 Answers

11

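In addition to Ulgen's answer: OneHotEncoderEstimator was renamed to OneHotEncoder in Spark 3.0.0, and the old OneHotEncoder transformer (deprecated since 2.3) was removed. On Spark 3.x, import OneHotEncoder and call fit() on it exactly as you would with OneHotEncoderEstimator on 2.3 and 2.4.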
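If the same notebook has to run on both 2.x and 3.x, one option is to pick the class name from the installed version. The following is a minimal sketch, not part of the Spark API: the helper names encoder_class_name and load_encoder_class are hypothetical, and it assumes the version string starts with "major.minor".

```python
import importlib


def encoder_class_name(spark_version: str) -> str:
    """Name of the fit()-based one-hot encoder class for a given Spark version."""
    # Only the first two components matter, e.g. "2.4.4" -> (2, 4).
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (3, 0):
        return "OneHotEncoder"           # new name as of 3.0.0
    if (major, minor) >= (2, 3):
        return "OneHotEncoderEstimator"  # name used in 2.3 and 2.4
    raise ValueError("no fit()-based one-hot encoder before Spark 2.3")


def load_encoder_class():
    """Import the matching class from pyspark.ml.feature (requires pyspark)."""
    import pyspark  # imported lazily so encoder_class_name works without Spark

    module = importlib.import_module("pyspark.ml.feature")
    return getattr(module, encoder_class_name(pyspark.__version__))


print(encoder_class_name("2.4.4"))
print(encoder_class_name("3.0.0"))
```

In a notebook you would call load_encoder_class() once and use the returned class with inputCols and outputCols, whichever name it has.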

4

Your first problem is the "encoder object has no 'transform'" error. OneHotEncoderEstimator is an estimator, not a transformer: it works on category indices, and before it can transform a column you must train it with the fit() function. fit() lets the encoder learn the categories from the data and returns a model, and that model is what converts the data into encoded category vectors. Most feature encoders that work on category indices need a fit() step to learn from the data in the same way.

So what you should do is:

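from pyspark.ml.feature import OneHotEncoderEstimator

# The estimator takes lists of columns; fit() returns the model that can transform()
encoder = OneHotEncoderEstimator(dropLast=False, inputCols=["AgeIndex"], outputCols=["AgeVec"])
model = encoder.fit(df)
encoded = model.transform(df)
encoded.show()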
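To see why transform() is missing until fit() has run, here is a toy pure-Python illustration of the estimator/model split. It is not Spark's implementation; the classes ToyOneHotEstimator and ToyOneHotModel are made up for this sketch, and it assumes the input is a list of non-negative category indices.

```python
class ToyOneHotEstimator:
    """Holds settings only; it has no transform() because nothing is learned yet."""

    def __init__(self, drop_last=False):
        self.drop_last = drop_last

    def fit(self, indices):
        # Learn the number of categories from the data: largest index + 1.
        return ToyOneHotModel(max(indices) + 1, self.drop_last)


class ToyOneHotModel:
    """Result of fit(); knows the category count, so it can encode."""

    def __init__(self, size, drop_last):
        self.size = size
        self.drop_last = drop_last

    def transform(self, indices):
        # With drop_last the final category is encoded as an all-zero vector.
        width = self.size - 1 if self.drop_last else self.size
        return [[1.0 if j == i else 0.0 for j in range(width)] for i in indices]


age_index = [0, 2, 1, 0]
estimator = ToyOneHotEstimator(drop_last=False)
model = estimator.fit(age_index)       # learning step
encoded = model.transform(age_index)   # encoding step
print(encoded)
```

Calling transform() on the estimator itself fails for the same reason as in the question: only the object returned by fit() knows how many categories there are.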

I also recommend reading the documentation before starting a project when you are new to a tool; it helps a lot. The section of the Spark documentation that covers transformation operations is linked here:

Spark Transformation Operations

Your second problem is the import error. Since you are using a notebook, check which environment the notebook is running in. Note also that your Spark version is a preview release, which is aimed mostly at developers and testers. As a beginner you should use the latest tested release: try switching back to spark-2.4.4 and then re-check the notebook's environment.

1 Comment

Thank you so much for your quick reply! I did read the OneHotEncoder example from the Spark GitHub repo, but I used the wrong one, so it did not work.
