I am using Spark v3.0.0. My dataframe is:

indexer.show()
+------+--------+-----+
|row_id|    city|index|
+------+--------+-----+
|     0|New York|  0.0|
|     1|  Moscow|  3.0|
|     2| Beijing|  1.0|
|     3|New York|  0.0|
|     4|   Paris|  2.0|
|     5|   Paris|  2.0|
|     6|New York|  0.0|
|     7| Beijing|  1.0|
+------+--------+-----+

Then I want to one-hot encode the dataframe's "index" column, but I get this error:

encoder = OneHotEncoder(inputCol="index", outputCol="encoding")
encoder.setDropLast(False)
indexer = encoder.transform(indexer)
----------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-70bbd67e6679> in <module>
      1 encoder = OneHotEncoder(inputCol="index", outputCol="encoding")
      2 encoder.setDropLast(False)
----> 3 indexer = encoder.transform(indexer)

AttributeError: 'OneHotEncoder' object has no attribute 'transform'

1 Answer

You need to fit it first. In Spark 3.0, OneHotEncoder is an estimator, so transform is not available on it; the method exists only on the fitted model that fit returns:

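from pyspark.ml.feature import OneHotEncoder

encoder = OneHotEncoder(inputCol="index", outputCol="encoding")
encoder.setDropLast(False)
ohe = encoder.fit(indexer)  # indexer is the existing dataframe, see the question
indexer = ohe.transform(indexer)  # transform is called on the fitted model, not on the encoder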

See the example in the docs for more details on the usage.

3 Comments

You use indexer before you define it; also, the link is broken. I had the same issue.
@DimitriosMistriotis Thanks Δημήτρη. I fixed the link; indexer is the already existing dataframe (see the question), so here I am simply overwriting it (nowadays I don't like the practice of overwriting though).
Thanks, we were following a tutorial :) and fell in a loop. Now we can try again :)
