3

I am trying to implement Gradient boosting algorithm on a kaggle dataset in pyspark for learning purpose. I am facing error given below

Traceback (most recent call last):
  File "C:/SparkCourse/Gradientboost.py", line 29, in <module>
    output=assembler.transform(data)
  File "C:\spark\python\lib\pyspark.zip\pyspark\ml\base.py", line 105, in transform
  File "C:\spark\python\lib\pyspark.zip\pyspark\ml\wrapper.py", line 281, in _transform
AttributeError: 'OneHotEncoder' object has no attribute '_jdf'

the corresponding code is

from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer,VectorIndexer,OneHotEncoder,VectorAssembler





spark=SparkSession.builder.config("spark.sql.warehouse.dir", "file:///C:/temp").appName("Gradientboostapp").enableHiveSupport().getOrCreate()
data= spark.read.csv("C:/Users/codemen/Desktop/Timeseries Analytics/liver_patient.csv",header=True, inferSchema=True)
#data.show()
print(data.count())
#data.printSchema()
print("After deleting  null  values")

data=data.na.drop()
print(data.count())

data=StringIndexer(inputCol="Gender",outputCol="GenderIndex").fit(data)


#let onehot encode the data

data=OneHotEncoder(inputCol="GenderIndex",outputCol="gendervec")


usedfeature=["Age","gendervec","Total_Bilirubin","Direct_Bilirubin","Alkaline_Phosphotase","Alamine_Aminotransferase","Aspartate_Aminotransferase","Total_Protiens","Albumin","Albumin_and_Globulin_Ratio"]
#
assembler=VectorAssembler(inputCols=usedfeature,outputCol="features")
output=assembler.transform(data)
output.select("features","category").show()

I have converted Gender category into numerical form by using String indexer then I have tried to perform OnehotEncoding on Genderindex value. I am getting the error when I have performed VectorAssembler in code. May I am missing very silly concept here. kindly help me to figure it out

1 Answer 1

3

This line of code is incorrect: data=OneHotEncoder(inputCol="GenderIndex",outputCol="gendervec"). You are setting data to be equal to the OneHotEncoder() object, not transforming the data. You need to call a transform to encode the data. It should look like this.

encoder=OneHotEncoder(inputCol="GenderIndex",outputCol="gendervec") data = encoder.transform(data)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.