I built a logistic regression model using Spark ML pipelines and saved it. When I try to apply the saved pipeline to a new set of records, I get an error. The pipeline contains a VectorAssembler, a StandardScaler, and a LogisticRegression stage.

I called transform on the loaded pipeline and received the following error:

AttributeError: 'Pipeline' object has no attribute 'transform'

Below is the code

from pyspark.ml import Pipeline
pipelineModel = Pipeline.load("/user/userid/lr_pipe")
scored_temp = pipelineModel.transform(combined_data_imputed_final)

Here is how I saved my pipeline

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import StandardScaler, VectorAssembler

vector = VectorAssembler(inputCols=final_features, outputCol="final_features")
scaler = StandardScaler(inputCol="final_features", outputCol="final_scaled_features")
lr = LogisticRegression(labelCol="label", featuresCol="final_scaled_features", maxIter=30)

stages = [vector,scaler,lr]

pipe = Pipeline(stages=stages)

lrModel = pipe.fit(train_transformed_data_1).transform(train_transformed_data_1)
pipe.save("lr_pipe")

I expect it to run all the pipeline stages and score the records.

1 Answer

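I had the same issue. After looking at the source code, I found that there is a PipelineModel class, and the saved pipeline should be loaded with it instead of with Pipeline. Pipeline is an estimator and only has fit, while PipelineModel is the fitted result and has transform. Once I changed that, it worked :D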

from pyspark.ml import PipelineModel

# Load with PipelineModel, the fitted class that provides transform()
pipelineModel = PipelineModel.load("/user/userid/lr_pipe")
scored_temp = pipelineModel.transform(combined_data_imputed_final)
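
One thing to watch for: PipelineModel.load expects a directory written by a fitted PipelineModel. The question saves the unfitted pipe object, and loading that directory with PipelineModel.load would likely fail. Below is a minimal sketch of the full round trip, assuming the same three stages as in the question. The function name and its arguments (train_df, new_df, feature_cols, path) are hypothetical placeholders, not names from the original code.

```python
def fit_save_reload(train_df, new_df, feature_cols, path):
    """Fit the pipeline, save the fitted model, reload it, and score new_df.

    All argument names are hypothetical placeholders for illustration.
    """
    # Imported inside the function so the sketch can be imported without Spark.
    from pyspark.ml import Pipeline, PipelineModel
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StandardScaler, VectorAssembler

    stages = [
        VectorAssembler(inputCols=feature_cols, outputCol="final_features"),
        StandardScaler(inputCol="final_features",
                       outputCol="final_scaled_features"),
        LogisticRegression(labelCol="label",
                           featuresCol="final_scaled_features",
                           maxIter=30),
    ]

    pipe = Pipeline(stages=stages)   # Estimator: has fit(), no transform()
    model = pipe.fit(train_df)       # PipelineModel: has transform()

    # Save the fitted model rather than the unfitted Pipeline.
    # overwrite() replaces any existing directory at this path.
    model.write().overwrite().save(path)

    # Load with the class that matches what was saved, then score.
    reloaded = PipelineModel.load(path)
    return reloaded.transform(new_df)
```

Keeping the fitted model in its own variable (model here) also avoids chaining fit(...).transform(...) into a single name, which leaves you holding a scored DataFrame and nothing to save.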