
I want to perform k-fold cross-validation with pyspark.ml to tune the model's parameters, but I'm getting an AttributeError:

AttributeError: 'DataFrame' object has no attribute '_jdf'

I initially tried pyspark.mllib but could not get k-fold cross-validation working there either.

import pandas as pd
from pyspark import SparkConf, SparkContext
from pyspark.ml.classification import DecisionTreeClassifier

data=pd.read_csv("file:///SparkCourse/wdbc.csv", header=None)
type(data)
print(data)

conf = SparkConf().setMaster("local").setAppName("SparkDecisionTree")
sc = SparkContext(conf = conf)

# Create initial Decision Tree Model
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features",
                            maxDepth=3)

# Train model with Training Data
dtModel = dt.fit(data)

# I expect the model to be trained but I'm getting the following error 
AttributeError: 'DataFrame' object has no attribute '_jdf'

Note: I'm able to print the data; the error occurs at dt.fit(data).

  • You will need to convert the pandas dataframe to a Spark dataframe. Commented Apr 10, 2019 at 6:22
  • I'll try doing that. Thank you. Commented Apr 11, 2019 at 19:03
  • In case it helps someone. This error can also be thrown if you've converted the DataFrame to pandas for display after loading it. For example, by using df.limit(5).toPandas(). Commented Jan 26, 2022 at 20:11

4 Answers


Convert Pandas to Spark

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

spark_df = sqlContext.createDataFrame(pandas_df)


I just want to share my experience with this error. In my case I had a loop, and in some iterations the dataset was just a string because it was empty. Handling the empty datasets with an 'if' check solved my problem. Thanks.
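A sketch of that guard (function and variable names are hypothetical; the point is just to skip non-DataFrame placeholders before calling fit):

```python
def fit_if_present(estimator, dataset):
    # In some iterations the dataset was just a string because it was empty;
    # skip those instead of calling fit() on them
    if dataset is None or isinstance(dataset, str):
        return None
    return estimator.fit(dataset)
```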


If you hit this as a metric-evaluation error, you probably:

  1. Transformed the test set with Spark correctly, then peeked at the result as a pandas DataFrame:
# Spark model, transformed test, converted to pandas df
predictions = model.transform(test)
predDF = predictions.toPandas()
predDF.head()
  2. Then tried:
eval_acc = MulticlassClassificationEvaluator(
            labelCol='Label_index',
            predictionCol='prediction',
            metricName='accuracy'
)

# Evaluate Performance
acc = eval_acc.evaluate(predDF) # Error
print(f"accuracy: {acc}")

I forgot predDF is a pandas DataFrame. The evaluator needs predictions because it's a Spark DataFrame.

acc = eval_acc.evaluate(predictions) # Works
print(f"accuracy: {acc}")



I think it's because you need to use spark.read to get a Spark DataFrame in the first place. Try this:

data = spark.read.option("header", True).csv(
 "file:///SparkCourse/wdbc.csv"
)

