
I am trying to create a DataFrame object for my spam classifier. It's supposed to contain two columns: 'message' and 'class'. However, when I use the DataFrame.append method to add the emails as 'message' entries, with the folder name as 'class', I get this error:

AttributeError: 'DataFrame' object has no attribute 'append'

I initially created an empty DataFrame and then tried to use DataFrame.append() to add the spam and ham emails to it. Here's the code I am using:

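from pandas import DataFrame

data = DataFrame({'message': [], 'class': []})

# raw strings so the backslashes in the Windows paths are not treated as escape sequences
data = data.append(dataFrameFromDirectory(r'D:\email_classifier\spam', 'spam'))
data = data.append(dataFrameFromDirectory(r'D:\email_classifier\ham', 'ham'))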

In theory, this should add the emails and the folder name to data. Is there a way to get around this without having to use an older version of pandas?

  • use pd.concat instead Commented Apr 15, 2023 at 6:12
  • Actually, I see this question would be worth reopening and adding some information on exactly why the error occurs since presumably people will paste this attribute error into google and hit search and land up here. Commented Apr 15, 2023 at 6:25
  • See also: Create a Pandas Dataframe by appending one row at a time Commented Apr 15, 2023 at 7:39

1 Answer


pandas >= 2.0: append has been removed, use pd.concat

DataFrame.append (along with Series.append) was deprecated in version 1.4 and removed from the pandas API entirely in version 2.0.
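For the code in the question, the direct replacement is to pass all the pieces to pd.concat in a single call. The sketch below uses two small hand-made DataFrames as stand-ins for the output of the asker's dataFrameFromDirectory helper (which isn't shown), so the messages are invented. It also skips the empty starting DataFrame, since pd.concat doesn't need one.

```python
import pandas as pd

# Stand-ins for dataFrameFromDirectory(...) output; the contents are invented for illustration.
spam = pd.DataFrame({'message': ['WIN a free prize now'], 'class': ['spam']})
ham = pd.DataFrame({'message': ['Lunch at noon?'], 'class': ['ham']})

# pandas < 2.0: data = data.append(spam); data = data.append(ham)
# pandas >= 2.0: concatenate everything in one call; no empty "seed" DataFrame is needed.
data = pd.concat([spam, ham], ignore_index=True)
print(data)
```

Drop ignore_index=True if the index carries meaning you want to keep (for example, file names).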

See the docs on Deprecations as well as this GitHub issue that originally proposed its deprecation.

The rationale for its removal was to discourage iteratively growing DataFrames in a loop (which is what people typically used append for). Each call to append returns a new copy of the whole DataFrame, so building one up piece by piece this way copies data quadratically in the number of rows, both in time and in total memory allocated.
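To make the cost concrete: if every step copies all the rows accumulated so far, adding n rows one at a time copies roughly 1 + 2 + ... + n = n(n+1)/2 rows in total. The sketch below contrasts that pattern (calling pd.concat inside the loop, which is essentially what the old append did on every call) with collecting the pieces first and concatenating once. The function names and toy rows are illustrative only.

```python
import pandas as pd

def grow_in_loop(pieces):
    # Anti-pattern: each concat copies everything accumulated so far -> quadratic total copying.
    result = pieces[0]
    for piece in pieces[1:]:
        result = pd.concat([result, piece], ignore_index=True)
    return result

def grow_once(pieces):
    # Recommended: collect the pieces first, then copy each row once in a single concat.
    return pd.concat(pieces, ignore_index=True)

pieces = [pd.DataFrame({'message': [f'msg {i}'], 'class': ['ham']}) for i in range(5)]

# Both approaches give the same result; only the amount of copying differs.
print(grow_in_loop(pieces).equals(grow_once(pieces)))
```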

In the absence of append, if your data grows row-wise, the right approach is to accumulate it in a list of records (or a list of DataFrames) and convert it to one big DataFrame at the end.

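import pandas as pd

# dataFrameFromDirectory is the helper from the question; each tuple is (folder, label)
arg_list = [(r'D:\email_classifier\spam', 'spam'), (r'D:\email_classifier\ham', 'ham')]
accumulator = []
for args in arg_list:
    accumulator.append(dataFrameFromDirectory(*args))

big_df = pd.concat(accumulator)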
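The list-of-records variant skips the intermediate DataFrames entirely. Since the question doesn't show dataFrameFromDirectory, the records_from_directory helper below is a hypothetical stand-in that assumes one email per file and reads each file as latin-1 text; adjust the encoding and parsing to match your data.

```python
import os
import pandas as pd

def records_from_directory(path, label):
    # Hypothetical helper: one record (dict) per file, assuming each file holds one email.
    records = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            # latin-1 is an assumption; it decodes any byte sequence without raising.
            with open(full_path, encoding='latin-1') as f:
                records.append({'message': f.read(), 'class': label})
    return records

def build_dataset(dirs_and_labels):
    # Accumulate plain dicts in a list, then build the DataFrame once at the end.
    records = []
    for path, label in dirs_and_labels:
        records.extend(records_from_directory(path, label))
    return pd.DataFrame(records, columns=['message', 'class'])

# Usage with the paths from the question:
# data = build_dataset([(r'D:\email_classifier\spam', 'spam'), (r'D:\email_classifier\ham', 'ham')])
```

Passing columns explicitly keeps the two expected columns even when no files are found.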

