
In pandas 1.4.0, append() was deprecated, and the docs say to use concat() instead.

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Code block in question:

def generate_features(data, num_samples, mask):
    """
    The main function for generating features to train or evaluate on.
    Returns a pd.DataFrame()
    """
    logger.debug("Generating features, number of samples: %s", num_samples)
    features = pd.DataFrame()

    for count in range(num_samples):
        row, col = get_pixel_within_mask(data, mask)
        input_vars = get_pixel_data(data, row, col)
        features = features.append(input_vars)
        print_progress(count, num_samples)

    return features

These are the two options I've tried, but neither worked:

features = pd.concat([features],[input_vars])

and

pd.concat([features],[input_vars])

This is the line that is deprecated and throwing the error:

features = features.append(input_vars)

5 Answers


You can store the DataFrames generated in the loop in a list and concatenate them with features once you finish the loop.

In other words, replace the loop:

for count in range(num_samples):
    # .... code to produce `input_vars`
    features = features.append(input_vars)        # remove this `DataFrame.append`

with the one below:

tmp = []                                  # initialize list
for count in range(num_samples):
    # .... code to produce `input_vars`
    tmp.append(input_vars)                        # append to the list, (not DF)
features = pd.concat(tmp)                         # concatenate after loop

You can certainly concatenate inside the loop, but it's more efficient to do it only once, after the loop.
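As a runnable sketch of this pattern, with made-up band_1/band_2 columns standing in for whatever get_pixel_data actually returns:

```python
import pandas as pd

# Stand-in for the per-sample rows the loop produces
# (the real input_vars comes from get_pixel_data, which isn't available here).
tmp = []                                      # accumulate rows in a plain list
for count in range(3):
    input_vars = pd.DataFrame({"band_1": [count], "band_2": [count * 2]})
    tmp.append(input_vars)                    # cheap list append, not DataFrame.append
features = pd.concat(tmp, ignore_index=True)  # one concatenation at the end

print(features.shape)  # (3, 2)
```

Passing ignore_index=True gives the result a clean 0..n-1 index instead of every row keeping index 0 from its one-row frame.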


2 Comments

From personal experience, each append can individually take almost as long as the entire concat, so the time savings by doing it once at the end can be massive.
It is very unfortunate that they are deprecating append for dataframes. With my code, creating the dataframe using the temporary list as shown here results in my code running 10X slower.
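The effect described in these comments can be checked with a quick timing sketch (synthetic one-row frames; absolute timings vary by machine and pandas version):

```python
import time
import pandas as pd

rows = [pd.DataFrame({"a": [i], "b": [i * 2]}) for i in range(500)]

# Per-iteration concat: copies the growing frame every time (quadratic overall).
start = time.perf_counter()
slow = pd.DataFrame()
for r in rows:
    slow = pd.concat([slow, r])
slow_t = time.perf_counter() - start

# Accumulate in a list, concatenate once (linear).
start = time.perf_counter()
fast = pd.concat(rows)
fast_t = time.perf_counter() - start

print(f"per-iteration concat: {slow_t:.4f}s, single concat: {fast_t:.4f}s")
```

Both approaches produce the same 500-row frame; only the amount of copying differs.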

This will "append" to the blank DataFrame and, by using concat, avoid errors in future pandas versions:

features = pd.concat([features, input_vars])

However, without access to the actual data and data structures, this would be hard to test and replicate.
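A minimal sketch with made-up columns, since the question's real data isn't available:

```python
import pandas as pd

# Made-up one-row frames standing in for the question's data.
features = pd.DataFrame({"a": [1], "b": [2]})
input_vars = pd.DataFrame({"a": [3], "b": [4]})

# One list containing both frames -- not two separate list arguments.
features = pd.concat([features, input_vars], ignore_index=True)
print(features.shape)  # (2, 2)
```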

3 Comments

On the official pandas docs for the latest release, you will see that .append() was deprecated: pandas.pydata.org/docs/whatsnew/v1.4.0.html. They say I should use concat() instead, but I can't get it to work. I will keep exploring the pandas docs.
I updated my answer to use concat. Thank you for pointing out the docs; sorry if I missed them before.
This directly fixes the OP's mistake, i.e. [features],[input_vars] should be [features, input_vars]. However, in the case of a loop like the OP's, the other answer is far more efficient.

There is another unpleasant edge case here: If input_vars is a series (not a dataframe) that represents one row to be appended to features, the deprecated use of features = features.append(input_vars) works fine and adds one row to the dataframe.

But the version with concat features = pd.concat([features, input_vars]) does something different and produces lots of NaNs. To get this to work, you need to convert the series to a dataframe:

features = pd.concat([features, input_vars.to_frame().T])

See also this question: Why does concat Series to DataFrame with index matching columns not work?
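A minimal reproduction of this edge case, with made-up columns:

```python
import pandas as pd

features = pd.DataFrame({"a": [1], "b": [2]})
row = pd.Series({"a": 3, "b": 4})              # one row held as a Series

# Naive concat aligns the Series as a new *column*, yielding NaNs:
bad = pd.concat([features, row])
print(bad.shape)   # (3, 3)

# to_frame().T turns the Series into a one-row DataFrame first:
good = pd.concat([features, row.to_frame().T], ignore_index=True)
print(good.shape)  # (2, 2)
```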


For example, suppose you have a list of dataframes called collector, e.g. one per cryptocurrency, and you want to harvest the first row of two particular columns from each dataframe in collector. You can do it as follows:

pd.concat([cap[['Ticker', 'Market Cap']].iloc[:1] for cap in collector])
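For instance, with a hypothetical collector of two small frames (made-up tickers and values):

```python
import pandas as pd

# Hypothetical collector: one small frame per coin.
collector = [
    pd.DataFrame({"Ticker": ["BTC", "BTC"], "Market Cap": [1.0, 1.1], "Volume": [5, 6]}),
    pd.DataFrame({"Ticker": ["ETH", "ETH"], "Market Cap": [0.4, 0.5], "Volume": [2, 3]}),
]

# First row of the two selected columns from each frame, stacked with one concat:
first_rows = pd.concat([cap[["Ticker", "Market Cap"]].iloc[:1] for cap in collector])
print(first_rows.shape)  # (2, 2)
```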


You can bring it back by creating a module:

import pandas as pd


def my_append(self, x, ignore_index=False):
    if ignore_index:
        # ignore_index=True discards the incoming row labels, like DataFrame.append did
        return pd.concat([self, x]).reset_index(drop=True)
    else:
        return pd.concat([self, x])


if not hasattr(pd.DataFrame, "append"):
    setattr(pd.DataFrame, "append", my_append)

This adds the implementation back, and it can be tested as follows:

import pandas as pd
import lib.pandassupport


def test_append_ignore_index_is_true():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row, ignore_index=True)
    print(df)
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 3],
        )
    )


def test_append():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row)
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 0],
        )
    )

