
In pandas 1.4.0, append() was deprecated, and the docs say to use concat() instead.

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Code block in question:

def generate_features(data, num_samples, mask):
    """
    The main function for generating features to train or evaluate on.
    Returns a pd.DataFrame()
    """
    logger.debug("Generating features, number of samples: %s", num_samples)
    features = pd.DataFrame()

    for count in range(num_samples):
        row, col = get_pixel_within_mask(data, mask)
        input_vars = get_pixel_data(data, row, col)
        features = features.append(input_vars)
        print_progress(count, num_samples)

    return features

These are the two options I've tried, but neither worked:

features = pd.concat([features],[input_vars])

and

pd.concat([features],[input_vars])

This is the line that is deprecated and throwing the error:

features = features.append(input_vars)

5 Answers


You can store the DataFrames generated in the loop in a list and concatenate them with features once you finish the loop.

In other words, replace the loop:

for count in range(num_samples):
    # .... code to produce `input_vars`
    features = features.append(input_vars)        # remove this `DataFrame.append`

with the one below:

tmp = []                                  # initialize list
for count in range(num_samples):
    # .... code to produce `input_vars`
    tmp.append(input_vars)                        # append to the list, (not DF)
features = pd.concat(tmp)                         # concatenate after loop

You can certainly concatenate inside the loop, but it's more efficient to do it only once, after the loop.
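As a runnable sketch of this pattern, with made-up band_1/band_2 columns standing in for whatever get_pixel_data actually returns:

```python
import pandas as pd

# Stand-in for the per-sample rows the loop produces
# (the real input_vars comes from get_pixel_data, which isn't available here).
tmp = []                                      # accumulate rows in a plain list
for count in range(3):
    input_vars = pd.DataFrame({"band_1": [count], "band_2": [count * 2]})
    tmp.append(input_vars)                    # cheap list append, not DataFrame.append
features = pd.concat(tmp, ignore_index=True)  # one concatenation at the end

print(features.shape)  # (3, 2)
```

Passing ignore_index=True gives the result a clean 0..n-1 index instead of every row keeping index 0 from its one-row frame.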


2 Comments

From personal experience, each append can individually take almost as long as the entire concat, so the time savings by doing it once at the end can be massive.
It is very unfortunate that they are deprecating append for dataframes. With my code, creating the dataframe using the temporary list as shown here results in my code running 10X slower.
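The effect described in these comments can be checked with a quick timing sketch (synthetic one-row frames; absolute timings vary by machine and pandas version):

```python
import time
import pandas as pd

rows = [pd.DataFrame({"a": [i], "b": [i * 2]}) for i in range(500)]

# Per-iteration concat: copies the growing frame every time (quadratic overall).
start = time.perf_counter()
slow = pd.DataFrame()
for r in rows:
    slow = pd.concat([slow, r])
slow_t = time.perf_counter() - start

# Accumulate in a list, concatenate once (linear).
start = time.perf_counter()
fast = pd.concat(rows)
fast_t = time.perf_counter() - start

print(f"per-iteration concat: {slow_t:.4f}s, single concat: {fast_t:.4f}s")
```

Both approaches produce the same 500-row frame; only the amount of copying differs.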

This will "append" to the blank DataFrame and, by using concat, avoid errors in future pandas versions:

features = pd.concat([features, input_vars])

However, without access to the actual data and data structures, this would be hard to test and replicate.
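A minimal sketch with made-up columns, since the question's real data isn't available:

```python
import pandas as pd

# Made-up one-row frames standing in for the question's data.
features = pd.DataFrame({"a": [1], "b": [2]})
input_vars = pd.DataFrame({"a": [3], "b": [4]})

# One list containing both frames -- not two separate list arguments.
features = pd.concat([features, input_vars], ignore_index=True)
print(features.shape)  # (2, 2)
```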

3 Comments

On the official pandas docs for the latest release, you will see that .append() was deprecated: pandas.pydata.org/docs/whatsnew/v1.4.0.html. They say I should use concat() instead, but I can't get it to work. I will keep exploring the pandas docs.
I updated my answer to use concat. Thank you for pointing out the docs; sorry if I missed them before.
This directly fixes the OP's mistake, i.e. [features],[input_vars] should be [features, input_vars]. However, in the case of a loop like the OP's, the other answer is far more efficient.

There is another unpleasant edge case here: If input_vars is a series (not a dataframe) that represents one row to be appended to features, the deprecated use of features = features.append(input_vars) works fine and adds one row to the dataframe.

But the version with concat features = pd.concat([features, input_vars]) does something different and produces lots of NaNs. To get this to work, you need to convert the series to a dataframe:

features = pd.concat([features, input_vars.to_frame().T])

See also this question: Why does concat Series to DataFrame with index matching columns not work?
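A minimal reproduction of this edge case, with made-up columns:

```python
import pandas as pd

features = pd.DataFrame({"a": [1], "b": [2]})
row = pd.Series({"a": 3, "b": 4})              # one row held as a Series

# Naive concat aligns the Series as a new *column*, yielding NaNs:
bad = pd.concat([features, row])
print(bad.shape)   # (3, 3)

# to_frame().T turns the Series into a one-row DataFrame first:
good = pd.concat([features, row.to_frame().T], ignore_index=True)
print(good.shape)  # (2, 2)
```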


For example, suppose you have a list of dataframes called collector, e.g. one per cryptocurrency, and you want to harvest the first row of two particular columns from each dataframe in collector. You can do it as follows:

pd.concat([cap[['Ticker', 'Market Cap']].iloc[:1] for cap in collector])
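For instance, with a hypothetical collector of two small frames (made-up tickers and values):

```python
import pandas as pd

# Hypothetical collector: one small frame per coin.
collector = [
    pd.DataFrame({"Ticker": ["BTC", "BTC"], "Market Cap": [1.0, 1.1], "Volume": [5, 6]}),
    pd.DataFrame({"Ticker": ["ETH", "ETH"], "Market Cap": [0.4, 0.5], "Volume": [2, 3]}),
]

# First row of the two selected columns from each frame, stacked with one concat:
first_rows = pd.concat([cap[["Ticker", "Market Cap"]].iloc[:1] for cap in collector])
print(first_rows.shape)  # (2, 2)
```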


You can bring it back by creating a module:

import pandas as pd


def my_append(self, x, ignore_index=False):
    if ignore_index:
        # ignore_index=True discards the incoming row labels, like DataFrame.append did
        return pd.concat([self, x]).reset_index(drop=True)
    else:
        return pd.concat([self, x])


if not hasattr(pd.DataFrame, "append"):
    setattr(pd.DataFrame, "append", my_append)

This adds the implementation back, and it can be tested as follows:

import pandas as pd
import lib.pandassupport


def test_append_ignore_index_is_true():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row, ignore_index=True)
    print(df)
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 3],
        )
    )


def test_append():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row)
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 0],
        )
    )

