25

I have a dataframe that I am trying to append to another dataframe. I have tried various approaches with .append(), but none has worked. Below are the two ways I tried while iterating with iterrows: one raises an error, the other doesn't populate the dataframe with anything.

The workflow I am trying to build creates a dataframe from a file that contains the transaction history of customer orders. I only want a single record per order, and I am going to add other logic to update the order details based on updates in the history. By the end of the script, it should hold a single record for each order, reflecting the end state of that order after iterating through the history file.

class order_manager():
    """Manages over the current state of orders"""
    
    def __init__(self,dataF, desc='NONE'):
        self.df = pd.DataFrame
        self.data = dataF
        print type(dataF)
        self.oD= self.df(data=None,columns=desc)
    
    def add_data(self,df):
        for i, row in self.data.iterrows():
            print 'row '+str(row)
            print type(row)
            df.append(self.data[i], ignore_index =True)  # This line creates an error
            df.append(row, ignore_index =True)  # This line doesn't append anything to the dataframe

test = order_manager(body,header)
test.add_data(test.orderData)
3
  • I think you just want to pd.concat([df1, df2]). No need to iteratively append one line at a time. Commented Jun 22, 2015 at 20:56
  • Have you looked at the examples in append? Commented Jun 22, 2015 at 21:11
  • Note for 2022: append is deprecated in favor of concat Commented Mar 9, 2022 at 15:41

2 Answers

19

Use .loc to enlarge the current df. See the example below.

import pandas as pd
import numpy as np

date_rng = pd.date_range('2015-01-01', periods=200, freq='D')

df1 = pd.DataFrame(np.random.randn(100, 3), columns='A B C'.split(), index=date_rng[:100])
Out[410]: 
                 A       B       C
2015-01-01  0.2799  0.4416 -0.7474
2015-01-02 -0.4983  0.1490 -0.2599
2015-01-03  0.4101  1.2622 -1.8081
2015-01-04  1.1976 -0.7410  0.4221
2015-01-05  1.3311  1.0399  2.2701
...            ...     ...     ...
2015-04-06 -0.0432  0.6131 -0.0216
2015-04-07  0.4224 -1.1565  2.2285
2015-04-08  0.0663  1.2994  2.0322
2015-04-09  0.1958 -0.4412  0.3924
2015-04-10  0.1622  1.7603  1.4525

[100 rows x 3 columns]


df2 = pd.DataFrame(np.random.randn(100, 3), columns='A B C'.split(), index=date_rng[100:])
Out[411]: 
                 A       B       C
2015-04-11  1.1196 -1.9627  0.6615
2015-04-12 -0.0098  1.7655  0.0447
2015-04-13 -1.7318 -2.0296  0.8384
2015-04-14 -1.5472 -1.7220 -0.3166
2015-04-15  2.5058  0.6487  1.0994
...            ...     ...     ...
2015-07-15 -1.4803  2.1703 -1.9391
2015-07-16 -1.7595 -1.7647 -1.0622
2015-07-17  1.7900  0.2280 -1.8797
2015-07-18  0.7909 -0.4999  0.3848
2015-07-19  1.2243  0.4681 -1.2323

[100 rows x 3 columns]

# to move one row from df2 to df1, use .loc to enlarge df1
# for a single row, this can be more efficient than pd.concat and DataFrame.append
df1.loc[df2.index[0]] = df2.iloc[0]

Out[413]: 
                 A       B       C
2015-01-01  0.2799  0.4416 -0.7474
2015-01-02 -0.4983  0.1490 -0.2599
2015-01-03  0.4101  1.2622 -1.8081
2015-01-04  1.1976 -0.7410  0.4221
2015-01-05  1.3311  1.0399  2.2701
...            ...     ...     ...
2015-04-07  0.4224 -1.1565  2.2285
2015-04-08  0.0663  1.2994  2.0322
2015-04-09  0.1958 -0.4412  0.3924
2015-04-10  0.1622  1.7603  1.4525
2015-04-11  1.1196 -1.9627  0.6615

[101 rows x 3 columns]

1 Comment

not sure how much it's actually more efficient than pandas' append, both seem very slow to me
0

Pandas dataframes are not meant to be grown vertically in place. Adding rows in a loop is slow because each addition builds a new frame and copies the existing data. It's much faster to create the new frame with a single pd.concat call. So, instead of a loop with iterrows(), it's better to concatenate.

def add_data(self, df):
    self.data = pd.concat([self.data, df], ignore_index=True)
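One detail worth spelling out: pd.concat returns a new DataFrame and leaves its inputs untouched, so the result has to be assigned back. The deprecated DataFrame.append behaved the same way, which is likely why calling it without using the return value, as in the question, appears to do nothing. A minimal sketch, where the column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical order data; column names and values are assumptions.
orders = pd.DataFrame({"order_id": [1, 2], "status": ["new", "new"]})
more = pd.DataFrame({"order_id": [3], "status": ["shipped"]})

# pd.concat does not modify its inputs; the returned frame is discarded here.
pd.concat([orders, more], ignore_index=True)
rows_without_assignment = len(orders)  # orders is unchanged

# Assign the result back to keep the combined rows.
orders = pd.concat([orders, more], ignore_index=True)
rows_after_assignment = len(orders)
```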

If there has to be some logic (e.g. values have to be unique, etc.), then it's better to use boolean indexing to filter df first before concatenation or call drop_duplicates() after concatenation. In other words, there are other methods available to process the data.
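As a rough sketch of both options, assuming a hypothetical history with order_id and status columns in which later rows supersede earlier ones (these names are not from the question):

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
current = pd.DataFrame({"order_id": [1, 2], "status": ["new", "new"]})
history = pd.DataFrame(
    {"order_id": [2, 3, 3], "status": ["shipped", "new", "cancelled"]}
)

# Option 1: boolean indexing. Keep only history rows whose order_id
# is not already present, then concatenate once.
unseen = history[~history["order_id"].isin(current["order_id"])]
only_new_orders = pd.concat([current, unseen], ignore_index=True)

# Option 2: concatenate everything, then keep the last record per order_id.
latest_state = (
    pd.concat([current, history], ignore_index=True)
    .drop_duplicates(subset="order_id", keep="last")
    .reset_index(drop=True)
)
```

The second option is probably closer to the "end state per order" goal in the question, provided the rows are already in chronological order.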

That said, for a single row, loc works fine. To append the first row of df to self.data, the following works, assuming self.data has a default integer index so that len(self.data) is an unused label. Note that values are aligned on column names: any column of self.data that is missing from df is filled with NaN.

self.data.loc[len(self.data)] = df.iloc[0]
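A small sketch of enlarging with .loc, assuming a default integer index; the frames and column names are hypothetical and chosen so that one column does not match:

```python
import pandas as pd

# Hypothetical frames; names and values are assumptions.
data = pd.DataFrame({"order_id": [1, 2], "status": ["new", "new"]})
df = pd.DataFrame({"order_id": [3], "qty": [5]})

# With a default RangeIndex, len(data) is the next unused label,
# so assigning to it through .loc adds one row.
data.loc[len(data)] = df.iloc[0]

row_count = len(data)
# "status" does not exist in df, so the new row has a missing value there;
# "qty" does not exist in data, so it is dropped.
status_is_missing = pd.isna(data.loc[2, "status"])
```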
