
I guess this question requires some insight into the implementation of concat.

Say I have 30 files, 1 GB each, and I can only use up to 32 GB of memory. I loaded the files into a list of DataFrames called 'list_of_pieces'. This list_of_pieces should be ~30 GB in size, right?

If I do pd.concat(list_of_pieces), does concat allocate another 30 GB (or maybe 10 or 15 GB) on the heap and do some operations, or does it run the concatenation 'in place' without allocating new memory?
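Roughly, what I am doing looks like this (the file names and read_csv are just stand-ins for how the pieces are actually loaded):

    import pandas as pd

    # 30 files of ~1 GB each; the names are placeholders
    paths = ['part_{:02d}.csv'.format(i) for i in range(30)]
    list_of_pieces = [pd.read_csv(p) for p in paths]   # ~30 GB resident in memory

    # this is the step I am asking about
    big_df = pd.concat(list_of_pieces)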

Does anyone know?

Thanks!

  • I don't think it's in-place... as an aside, I don't think you actually want to read that much into memory (you're not going to leave much room for actually doing calculations)! I think an HDF5 store is a much better choice for you. Commented Jun 7, 2013 at 11:51
  • @AndyHayden, I'm afraid I do need that much data in memory; I need to do some interactive analysis on it :-( Commented Jun 7, 2013 at 12:49

2 Answers


The answer is no, this is not an in-place operation; np.concatenate is used under the hood, see here: Concatenate Numpy arrays without copying
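A quick way to convince yourself on a toy example (just a sketch, not the 30 GB case; the expected output is noted in the comment):

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({'a': np.arange(5)})
    df2 = pd.DataFrame({'a': np.arange(5, 10)})

    result = pd.concat([df1, df2])

    # the result is backed by freshly allocated arrays, not views of the inputs
    print(np.shares_memory(result['a'].values, df1['a'].values))   # expect: False

So at peak you should budget for roughly the pieces plus the concatenated copy.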

A better approach to the problem is to write each of these pieces to an HDFStore table; see here: http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables for docs, and here: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore for some recipes.

Then you can select whatever portions (or even the whole set) as needed, by query or even by row number.
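Roughly, that workflow looks like this (the file names, key, and column name below are just placeholders, and PyTables needs to be installed):

    import pandas as pd

    paths = ['part_{:02d}.csv'.format(i) for i in range(30)]   # placeholder file names

    # append each ~1 GB piece as it is read, instead of holding all 30 in memory
    store = pd.HDFStore('pieces.h5')
    for path in paths:
        piece = pd.read_csv(path)
        store.append('data', piece, data_columns=['some_column'])

    # later, pull back only what a given analysis needs
    by_query = store.select('data', where='some_column > 0')    # by query
    by_rows = store.select('data', start=0, stop=100000)        # by row number
    store.close()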

Certain types of operations can even be done while the data is on disk; see here: https://github.com/pydata/pandas/issues/3202?source=cc, and here: http://pytables.github.io/usersguide/libref/expr_class.html#
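In a similar spirit (this is a chunked-iteration sketch, not the Expr approach from those links; the chunk size and column name are arbitrary), you can run a reduction without ever materializing the whole table:

    import pandas as pd

    # continuing from the store built above
    store = pd.HDFStore('pieces.h5')
    total = 0.0
    for chunk in store.select('data', chunksize=1000000):
        total += chunk['some_column'].sum()
    store.close()
    print(total)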


Try this:

    import pandas as pd

    dfs = [df1, df2]

    # concatenate without an extra defensive copy of the blocks
    temp = pd.concat(dfs, copy=False, ignore_index=False)

    # empty df1 in place, then assign the combined columns back into it
    df1.drop(df1.index[0:], inplace=True)
    df1[temp.columns] = temp

4 Comments

Try adding code formatting for better readability
I've tested your solution with a 1.2 GB table. It's definitely slower; so slow that after waiting 10 minutes the script was still running. (Using plain pd.concat it takes 30 seconds.)
I find this a very clever way to handle the problem. Thanks. It wasn't that slow for me: I used it on around 1 GB of data on AWS, and it worked almost instantaneously.
This is a resourceful way to save limited [memory] resources.
