Storing Pandas dataframe in working memory

Question

Is there some way to take a dataframe, say,

df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

and store it in temp memory as a binary object that can then be opened with

open(df, 'rb')

So then, rather than do something like

open('/home/user/data.csv', 'rb')

the code would be

df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

df_rb = *command to store in temp working memory as binary readable*

open(df_rb, 'rb')

Look into Python's pickle module: docs.python.org/3.8/library/pickle.html
– jfaccioni, Commented Jun 2, 2020 at 19:39
I've tried that, but then I can't extract a file path from it. I would need to pickle it, get a file path (without specifying one myself), and then use it with open. Is there some way to do that?
– DeathbyGreen, Commented Jun 2, 2020 at 19:41
Simply dump the DataFrame to an in-memory byte stream (using e.g. BytesIO, docs.python.org/3/library/io.html#io.BytesIO) instead of to a file.
– jfaccioni, Commented Jun 2, 2020 at 19:46
I'm trying to build a workaround for a Django REST API issue. I posted an in-depth question about that, but I think it was too in-depth. This will give me a simple workaround without having to change my Django API source code.
– DeathbyGreen, Commented Jun 2, 2020 at 20:02

wwii · Accepted Answer · 2021-03-10 16:03:48Z

5

You could pickle it to an io.BytesIO object, which lives in memory:

import pandas as pd
import pickle, io
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
f = io.BytesIO()
pickle.dump(df,f)
f.seek(0)    # necessary to start reading at the beginning of the "file"
dg = pickle.load(f)

In [48]: dg==df
Out[48]: 
      a     b
0  True  True
1  True  True
2  True  True
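
To connect this back to the df_rb placeholder in the question, here is a minimal sketch that wraps the same steps in a function. The name dataframe_to_buffer is hypothetical, not part of pandas or pickle. It assumes the consuming code accepts any file-like object, since a BytesIO cannot be passed to open().

```python
import io
import pickle

import pandas as pd


def dataframe_to_buffer(frame: pd.DataFrame) -> io.BytesIO:
    """Pickle a DataFrame into an in-memory, file-like object (hypothetical helper)."""
    buffer = io.BytesIO()
    pickle.dump(frame, buffer)
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_rb = dataframe_to_buffer(df)

# df_rb already behaves like a file opened in 'rb' mode, so no open() call is needed.
raw_bytes = df_rb.read()       # the pickled bytes, as a real binary file would return them
df_rb.seek(0)                  # rewind again before handing it to another reader
restored = pickle.load(df_rb)

print(restored.equals(df))     # compares values and dtypes of the two frames
```

Using .equals gives a single boolean rather than the element-wise frame that == produces.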

edited Mar 10, 2021 at 16:03

answered Jun 2, 2020 at 20:03

wwii


2 Comments

Nick Brady Over a year ago

This was helpful. df.to_pickle will close the io object, though, so I preferred to use pickle.dump so I could control when it was closed.

wwii Over a year ago

@NickBrady - thanx.

Mayank Porwal · Accepted Answer · 2020-06-02 19:54:40Z

1

Pandas has a df.to_pickle() method:

From the docs:

Pickle (serialize) object to file.

df.to_pickle("./dummy.pkl")

Then read this pickled df using read_pickle():

From the docs:

Load pickled pandas object (or any object) from file.

unpickled_df = pd.read_pickle("./dummy.pkl")
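
If a real file path is needed but should not be hard-coded, one option is to let Python's tempfile module choose the location. The following is only a sketch under that assumption: the file name df.pkl is a placeholder, and the approach still requires a writable temporary directory on the machine.

```python
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# TemporaryDirectory picks a writable location and deletes it when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "df.pkl")  # hypothetical file name
    df.to_pickle(pickle_path)

    # The path can now be used anywhere a real file path is expected.
    with open(pickle_path, 'rb') as fh:
        restored = pickle.load(fh)

print(restored.equals(df))
```

Because the directory is removed on exit, anything that needs the path has to run inside the with block.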

edited Jun 2, 2020 at 19:54

answered Jun 2, 2020 at 19:43

Mayank Porwal


3 Comments

DeathbyGreen Over a year ago

Will "./dummy.filetype" work on any computer? That is, is there any risk that someone downloads a function I write, runs it (writing a file to './file'), and gets a "directory does not exist" error? It seems like that couldn't be the case.

Mayank Porwal Over a year ago

I think it will run on any machine. You can always handle that case by making sure pickle writes to a directory that is known to exist.

Rubén Colomina Citoler Over a year ago

This does not work if there is no filesystem, e.g. on AWS Lambda.
