
As noted in this question, it is possible to explicitly release the memory of a DataFrame. I am running into an issue that is a bit of an extension of that problem. I often import a whole data set and then select a subset of it. The selections tend to come in two forms:

df_row_slice = df.sample(frac=0.6)
df_column_slice = df[columns]

Past some point in my code I know that I will no longer reference the original df. Is there a way to release all the memory that is not referenced by the slices? I realize I could call .copy() when I slice, but that temporary duplication would push me past my available memory.
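Whether deleting the original frame can free anything depends on whether each slice owns its data or is a view onto the original buffers. A minimal sketch of how to check this with `np.shares_memory`; `shares_any_memory` is a hypothetical helper, not a pandas API, and it only compares columns the two frames have in common:

```python
import numpy as np
import pandas as pd


def shares_any_memory(part: pd.DataFrame, whole: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any shared column of `part` uses the same buffer as in `whole`.

    A True result means deleting `whole` cannot release that buffer while `part` is alive.
    """
    for col in part.columns.intersection(whole.columns):
        if np.shares_memory(part[col].to_numpy(), whole[col].to_numpy()):
            return True
    return False


df = pd.DataFrame(np.random.default_rng(0).random((1_000, 4)), columns=list("abcd"))
df_row_slice = df.sample(frac=0.6, random_state=0)
df_column_slice = df[["a", "b"]]

print(shares_any_memory(df_row_slice, df))     # sampled rows are gathered into new arrays
print(shares_any_memory(df_column_slice, df))  # may depend on pandas version and Copy-on-Write mode
```

Running the check on your own slices, with your pandas version, is more reliable than assuming either behavior.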

UPDATE

Following the reply, I think the approach would be to drop the unneeded columns or rows from the original frame:

df_column_slice = df[columns]
cols_to_drop = [i for i in df.columns if i not in columns]
df = df.drop(columns=cols_to_drop)

or

df_row_slice = df.sample(frac=0.6)
df = df.drop(df_row_slice.index)
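`df.drop(df_row_slice.index)` removes rows by label, so it leaves exactly the unsampled rows only when the index is unique; with duplicate labels it can drop extra rows. A minimal sketch of a position-based split that avoids this, using a NumPy permutation in place of `df.sample` (a swapped-in technique, not the one in the question; `split_rows` is a hypothetical name):

```python
import numpy as np
import pandas as pd


def split_rows(df: pd.DataFrame, frac: float, seed: int = 0):
    """Hypothetical helper: split df into (sampled, remainder) by row position."""
    rng = np.random.default_rng(seed)
    positions = rng.permutation(len(df))      # shuffled row positions 0..len(df)-1
    n_sample = round(frac * len(df))          # number of rows in the sampled part
    return df.iloc[positions[:n_sample]], df.iloc[positions[n_sample:]]


# Duplicate index labels: a label-based drop would be unreliable here.
df = pd.DataFrame({"x": range(10)}, index=[0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
df_row_slice, df = split_rows(df, frac=0.6)   # rebinding df releases our reference to the full frame
print(len(df_row_slice), len(df))
```

Because both parts are built by positional indexing, they partition the original rows regardless of what the index contains.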

Hopefully garbage collection then frees the memory properly. Would it be wise to call

import gc
gc.collect()

just to be safe? Does the order matter? I could drop before slicing without any problem. In my specific case, I make several slices of both types. My hope was that I could simply del df and memory management would do something like this under the hood.
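In CPython, del df only removes a name; the DataFrame object is freed as soon as nothing else holds a strong reference to it, and gc.collect() matters only if the object is caught in a reference cycle. A small sketch, assuming CPython, that uses `weakref` to confirm the original frame is gone after del. Note that it checks the DataFrame object itself; whether its NumPy buffers are also released depends on whether any surviving slice is a view onto them.

```python
import gc
import weakref

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((10_000, 5)), columns=list("abcde"))
df_row_slice = df.sample(frac=0.6, random_state=1)
df_column_slice = df[["a", "b"]]

original = weakref.ref(df)   # a weak reference does not keep the frame alive
del df                       # remove the last strong reference held by this script
gc.collect()                 # only needed to break reference cycles; harmless otherwise

print(original() is None)    # True once the original DataFrame object has been collected
print(len(df_row_slice), list(df_column_slice.columns))  # the slices remain usable
```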

Answer

You can use df.drop to remove unused columns and rows.

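import os

import numpy as np
import pandas as pd
import psutil


def usage():
    """Resident memory (RSS) of the current process, in MiB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 2**20


df_all = pd.read_csv('../../../Datasets/Trial.csv', index_col=None)
print(usage())

# Drop every column from 'Col3' onward; the .loc[:5] row slice is only used to get the labels.
cols_to_drop = df_all.loc[:5, 'Col3':].columns.values
df_all = df_all.drop(columns=cols_to_drop)
print(usage())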

Here the first usage() call returns 357 MiB and the second returns 202 MiB for me.
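psutil reports the process's resident set size, which can stay high after pandas releases memory because the allocator does not always return freed pages to the operating system. A sketch of an alternative measurement with the standard-library `tracemalloc`, which counts bytes currently allocated rather than RSS; it relies on NumPy reporting its array allocations to `tracemalloc` (supported since NumPy 1.13), and `traced_mib` is a hypothetical helper name:

```python
import gc
import tracemalloc

import numpy as np
import pandas as pd


def traced_mib() -> float:
    """Hypothetical helper: bytes currently traced by tracemalloc, in MiB (2**20 bytes)."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / 2**20


tracemalloc.start()
baseline = traced_mib()

df_all = pd.DataFrame(np.ones((200_000, 10)))  # 200_000 * 10 * 8 bytes, about 15.3 MiB of data
after_load = traced_mib()

del df_all
gc.collect()
after_del = traced_mib()
tracemalloc.stop()

print(f"after load: +{after_load - baseline:.1f} MiB, after del: +{after_del - baseline:.1f} MiB")
```

Unlike RSS, these numbers track what Python and NumPy still hold, so they show directly whether a drop or del actually released the arrays.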

If you need to have df_row_slice and df_column_slice at the same time, you can do this:

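cols_to_drop = df.loc[:5, 'Col3':].columns.values
# Pick 40% of the row labels to drop, without repeats, so 60% of the rows remain
# (assumes the index labels are unique).
rows_to_drop = np.random.choice(df.index.values, int(df.shape[0] * 0.4), replace=False)
df_row_slice = df.drop(rows_to_drop)
df = df.drop(columns=cols_to_drop)
df_column_slice = df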

Here df_column_slice is just another name for the same DataFrame object as df (not a copy or a view), so it takes no extra memory.
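Because df_column_slice and df are the same object, whether a later drop affects both depends on whether df is rebound to a new frame or modified in place. A short sketch on a toy frame (the column names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df_column_slice = df                  # second name for the same object
print(df_column_slice is df)          # True

# A non-inplace drop builds a new DataFrame and rebinds only the name `df`.
df = df.drop(columns=["c"])
print(list(df_column_slice.columns))  # ['a', 'b', 'c']: the alias still refers to the old frame

# An in-place drop mutates the object that both names refer to.
df_column_slice = df
df.drop(columns=["b"], inplace=True)
print(list(df_column_slice.columns))  # ['a']
```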


Comment

OK, that gets us part of the way. If I dropped all the columns from df, would that remove them from df_column_slice? What if I dropped the whole DataFrame?
