Conversation

@jreback
Contributor

@jreback jreback commented Nov 13, 2015

closes #11595

@jreback jreback added the Output-Formatting (__repr__ of pandas objects, to_string) and API Design labels Nov 13, 2015
@jreback jreback added this to the 0.17.1 milestone Nov 13, 2015
@jreback
Contributor Author

jreback commented Nov 13, 2015

cc @mrocklin

@mrocklin
Contributor

Does this descend into categories and the index?

@jreback
Contributor Author

jreback commented Nov 13, 2015

I was wondering if you were going to ask that... It DOES do the index, but not the categories. I can fix this, though.

@jreback
Contributor Author

jreback commented Nov 13, 2015

Now includes embedded usage for Index & Categorical

In [5]: df = DataFrame({'A' : ['foo']*1000})

In [6]: df['B'] = df['A'].astype('category')

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 2 columns):
A    1000 non-null object
B    1000 non-null category
dtypes: category(1), object(1)
memory usage: 16.6+ KB

In [9]: df.info(deep=True)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 2 columns):
A    1000 non-null object
B    1000 non-null category
dtypes: category(1), object(1)
memory usage: 55.7 KB

In [11]: df.memory_usage()
Out[11]: 
A    8000
B    1008
dtype: int64

In [12]: df.memory_usage(deep=True)
Out[12]: 
A    48000
B     1048
dtype: int64
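The session above can be reproduced roughly as follows. This is a minimal sketch, not the PR's own test code: it assumes a recent pandas release, where (as far as I know) the deep option on `info` is spelled `memory_usage="deep"` rather than the `deep=True` shown in this comment, and it forces `object` dtype explicitly because newer versions may otherwise choose a dedicated string dtype. Exact byte counts depend on the Python and pandas versions, so none are asserted here.

```python
import io

import pandas as pd

# Assumption: force object dtype so the strings are stored as Python objects,
# which is the case the deep introspection is meant to measure.
df = pd.DataFrame({"A": pd.Series(["foo"] * 1000, dtype=object)})
df["B"] = df["A"].astype("category")

# Shallow: only the array buffers (pointers for object columns, codes for categoricals).
shallow = df.memory_usage(index=False)

# Deep: additionally asks each Python object for its size.
deep = df.memory_usage(index=False, deep=True)

print(shallow)
print(deep)

# info() can report the deep figure as well; capture it in a buffer for inspection.
buf = io.StringIO()
df.info(memory_usage="deep", buf=buf)
info_text = buf.getvalue()
print(info_text)
```

The object column grows the most under `deep=True`, since every element contributes its own object size; the categorical column only pays for its (single) category.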

@jreback
Contributor Author

jreback commented Nov 13, 2015

This is now provided on Series as well:

In [6]: df['A'].memory_usage()
Out[6]: 8000

In [7]: df['A'].memory_usage(index=True)
Out[7]: 16000

In [8]: df['A'].memory_usage(index=True,deep=True)
Out[8]: 56000
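A short sketch of the same three calls on a standalone Series. It passes `index` explicitly in every call, on the assumption that the default may differ between pandas versions (in this PR a bare `memory_usage()` excludes the index). The absolute numbers will not match the output above on other Python or pandas versions, so only the ordering is checked.

```python
import pandas as pd

# Assumption: object dtype is forced so that deep=True has Python objects to measure.
s = pd.Series(["foo"] * 1000, dtype=object)

values_only = s.memory_usage(index=False)                 # just the values buffer
with_index = s.memory_usage(index=True)                   # values + index
with_index_deep = s.memory_usage(index=True, deep=True)   # plus per-object sizes

print(values_only, with_index, with_index_deep)
```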

@mrocklin
Contributor

BTW, I'm glad that memory_usage_of_objects is usable on numpy arrays as well. I may end up using that outside of pandas.
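For readers outside pandas, the idea can be sketched in pure Python and NumPy. `deep_nbytes` below is a hypothetical helper written for illustration, not the pandas function mentioned above: it adds `sys.getsizeof` of every element to the buffer size of an object array. It counts a shared object once per reference, which appears consistent with the deep figure for column `A` earlier in the thread (48000 = 8000 bytes of pointers + 1000 × 40 bytes per string on that interpreter).

```python
import sys

import numpy as np


def deep_nbytes(arr):
    """Hypothetical helper: buffer size plus per-element object sizes.

    For non-object dtypes this is simply arr.nbytes. For object arrays,
    sys.getsizeof is summed over every element, so an object referenced
    several times is counted several times.
    """
    arr = np.asarray(arr)
    total = arr.nbytes
    if arr.dtype == object:
        total += sum(sys.getsizeof(x) for x in arr.ravel())
    return total


strings = np.array(["foo"] * 10, dtype=object)
numbers = np.arange(10)

print(deep_nbytes(strings))
print(deep_nbytes(numbers))
```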

@jreback
Contributor Author

jreback commented Nov 13, 2015

Right, dask.array could certainly introspect here as well.

@max-sixty
Contributor

I'll wait until this is merged before adding __getsize__.
Is there a reason index is False by default? I'd have thought that would be 'part of the package'.

@jreback
Contributor Author

jreback commented Nov 13, 2015

@MaximilianR I don't recall the discussion, but I think we should change the default. Note that this only affects a direct call to memory_usage, not .info, where the index is included.

Why don't you post an issue and we'll change it in 0.18 (as it's a small API change).

jreback added a commit that referenced this pull request Nov 13, 2015
PERF/DOC:  Option to .info() and .memory_usage() to provide for deep introspection of memory consumption #11595
@jreback jreback merged commit ddd0372 into pandas-dev:master Nov 13, 2015


Development

Successfully merging this pull request may close these issues.

Optionally use sys.getsizeof in DataFrame.memory_usage
