
Every Jupyter notebook I open seems to use a constant 900-1200 MB of RAM continuously. Where is most of this overhead coming from? It seems to be relatively independent of the contents of the notebook, as most of the jump in usage is when the notebook is first opened, before any cells are even executed. In fact, sometimes after executing a number of the cells the RAM usage actually goes down.

To give an idea, in the notebook I have open now, the largest variable I've defined is an ~3000-element list of tuples, each of which contains 5 ints and a float. So it's not like I have several million-row data tables open, and as I said, the RAM usage was already up near that 1 GB mark before I even ran the code that generated this list.

I'm gathering that the Python runtime itself is somehow using most of this RAM, but if I run Python from the command line it doesn't do this. Also, the memory usage shows up under the browser process, NOT the background Python process (even when the Jupyter notebook is the only thing running in the browser, and when the browser was closed before launching the notebook, so it isn't leftover data from previously browsed websites that never got cleared).

Jupyter is a complex program, much more complex than Python on the command line, and that needs RAM. Commented Sep 17 at 2:23

1 Answer


If the output of the notebook(s) is large and not cleared, that would likely explain this. Potential solutions include: 1) restarting the kernel and clearing all output; 2) limiting displays (use .head() on DataFrames instead of displaying them in full); 3) avoiding large, complex, and/or interactive plots (save them to files if needed).
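Option 1 can also be done outside Jupyter: an .ipynb file is plain JSON, so stored outputs can be stripped with just the standard library. A minimal sketch (the filename is hypothetical; `jupyter nbconvert --clear-output --inplace notebook.ipynb` achieves the same thing):

```python
import json

def clear_outputs(path):
    """Strip stored outputs and execution counts from a notebook file."""
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []          # drop saved cell output
            cell["execution_count"] = None
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)

# clear_outputs("notebook.ipynb")  # hypothetical filename
```

The data your cells computed is gone either way once the kernel stops; only the rendered output was being kept in the file.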

For example, running the code below in a new notebook prints values around 70 MB:

import psutil, os, time

# Attach to the current (kernel) process and sample its resident set size.
proc = psutil.Process(os.getpid())
for _ in range(10):
    print(proc.memory_info().rss / 1e6, "MB")  # RSS in MB
    time.sleep(5)

Adding the code below results in values of over 400 MB. NOTE: if memory is a concern on your machine, you probably do NOT want to test this.

import plotly.express as px
import pandas as pd
import numpy as np

# Create a big dataset
N = 2_000_000
df = pd.DataFrame({
    "x": np.random.randn(N),
    "y": np.random.randn(N),
    "color": np.random.choice(["A", "B", "C", "D"], size=N)
})

# Plotly scatter (this will be huge in output JSON)
fig = px.scatter(df, x="x", y="y", color="color", opacity=0.3)
fig.show()

When I save, close, and reopen this notebook, the values remain above 400 MB.
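Since an .ipynb file is just JSON, you can also measure how much stored output each cell contributes. A sketch using an in-memory stand-in for the notebook dict (a real file would be read with json.load instead; the 50 kB output string is fabricated for illustration):

```python
import json

# Stand-in for a loaded notebook; a real one comes from
# json.load(open("notebook.ipynb")).
nb = {
    "cells": [
        {"cell_type": "code", "source": "fig.show()",
         "outputs": [{"data": {"application/json": "x" * 50_000}}]},
        {"cell_type": "markdown", "source": "# notes", "outputs": []},
    ]
}

# Sum the serialized size of each code cell's stored outputs.
for i, cell in enumerate(nb["cells"]):
    if cell.get("cell_type") == "code":
        size = len(json.dumps(cell.get("outputs", [])))
        print(f"cell {i}: {size / 1e3:.1f} kB of stored output")
```

Cells with large figures or full DataFrame displays will dominate this total, which is what the browser has to hold when it renders the notebook.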


1 Comment

Thanks pixel-process. That likely explains it; I'd have thought plots would just be stored as relatively small images (since the data arrays being plotted don't persist, I have to regenerate them every time, even though the rendered graph is still visible).
