
I have a memory leak with TensorFlow. I referred to Tensorflow : Memory leak even while closing Session? to address my issue, and I followed the advice in that answer, which seemed to solve the problem. However, it does not work here.

In order to reproduce the memory leak, I have created a simple example. First, I use this function (which I got here: How to get current CPU and RAM usage in Python?) to check the memory usage of the Python process:

def memory():
    import os
    import psutil
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info()[0]/2.**30  # resident set size, converted from bytes to GB
    print('memory use:', memoryUse)

Then, every time I call the build_model function, the memory usage increases.

Here is the build_model function that has the memory leak:

def build_model():

    '''Model'''

    tf.reset_default_graph()

    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)

        labels = tf.placeholder(tf.float32, shape=(None, 1))
        input = tf.placeholder(tf.float32, shape=(None, 1))

        x = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense1')(input)
        x1 = tf.contrib.keras.layers.Dropout(0.5)(x)
        x2 = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense2')(x1)
        y = tf.contrib.keras.layers.Dense(1, activation='sigmoid', name='dense3')(x2)

        loss = tf.reduce_mean(tf.contrib.keras.losses.binary_crossentropy(labels, y))

        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)

        #Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        sess.close()

    tf.reset_default_graph()

    return 

I would have thought that using the with tf.Graph().as_default(), tf.Session() as sess: block, then closing the session and calling tf.reset_default_graph, would clear all the memory used by TensorFlow. Apparently it does not.

The memory leak can be reproduced as follows:

memory()
build_model()
memory()
build_model()
memory()

The output of this is (on my machine):

memory use: 0.1794891357421875
memory use: 0.184417724609375
memory use: 0.18923568725585938

Clearly, not all of the memory used by TensorFlow is freed afterwards. Why?

I plotted the memory usage over 100 iterations of calling build_model, and this is what I get:

[Plot: memory use over 100 iterations of build_model]

I think that goes to show that there is a memory leak.

  • What is the error message you are getting? Commented Jun 4, 2017 at 10:20
  • There is no error message. The issue is that memory is leaking each time I call the function build_model. Commented Jun 4, 2017 at 10:24
  • In the graph, what is the X axis? Is that the number of times you execute build_model? Commented Jun 4, 2017 at 12:04
  • Yes exactly. It's the number of times build_model was called. Commented Jun 4, 2017 at 12:05
  • So what is happening is that the memory keeps adding up at each iteration and is not released, right? Normally TF loads all the operations into the graph first and then executes them in a session. Here you create a new session for each iteration, right? Commented Jun 4, 2017 at 12:21

5 Answers


The problem was due to Tensorflow version 0.11. As of today Tensorflow 0.12 is out and the bug is resolved. Upgrade to a newer version and it should work as expected. Don't forget to call tf.contrib.keras.backend.clear_session() at the end.
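For reference, a minimal sketch of where that call would go, assuming the same tf.contrib.keras API used in the question (the placeholder, layer, loss and optimizer definitions are elided):

    import tensorflow as tf

    def build_model():
        tf.reset_default_graph()
        with tf.Graph().as_default(), tf.Session() as sess:
            tf.contrib.keras.backend.set_session(sess)
            # ... define placeholders, layers, loss and optimizer as in the question ...
            sess.run(tf.global_variables_initializer())
        # Drop Keras's cached references to the graph/session so Python can
        # actually garbage-collect them.
        tf.contrib.keras.backend.clear_session()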


2 Comments

The problem persists in newer versions: stackoverflow.com/questions/53687165/…
@SafooraYousefi in the question you linked, the poster does not reset the graph between iterations. As of tensorflow 1.13, this solution still works (tf.reset_default_graph).

I had this same problem. Tensorflow (v2.0.0) was consuming ~0.3 GB every epoch in an LSTM model I was training. I discovered that the TensorFlow callback hooks were the main culprit. I removed the TensorBoard callback and it worked fine afterwards.

history = model.fit(
        train_x,
        train_y,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_data=(test_x, test_y),
        callbacks=[tensorboard, checkpoint],
)
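For comparison, this is roughly what the call looks like with the TensorBoard callback dropped; checkpoint is the callback kept from the snippet above:

    # Same fit() call, but with the TensorBoard callback removed; only the
    # checkpoint callback from the snippet above is kept.
    history = model.fit(
        train_x,
        train_y,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_data=(test_x, test_y),
        callbacks=[checkpoint],
    )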



Normally we put the loop outside of the session. I think what is happening here is that each time you run init_op = tf.global_variables_initializer(), it adds more and more memory chunks, because if the loop were outside the session it would only be initialized once. Here it gets initialized every time and kept in memory.

Edit, since you still have the memory issue:

Possibly it's the graph: each time you create a new graph, and that graph holds on to memory. Try removing the explicit graph and running again; all your operations will then go into the default graph. I also think you need some kind of memory-flush mechanism outside of TensorFlow, because each run stacks up another graph.
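A minimal sketch of that idea under TF 1.x: build the graph and run the initializer once, and only execute ops inside the loop. The tiny dense model and random batches here are simplified stand-ins, not the asker's exact network.

    import numpy as np
    import tensorflow as tf

    # Build the graph exactly once, outside the loop.
    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.placeholder(tf.float32, shape=(None, 1))
        labels = tf.placeholder(tf.float32, shape=(None, 1))
        preds = tf.layers.dense(inputs, 1, activation=tf.nn.sigmoid)  # stand-in model
        loss = tf.losses.log_loss(labels, preds)
        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)
        init_op = tf.global_variables_initializer()

    with tf.Session(graph=graph) as sess:
        sess.run(init_op)                      # initialize once, not per iteration
        for _ in range(100):                   # the loop only runs existing ops
            batch_x = np.random.rand(32, 1)
            batch_y = np.random.randint(0, 2, size=(32, 1)).astype(np.float32)
            sess.run(train_step, feed_dict={inputs: batch_x, labels: batch_y})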

14 Comments

Unfortunately tf.global_variables_initializer() is not the source of the problem. You can re-create the same memory leak even if you remove init_op = tf.global_variables_initializer() and sess.run(init_op).
That means even when you run the graph without a session?
Yes. Let's not keep adding comments. We can talk in chat.
So here you are building a graph at each iteration. Normally we initialize the graph before the loop.
Thanks for your help. I also posted an issue on the tensorflow repository on GitHub: github.com/tensorflow/tensorflow/issues/10408. Clearly there is something wrong here. Do you happen to know someone who could fix the problem?

I faced something similar in TF 1.12 as well. Don't create the graph and session for every iteration. Every time the graph is created and the variables initialized, you are not redefining the old graph but creating new ones, leading to memory leaks. I was able to solve this by defining the graph once and then passing the session to my iterative logic (a sketch of this pattern follows the quoted guidelines below).

From How not to program Tensorflow:

  • Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution.
  • Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops—you’re creating new ones.

Also, see this great answer for better understanding.
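A minimal sketch of that separation under TF 1.x. The function names build_graph and run_training_step are hypothetical, and the single dense layer and random batches are just placeholders, not the asker's model:

    import numpy as np
    import tensorflow as tf

    def build_graph():
        '''Op creation: called exactly once, returns the handles needed later.'''
        inputs = tf.placeholder(tf.float32, shape=(None, 1), name='inputs')
        labels = tf.placeholder(tf.float32, shape=(None, 1), name='labels')
        preds = tf.layers.dense(inputs, 1, activation=tf.nn.sigmoid)
        loss = tf.losses.log_loss(labels, preds)
        train_op = tf.train.AdamOptimizer(0.004).minimize(loss)
        return inputs, labels, train_op

    def run_training_step(sess, handles, batch_x, batch_y):
        '''Op execution: reuses the session, creates no new graph nodes.'''
        inputs, labels, train_op = handles
        sess.run(train_op, feed_dict={inputs: batch_x, labels: batch_y})

    handles = build_graph()                    # graph is defined once
    with tf.Session() as sess:                 # one session is passed into the loop
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            batch_x = np.random.rand(32, 1)
            batch_y = np.random.randint(0, 2, size=(32, 1)).astype(np.float32)
            run_training_step(sess, handles, batch_x, batch_y)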



This memory leak issue was resolved in the recent stable version, Tensorflow 1.15.0. I ran the code from the question and I see almost no leak, as shown below. There were lots of performance improvements in the recent stable versions TF 1.15 and TF 2.0.

memory use: 0.4033699035644531
memory use: 0.4062042236328125
memory use: 0.4088172912597656

Please check the colab gist here. Thanks!

