
I have a memory leak with TensorFlow. I referred to Tensorflow : Memory leak even while closing Session? to address my issue, and I followed the advice in that answer, which seemed to solve the problem. However, it does not work here.

In order to reproduce the memory leak, I have created a simple example. First, I use this function (which I got here: How to get current CPU and RAM usage in Python?) to check the memory usage of the Python process:

def memory():
    import os
    import psutil
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info()[0]/2.**30  # resident set size, converted from bytes to GB
    print('memory use:', memoryUse)

Then, every time I call the build_model function, the memory usage increases.

Here is the build_model function that has the memory leak:

def build_model():

    '''Model'''

    tf.reset_default_graph()

    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)

        labels = tf.placeholder(tf.float32, shape=(None, 1))
        input = tf.placeholder(tf.float32, shape=(None, 1))

        x = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense1')(input)
        x1 = tf.contrib.keras.layers.Dropout(0.5)(x)
        x2 = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense2')(x1)
        y = tf.contrib.keras.layers.Dense(1, activation='sigmoid', name='dense3')(x2)

        loss = tf.reduce_mean(tf.contrib.keras.losses.binary_crossentropy(labels, y))

        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)

        #Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        sess.close()

    tf.reset_default_graph()

    return 

I would have thought that using the with tf.Graph().as_default(), tf.Session() as sess: block, then closing the session and calling tf.reset_default_graph, would clear all the memory used by TensorFlow. Apparently it does not.

The memory leak can be reproduced as follows:

memory()
build_model()
memory()
build_model()
memory()

The output of this is (on my machine):

memory use: 0.1794891357421875
memory use: 0.184417724609375
memory use: 0.18923568725585938

Clearly, not all of the memory used by TensorFlow is freed afterwards. Why?

I plotted the memory usage over 100 iterations of calling build_model, and this is what I get:

[Plot: memory use over 100 iterations of build_model]

I think that goes to show that there is a memory leak.

  • What is the error message you are getting? Commented Jun 4, 2017 at 10:20
  • There is no error message. The issue is that memory is leaking each time I call the function build_model. Commented Jun 4, 2017 at 10:24
  • In the graph, what is the X axis? Is that the number of times you execute build_model? Commented Jun 4, 2017 at 12:04
  • Yes exactly. It's the number of times build_model was called. Commented Jun 4, 2017 at 12:05
  • So what is happening is that the memory keeps adding up at each iteration and is not released, right? Normally TF loads all the operations into the graph first and then executes them in a session. Here you create a new session for each iteration, right? Commented Jun 4, 2017 at 12:21

5 Answers


The problem was due to Tensorflow version 0.11. As of today Tensorflow 0.12 is out and the bug is resolved. Upgrade to a newer version and it should work as expected. Don't forget to call tf.contrib.keras.backend.clear_session() at the end.
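For reference, a minimal sketch of where that call would go, assuming the same tf.contrib.keras API used in the question (the placeholder, layer, loss and optimizer definitions are elided):

    import tensorflow as tf

    def build_model():
        tf.reset_default_graph()
        with tf.Graph().as_default(), tf.Session() as sess:
            tf.contrib.keras.backend.set_session(sess)
            # ... define placeholders, layers, loss and optimizer as in the question ...
            sess.run(tf.global_variables_initializer())
        # Drop Keras's cached references to the graph/session so Python can
        # actually garbage-collect them.
        tf.contrib.keras.backend.clear_session()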


2 Comments

The problem persists in newer versions: stackoverflow.com/questions/53687165/…
@SafooraYousefi in the question you linked, the poster does not reset the graph between iterations. As of tensorflow 1.13, this solution still works (tf.reset_default_graph).

I had this same problem. Tensorflow (v2.0.0) was consuming ~0.3 GB every epoch in an LSTM model I was training. I discovered that the TensorFlow callback hooks were the main culprit. I removed the TensorBoard callback and it worked fine afterwards.

history = model.fit(
        train_x,
        train_y,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_data=(test_x, test_y),
        callbacks=[tensorboard, checkpoint],
)
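For comparison, this is roughly what the call looks like with the TensorBoard callback dropped; checkpoint is the callback kept from the snippet above:

    # Same fit() call, but with the TensorBoard callback removed; only the
    # checkpoint callback from the snippet above is kept.
    history = model.fit(
        train_x,
        train_y,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_data=(test_x, test_y),
        callbacks=[checkpoint],
    )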



Normally we put the loop outside of the session. I think what is happening here is that each time you run init_op = tf.global_variables_initializer(), it adds more and more memory chunks, because if the loop were outside the session it would only be initialized once. Here it gets initialized every time and kept in memory.

Edit, since you still have the memory issue:

Possibly it's the graph: each time you create a new graph, and that graph holds on to memory. Try removing the explicit graph and running again; all your operations will then go into the default graph. I also think you need some kind of memory-flush mechanism outside of TensorFlow, because each run stacks up another graph.
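A minimal sketch of that idea under TF 1.x: build the graph and run the initializer once, and only execute ops inside the loop. The tiny dense model and random batches here are simplified stand-ins, not the asker's exact network.

    import numpy as np
    import tensorflow as tf

    # Build the graph exactly once, outside the loop.
    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.placeholder(tf.float32, shape=(None, 1))
        labels = tf.placeholder(tf.float32, shape=(None, 1))
        preds = tf.layers.dense(inputs, 1, activation=tf.nn.sigmoid)  # stand-in model
        loss = tf.losses.log_loss(labels, preds)
        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)
        init_op = tf.global_variables_initializer()

    with tf.Session(graph=graph) as sess:
        sess.run(init_op)                      # initialize once, not per iteration
        for _ in range(100):                   # the loop only runs existing ops
            batch_x = np.random.rand(32, 1)
            batch_y = np.random.randint(0, 2, size=(32, 1)).astype(np.float32)
            sess.run(train_step, feed_dict={inputs: batch_x, labels: batch_y})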

14 Comments

Unfortunately tf.global_variables_initializer() is not the source of the problem. You can re-create the same memory leak even if you remove init_op = tf.global_variables_initializer() and sess.run(init_op).
That means even when you run the graph without a session?
Yes. Let's not keep adding comments. We can talk in chat.
So here you are building a graph at each iteration. Normally we initialize the graph before the loop.
Thanks for your help. I also posted an issue on the tensorflow repository on GitHub: github.com/tensorflow/tensorflow/issues/10408. Clearly there is something wrong here. Do you happen to know someone who could fix the problem?

I faced something similar in TF 1.12 as well. Don't create the graph and session for every iteration. Every time the graph is created and the variables initialized, you are not redefining the old graph but creating new ones, leading to memory leaks. I was able to solve this by defining the graph once and then passing the session to my iterative logic (a sketch of this pattern follows the quoted guidelines below).

From How not to program Tensorflow:

  • Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution.
  • Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops—you’re creating new ones.

Also, see this great answer for better understanding.
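A minimal sketch of that separation under TF 1.x. The function names build_graph and run_training_step are hypothetical, and the single dense layer and random batches are just placeholders, not the asker's model:

    import numpy as np
    import tensorflow as tf

    def build_graph():
        '''Op creation: called exactly once, returns the handles needed later.'''
        inputs = tf.placeholder(tf.float32, shape=(None, 1), name='inputs')
        labels = tf.placeholder(tf.float32, shape=(None, 1), name='labels')
        preds = tf.layers.dense(inputs, 1, activation=tf.nn.sigmoid)
        loss = tf.losses.log_loss(labels, preds)
        train_op = tf.train.AdamOptimizer(0.004).minimize(loss)
        return inputs, labels, train_op

    def run_training_step(sess, handles, batch_x, batch_y):
        '''Op execution: reuses the session, creates no new graph nodes.'''
        inputs, labels, train_op = handles
        sess.run(train_op, feed_dict={inputs: batch_x, labels: batch_y})

    handles = build_graph()                    # graph is defined once
    with tf.Session() as sess:                 # one session is passed into the loop
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            batch_x = np.random.rand(32, 1)
            batch_y = np.random.randint(0, 2, size=(32, 1)).astype(np.float32)
            run_training_step(sess, handles, batch_x, batch_y)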



This memory leak issue was resolved in the recent stable version, Tensorflow 1.15.0. I ran the code from the question and I see almost no leak, as shown below. There were lots of performance improvements in the recent stable versions TF 1.15 and TF 2.0.

memory use: 0.4033699035644531
memory use: 0.4062042236328125
memory use: 0.4088172912597656

Please check the colab gist here. Thanks!

