We have a TensorFlow application that feeds data through queues in batches of 250. After switching from FixedLenFeature to VarLenFeature, we started seeing a memory leak during training: memory usage increases constantly. We train our models on GPU machines.

This is the decode code:

@staticmethod
def decode(serialized_example):
    features = tf.parse_example(
        serialized_example,
        # Defaults are not specified since both keys are required.
        features={
            # target_features
            RECS: tf.VarLenFeature(tf.float32),
            CLICK: tf.FixedLenFeature([], tf.float32)
        })
    return features

We then convert the sparse tensor to a dense one using:

tf.identity(tf.sparse_tensor_to_dense(tensor), name=key)

We then consume the data in batches through TensorFlow queues.

This is the create queue code:

@staticmethod
def create_queue(tensors, capacity, shuffle=False, min_after_dequeue=None, seed=None,
                 enqueue_many=False, shapes=None, shared_name=None, name=None):
    tensor_list = _as_tensor_list(tensors)
    with ops.name_scope(name, "shuffle_batch_queue", list(tensor_list)):
        tensor_list = _validate(tensor_list)

        tensor_list, sparse_info = _store_sparse_tensors(
            tensor_list, enqueue_many, tf.constant(True))
        map_op = [x.map_op for x in sparse_info]
        types = _dtypes([tensor_list])
        shapes = _shapes([tensor_list], shapes, enqueue_many)

        queue = data_flow_ops.RandomShuffleQueue(
            capacity=capacity, min_after_dequeue=min_after_dequeue, seed=seed,
            dtypes=types, shapes=shapes, shared_name=shared_name)

    return queue, sparse_info, map_op

And the enqueue operation is:

@staticmethod
def enqueue(queue, tensors, num_threads, enqueue_many=False, name=None, map_op=None):
    tensor_list = _as_tensor_list(tensors)
    with ops.name_scope(name, "shuffle_batch_equeue", list(tensor_list)):
        tensor_list = _validate(tensor_list)
        tensor_list, sparse_info = _store_sparse_tensors(
            tensor_list, enqueue_many, tf.constant(True), map_op)
        _enqueue(queue, tensor_list, num_threads, enqueue_many, tf.constant(True))
    return queue, sparse_info

1 Answer

Can you provide a minimal example? For instance, do you still see the memory leak if you just run the example parsing over and over via multiple session.run calls, without any queues?

The reason I ask is that _store_sparse_tensors is private to that file for a reason: if you misuse it, you will hit a memory leak, so every caller of this function must be very careful to use it correctly. For every sparse tensor stored via _store_sparse_tensors, that same tensor must be restored via _restore_sparse_tensors. If it is not, you will leak memory.
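The pairing rule above can be illustrated with a toy model in plain Python. This is only a sketch of the bookkeeping idea, not TensorFlow's implementation: ToySparseStore and its store/restore methods are hypothetical names, and the assumption that stored values sit in a handle-keyed table until they are restored is made for illustration only.

```python
import itertools


class ToySparseStore:
    """Toy stand-in for a handle-based store (hypothetical, not TensorFlow code)."""

    def __init__(self):
        self._entries = {}
        self._next_handle = itertools.count()

    def store(self, value):
        # Keep the value alive and hand back a small handle in its place.
        handle = next(self._next_handle)
        self._entries[handle] = value
        return handle

    def restore(self, handle):
        # Restoring removes the entry, so its memory can be reclaimed.
        return self._entries.pop(handle)

    def __len__(self):
        return len(self._entries)


# Every store is matched by a restore: the table stays empty.
balanced = ToySparseStore()
for step in range(1000):
    handle = balanced.store([float(step)] * 10)
    balanced.restore(handle)

# Stores are never restored: the table grows with every step.
leaky = ToySparseStore()
for step in range(1000):
    leaky.store([float(step)] * 10)

print(len(balanced), len(leaky))
```

In this toy model, calling store twice for the same data and restoring only once (or never) leaves entries behind on every training step, which would show up as steadily growing memory.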

I'm considering a DT_VARIANT storage format to replace this wrapper, but for now I'd recommend against using these functions yourself. Instead, you can probably do what you want using the new tf.contrib.data (soon to be tf.data) libraries!
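As a rough sketch of that suggestion, the pipeline below reads TFRecords with tf.data instead of hand-built queues. It assumes a recent TensorFlow release where the API lives under tf.data and tf.io (rather than tf.contrib.data and tf.parse_example). The key strings "recs" and "click", the helper names write_examples and make_dataset, and the sample records are all hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical key values; the question does not show what RECS and CLICK contain.
RECS = "recs"
CLICK = "click"


def write_examples(path, rows):
    """Write (recs, click) pairs as serialized tf.train.Example records."""
    with tf.io.TFRecordWriter(path) as writer:
        for recs, click in rows:
            example = tf.train.Example(features=tf.train.Features(feature={
                RECS: tf.train.Feature(float_list=tf.train.FloatList(value=recs)),
                CLICK: tf.train.Feature(float_list=tf.train.FloatList(value=[click])),
            }))
            writer.write(example.SerializeToString())


def decode(serialized_batch):
    """Parse a batch of records and densify the variable-length feature."""
    features = tf.io.parse_example(serialized_batch, features={
        RECS: tf.io.VarLenFeature(tf.float32),
        CLICK: tf.io.FixedLenFeature([], tf.float32),
    })
    # Densify inside the map function so dataset elements are plain tensors.
    features[RECS] = tf.sparse.to_dense(features[RECS])
    return features


def make_dataset(paths, batch_size=250):
    dataset = tf.data.TFRecordDataset(paths)
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(batch_size)
    # num_parallel_calls runs the parsing on several batches concurrently.
    dataset = dataset.map(decode, num_parallel_calls=2)
    return dataset.prefetch(1)


path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
write_examples(path, [([1.0, 2.0, 3.0], 1.0), ([4.0], 0.0)])
batch = next(iter(make_dataset([path], batch_size=2)))
print(batch[RECS].shape, batch[CLICK].numpy())
```

Parsing after batching lets a single parse_example call handle the whole batch, and the shorter rows of the variable-length feature are zero-padded by tf.sparse.to_dense. No store/restore bookkeeping appears in user code with this approach.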

8 Comments

Before we used VarLenFeature there was no memory leak. The only change is that I didn't pass map_op in the enqueue method, i.e.: tensor_list, sparse_info = _store_sparse_tensors(tensor_list, enqueue_many, tf.constant(True))
Right. Can you tell me if you have a memory leak if you are just parsing the VarLenFeature -- not using _store_sparse_tensors?
Not using _store_sparse_tensors, or using it without the map_op from the previous call, does not work. I think I'll try your suggestion to use tf.contrib.data. Do you know if it supports multithreaded processing? I have multiple HDFS files with TFRecords that I want to feed in parallel.
Yes, there are arguments to the .map function to enable parallelism. See also the interleave and prefetch methods in the TF nightlies.
I tried tf.contrib.data, but it doesn't seem to support SparseTensors, at least with TFRecordDataset: the map operation fails with the exception "TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor". Do you know if it should work?