
I have a model that contains a GRU implementation and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I return the GRU's hidden state from each forward pass and feed it back, alongside the other inputs, as the GRU's initial state in the next forward pass. So I don't use that output in my loss function.

To calculate gradients I am using tf.GradientTape().gradient, but it returns None for all variables. Could the fact that one of my outputs isn't used in the loss calculation be the source of these None gradients?
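For reference, the state-feedback pattern described above can be sketched with tf.keras.layers.GRUCell; the sizes here are made up, since the question does not give the model's dimensions:

```python
import tensorflow as tf

# Hypothetical sizes; the real model's dimensions are not given.
gru_units, sample_dim = 8, 4

cell = tf.keras.layers.GRUCell(gru_units)
state = [tf.zeros([1, gru_units])]     # initial hidden state for the first sample
x = tf.random.normal([1, sample_dim])  # one audio sample per forward pass

for _ in range(3):
    # GRUCell returns (output, new_states); the new state is fed
    # back in on the next call, mirroring the setup in the question.
    out, state = cell(x, state)
```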

Following is a schematic of my training loop:

for epoch in epochs:
    for batch in dataset:
        with tf.GradientTape() as tape:
            for audio in batch:
                for sample in audio:
                    primary_output, gru_next = my_model([other_inputs, previous_gru_output], training=True)
                    stacked_primary_outputs[sample] = primary_output
                    previous_gru_output = gru_next
                enhanced_audio = create_the_output_audio_by_accumulating_primary_output(stacked_primary_outputs)
                single_audio_loss = my_loss_function(clean_audio, enhanced_audio)
                total_loss += single_audio_loss
            grads = tape.gradient(total_loss, my_model.trainable_weights)
            optimizer.apply_gradients(zip(grads, my_model.trainable_weights))
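A minimal repro of the suspected failure mode, with a toy variable standing in for the model. It assumes stacked_primary_outputs is a pre-allocated NumPy buffer; a plain Python list of tensors stacked with tf.stack would behave differently:

```python
import numpy as np
import tensorflow as tf

x = tf.Variable(2.0)

# Case 1: keep outputs as tensors in a Python list, then tf.stack.
# The tape stays connected and the gradient flows.
with tf.GradientTape() as tape:
    outs = [x * float(i) for i in range(3)]
    loss = tf.reduce_sum(tf.stack(outs))
grad_ok = tape.gradient(loss, x)   # d/dx (0x + 1x + 2x) = 3.0

# Case 2: write outputs into a pre-allocated NumPy buffer. Each
# assignment implicitly converts the tensor to NumPy, which detaches
# it from the tape, so the gradient comes back as None.
buf = np.zeros(3, dtype=np.float32)
with tf.GradientTape() as tape:
    for i in range(3):
        buf[i] = x * float(i)      # implicit .numpy() conversion detaches
    loss = tf.reduce_sum(tf.constant(buf))
grad_none = tape.gradient(loss, x)  # None
```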
  • This cannot be answered without knowing the loss function as well as the "accumulation" function you are using, which are likely not differentiable. In any case, the fact that the function returning enhanced_audio does not even take primary_output as an argument is a bad sign. – Commented Feb 24 at 18:05
  • @xdurch0 Thank you for your comment. I have edited the post: enhanced_audio is now created from stacked_primary_outputs, and enhanced_audio is used in the loss calculation. – Commented Feb 25 at 8:33

1 Answer

The None gradients most likely come from how the per-sample outputs are accumulated. If stacked_primary_outputs is a pre-allocated NumPy array (or anything else that converts the tensors to NumPy), each assignment detaches the output from the tape, so the loss no longer traces back to the model's weights. Using a tf.TensorArray (or stacking a plain Python list of tensors with tf.stack) keeps the graph connected. Note that an output which is only fed back as the next state, like gru_next, does not by itself cause None gradients. It is also good practice to average the total loss over the batch so the gradient scale does not depend on the batch size:

with tf.GradientTape() as tape:
    total_loss = 0.0
    for i, audio in enumerate(batch):
        # Use TensorArray to maintain the gradient graph
        outputs = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        previous_gru = tf.zeros([1, gru_units]) 
        for j, sample in enumerate(audio):
            primary, next_gru = my_model([tf.reshape(sample, [1, -1]), previous_gru], training=True)
            outputs = outputs.write(j, primary)
            previous_gru = next_gru
        
        # Accumulate differentiable output and calculate loss
        single_loss = my_loss_function(clean_audio[i], outputs.stack())
        total_loss += single_loss

    # Average the loss so the gradient scale is independent of batch size
    avg_loss = total_loss / tf.cast(len(batch), tf.float32)

grads = tape.gradient(avg_loss, my_model.trainable_weights)
optimizer.apply_gradients(zip(grads, my_model.trainable_weights))
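A standalone check, with a toy variable in place of the model, that tf.TensorArray keeps the tape connected:

```python
import tensorflow as tf

x = tf.Variable(1.5)
with tf.GradientTape() as tape:
    ta = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    for j in range(4):
        ta = ta.write(j, x * float(j))  # write() returns the updated array
    loss = tf.reduce_sum(ta.stack())    # [0, 1.5, 3.0, 4.5] -> 9.0
grad = tape.gradient(loss, x)           # d/dx (0 + 1 + 2 + 3)x = 6.0
```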