
I have a model that contains a GRU implementation and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I return the GRU's hidden state from each forward pass and feed it back, alongside the other inputs, as the GRU's initial state in the next forward pass. So I don't use that output in my loss function.

To calculate gradients I am using tf.GradientTape().gradient, but it returns None for all variables. Could the fact that one of my outputs isn't used in the loss calculation be the source of these None gradients?
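For reference, the state-feedback pattern described above can be sketched with tf.keras.layers.GRUCell; the sizes here are made up, since the question does not give the model's dimensions:

```python
import tensorflow as tf

# Hypothetical sizes; the real model's dimensions are not given.
gru_units, sample_dim = 8, 4

cell = tf.keras.layers.GRUCell(gru_units)
state = [tf.zeros([1, gru_units])]     # initial hidden state for the first sample
x = tf.random.normal([1, sample_dim])  # one audio sample per forward pass

for _ in range(3):
    # GRUCell returns (output, new_states); the new state is fed
    # back in on the next call, mirroring the setup in the question.
    out, state = cell(x, state)
```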

Following is a schematic of my training loop:

for epoch in epochs:
    for batch in dataset:
        with tf.GradientTape() as tape:
            for audio in batch:
                for sample in audio:
                    primary_output, gru_next = my_model([other_inputs, previous_gru_output], training=True)
                    stacked_primary_outputs[sample] = primary_output
                    previous_gru_output = gru_next
                enhanced_audio = create_the_output_audio_by_accumulating_primary_output(stacked_primary_outputs)
                single_audio_loss = my_loss_function(clean_audio, enhanced_audio)
                total_loss += single_audio_loss
            grads = tape.gradient(total_loss, my_model.trainable_weights)
            optimizer.apply_gradients(zip(grads, my_model.trainable_weights))
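A minimal repro of the suspected failure mode, with a toy variable standing in for the model. It assumes stacked_primary_outputs is a pre-allocated NumPy buffer; a plain Python list of tensors stacked with tf.stack would behave differently:

```python
import numpy as np
import tensorflow as tf

x = tf.Variable(2.0)

# Case 1: keep outputs as tensors in a Python list, then tf.stack.
# The tape stays connected and the gradient flows.
with tf.GradientTape() as tape:
    outs = [x * float(i) for i in range(3)]
    loss = tf.reduce_sum(tf.stack(outs))
grad_ok = tape.gradient(loss, x)   # d/dx (0x + 1x + 2x) = 3.0

# Case 2: write outputs into a pre-allocated NumPy buffer. Each
# assignment implicitly converts the tensor to NumPy, which detaches
# it from the tape, so the gradient comes back as None.
buf = np.zeros(3, dtype=np.float32)
with tf.GradientTape() as tape:
    for i in range(3):
        buf[i] = x * float(i)      # implicit .numpy() conversion detaches
    loss = tf.reduce_sum(tf.constant(buf))
grad_none = tape.gradient(loss, x)  # None
```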
  • This cannot be answered without knowing the loss function as well as the "accumulation" function you are using, which are likely not differentiable. In any case, the fact that the function returning enhanced_audio does not even take primary_output as an argument is a bad sign. – Commented Feb 24 at 18:05
  • @xdurch0 Thank you for your comment. I have edited the post: enhanced_audio is now created from stacked_primary_outputs, and enhanced_audio is used in the loss calculation. – Commented Feb 25 at 8:33

1 Answer

The None gradients most likely come from how the per-sample outputs are accumulated. If stacked_primary_outputs is a pre-allocated NumPy array (or anything else that converts the tensors to NumPy), each assignment detaches the output from the tape, so the loss no longer traces back to the model's weights. Using a tf.TensorArray (or stacking a plain Python list of tensors with tf.stack) keeps the graph connected. Note that an output which is only fed back as the next state, like gru_next, does not by itself cause None gradients. It is also good practice to average the total loss over the batch so the gradient scale does not depend on the batch size:

with tf.GradientTape() as tape:
    total_loss = 0.0
    for i, audio in enumerate(batch):
        # Use TensorArray to maintain the gradient graph
        outputs = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        previous_gru = tf.zeros([1, gru_units]) 
        for j, sample in enumerate(audio):
            primary, next_gru = my_model([tf.reshape(sample, [1, -1]), previous_gru], training=True)
            outputs = outputs.write(j, primary)
            previous_gru = next_gru
        
        # Accumulate differentiable output and calculate loss
        single_loss = my_loss_function(clean_audio[i], outputs.stack())
        total_loss += single_loss

    # Average the loss so the gradient scale is independent of batch size
    avg_loss = total_loss / tf.cast(len(batch), tf.float32)

grads = tape.gradient(avg_loss, my_model.trainable_weights)
optimizer.apply_gradients(zip(grads, my_model.trainable_weights))
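A standalone check, with a toy variable in place of the model, that tf.TensorArray keeps the tape connected:

```python
import tensorflow as tf

x = tf.Variable(1.5)
with tf.GradientTape() as tape:
    ta = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    for j in range(4):
        ta = ta.write(j, x * float(j))  # write() returns the updated array
    loss = tf.reduce_sum(ta.stack())    # [0, 1.5, 3.0, 4.5] -> 9.0
grad = tape.gradient(loss, x)           # d/dx (0 + 1 + 2 + 3)x = 6.0
```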