$\begingroup$

I have this piece of TensorFlow code:

class Classifier(keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.classifier = keras_nlp.models.BertClassifier.from_preset(
            "bert_base_en_uncased",
            num_classes=1,
        )

    def call(self, inputs):
        out = self.classifier(inputs)
        out = tf.squeeze(out, -1)
        return out

def get_model_2():
    inp = {
            'token_ids': keras.Input((512,), dtype=tf.int32, name=None),
            'padding_mask': keras.Input((512,), dtype=tf.bool, name=None),
            'segment_ids': keras.Input((512, ), dtype=tf.int32, name=None)
    }
    layer = Classifier()
    out = layer(inp)
    model = keras.Model(inp, out)
    return model

model = get_model_2()
model.compile(
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[
        keras.metrics.TruePositives(),
        keras.metrics.FalsePositives(),
        keras.metrics.TrueNegatives(),
        keras.metrics.FalseNegatives(),
        keras.metrics.BinaryAccuracy()
    ],
    optimizer=keras.optimizers.Adam(1e-4),
)

The problem is that when I call model.summary(), the output is:

Model: "model_9"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_47 (InputLayer)       [(None, 512)]                0         []                            
                                                                                                  
 input_48 (InputLayer)       [(None, 512)]                0         []                            
                                                                                                  
 input_46 (InputLayer)       [(None, 512)]                0         []                            
                                                                                                  
 classifier_5 (Classifier)   (None,)                      0         ['input_47[0][0]',            
                                                                     'input_48[0][0]',            
                                                                     'input_46[0][0]']            
                                                                                                  
==================================================================================================
Total params: 0 (0.00 Byte)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 0 (0.00 Byte)

The model reports 0 trainable parameters! I suspect that it is not possible to train a model nested inside a layer, or even a model nested inside another model, which did not work for me either. How do people fine-tune this model? Thank you very much!

Edit 1: I still don't know the solution, but I suspect that the Keras implementation of BERT is faulty, since the version I switched to, which uses tensorflow_hub.KerasLayer, works just fine.

Edit 2: Thanks to @Keval Pandya, whose code is the solution. The problem lies in the import (which was not included in the original question):

import keras
# or
from tensorflow import keras

works fine, but

import tf_keras as keras # Error

produces the error. My package versions are:

keras                              3.4.1
keras-nlp                          0.15.1
tf_keras                           2.17.0
tensorflow                         2.17.0
tensorflow-hub                     0.16.1
tensorflow-text                    2.17.0
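
Since the fix depends on which Keras packages are installed side by side, the list above can also be collected programmatically. This is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the distribution names passed to it are assumed to match how the packages were installed.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions taken from the list above.
    packages = ["keras", "keras-nlp", "tf_keras", "tensorflow"]
    for name, version in installed_versions(packages).items():
        print(f"{name:<12} {version or 'not installed'}")
```

If both keras and tf_keras show up, it is worth checking which of the two the name keras actually refers to in the training script.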
$\endgroup$

1 Answer

$\begingroup$

The issue you are facing comes from how Keras layers and models interact when one model is embedded inside another. BertClassifier.from_preset() returns a pre-built Keras model. When you use that model inside a custom layer like Classifier, its trainable parameters may not be propagated correctly unless you set this up explicitly.

=> Your model reports Total params: 0, meaning the weights of the pre-trained BERT model are not being counted at all, trainable or otherwise.

=> This likely happens because the model inside the Classifier layer does not register its parameters with the outer layer.

Keras treats your custom Classifier class as a plain layer, and if you don't explicitly mark the inner layers (like BERT) as trainable or don't pass them properly, Keras may not know there are trainable parameters inside.
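
One way to narrow this down is to check whether the wrapper and the wrapped model come from the same Keras implementation at all, since a layer from one package (for example tf_keras) may not track a model built by another (for example keras). The following is only a rough diagnostic sketch: the helper names layer_base_package and same_keras_implementation are hypothetical, and it assumes that each implementation defines a base class named Layer under its own top-level package.

```python
def layer_base_package(obj):
    """Return the top-level package that defines the `Layer` base class of obj.

    Walks the class hierarchy of obj and returns, for the first class named
    "Layer", the first component of its module path. Returns None if obj
    does not derive from any class with that name.
    """
    for cls in type(obj).__mro__:
        if cls.__name__ == "Layer":
            return cls.__module__.split(".")[0]
    return None


def same_keras_implementation(outer, inner):
    """True if both objects derive from a `Layer` class in the same package."""
    outer_pkg = layer_base_package(outer)
    inner_pkg = layer_base_package(inner)
    return outer_pkg is not None and outer_pkg == inner_pkg
```

With the code from the question, this would be called as same_keras_implementation(layer, layer.classifier) on the Classifier instance. A False result suggests that the import of keras in the script and the one used internally by keras_nlp point to different packages.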

To solve this issue, set trainable to True on the inner model by modifying your code as follows:

import keras_nlp
import tensorflow as tf
from tensorflow import keras

class Classifier(keras.layers.Layer):
    def __init__(self):
        super().__init__()
        # Ensure that the BERT model is marked as trainable
        self.classifier = keras_nlp.models.BertClassifier.from_preset(
            "bert_base_en_uncased",
            num_classes=1,
        )
        # Make sure the BERT model's layers are trainable
        self.classifier.trainable = True

    def call(self, inputs):
        out = self.classifier(inputs)
        out = tf.squeeze(out, -1)  # Adjust output shape
        return out

def get_model_2():
    # Define the input dictionary for token_ids, padding_mask, and segment_ids
    inp = {
        'token_ids': keras.Input((512,), dtype=tf.int32, name='token_ids'),
        'padding_mask': keras.Input((512,), dtype=tf.bool, name='padding_mask'),
        'segment_ids': keras.Input((512,), dtype=tf.int32, name='segment_ids')
    }
    
    # Instantiate the custom classifier layer
    layer = Classifier()
    
    # Pass inputs through the classifier
    out = layer(inp)
    
    # Define the final model
    model = keras.Model(inputs=inp, outputs=out)
    
    return model

# Create the model
model = get_model_2()

# Compile the model with an appropriate optimizer, loss, and metrics
model.compile(
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[
        keras.metrics.TruePositives(),
        keras.metrics.FalsePositives(),
        keras.metrics.TrueNegatives(),
        keras.metrics.FalseNegatives(),
        keras.metrics.BinaryAccuracy()
    ],
    optimizer=keras.optimizers.Adam(1e-4),
)

# Check the model summary to see trainable parameters
model.summary()

The output of this summary looks like this:

[Image: output of model.summary()]

The TensorFlow, Keras, and keras_nlp versions I used are:

tensorflow==2.17

keras==3.4.1

keras_nlp==0.15.0

$\endgroup$
