
I have an encoder and a decoder model (monodepth2). I am trying to convert them from PyTorch to Keras using Onnx2Keras, but:

  • Encoder (ResNet-18): the Onnx2Keras conversion succeeds.
  • Decoder: I build it myself in Keras (with TF 2.3) and copy the weights (numpy arrays, including weight and bias) for each layer from PyTorch to Keras, without any modification.

But it turns out that both the Onnx2Keras-converted Encoder and the self-built Decoder fail to reproduce the same results. The cross-comparison pictures are below, but I'd first introduce the code of the Decoder.

First, the core layer: all the conv2d layers (Conv3x3, ConvBlock) are based on this, just with different dims or an added activation:

import numpy as np
import tensorflow as tf
from collections import OrderedDict

# Conv3x3 (normal conv2d without BN nor activation)
# There's also a ConvBlock, which is just "Conv3x3 + ELU activation", so I don't list it here.
def TF_Conv3x3(input_channel, filter_num, pad_mode='reflect', activate_type=None):

    # Actually it's 'reflect', but I implement it with tf.pad() outside this layer
    padding = 'valid'

    # if TF_ConvBlock, then activate_type == 'elu'
    conv = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=3, activation=activate_type,
                                  strides=1, padding=padding)
    return conv
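
For completeness, TF_ConvBlock boils down to the following (a sketch, assuming it simply forwards to TF_Conv3x3 with an ELU activation, which is what the comment above describes):

# Sketch only: "Conv3x3 + ELU activation"
def TF_ConvBlock(input_channel, filter_num, pad_mode='reflect'):
    return TF_Conv3x3(input_channel, filter_num, pad_mode=pad_mode, activate_type='elu')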

Then the structure. Note that the definition is EXACTLY the same as the original PyTorch code, so I think the problem must be in some implementation detail.

def DepthDecoder_keras(num_ch_enc=np.array([64, 64, 128, 256, 512]), channel_first=False,
                       scales=range(4), num_output_channels=1):
    num_ch_dec = np.array([16, 32, 64, 128, 256])
    convs = OrderedDict()
    for i in range(4, -1, -1):
        # upconv_0
        num_ch_in = num_ch_enc[-1] if i == 4 else num_ch_dec[i + 1]
        num_ch_out = num_ch_dec[i]

        # convs[("upconv", i, 0)] = ConvBlock(num_ch_in, num_ch_out)
        convs[("upconv", i, 0)] = TF_ConvBlock(num_ch_in, num_ch_out, pad_mode='reflect')


        # upconv_1
        num_ch_in = num_ch_dec[i]
        if i > 0:
            num_ch_in += num_ch_enc[i - 1]
        num_ch_out = num_ch_dec[i]
        convs[("upconv", i, 1)] = TF_ConvBlock(num_ch_in, num_ch_out, pad_mode='reflect')  # Just Conv3x3 with ELU-activation

    for s in scales:
        convs[("dispconv", s)] = TF_Conv3x3(num_ch_dec[s], num_output_channels, pad_mode='reflect')

    """
    Input_layer dims: (64, 96, 320), (64, 48, 160),  (128, 24, 80), (256, 12, 40), (512, 6, 20)
    """
    x0 = tf.keras.layers.Input(shape=(96, 320, 64))
    # define the rest of the input layers, following the dims listed above (channel-last)
    x1 = tf.keras.layers.Input(shape=(48, 160, 64))
    x2 = tf.keras.layers.Input(shape=(24, 80, 128))
    x3 = tf.keras.layers.Input(shape=(12, 40, 256))
    x4 = tf.keras.layers.Input(shape=(6, 20, 512))
    input_features = [x0, x1, x2, x3, x4]

    """
    # connect layers
    """
    outputs = []
    ch = 1 if channel_first else 3
    x = input_features[-1]
    for i in range(4, -1, -1):
        x = tf.pad(x, paddings=[[0, 0], [1, 1], [1, 1], [0, 0]], mode='REFLECT')
        x = convs[("upconv", i, 0)](x)
        x = [tf.keras.layers.UpSampling2D()(x)]
        if i > 0:
            x += [input_features[i - 1]]
        x = tf.concat(x, ch)
        x = tf.pad(x, paddings=[[0, 0], [1, 1], [1, 1], [0, 0]], mode='REFLECT')
        x = convs[("upconv", i, 1)](x)
    x = TF_ReflectPad2D_1()(x)  # custom reflect-pad-by-1 layer (used in place of tf.pad here)
    x = convs[("dispconv", 0)](x)
    disp0 = tf.math.sigmoid(x)

    """
    build keras Model ([input0, ...], [output0, ...])
    """
    # decoder = tf.keras.Model(input_features, outputs)
    decoder = tf.keras.Model(input_features, disp0)

    return decoder
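
For a quick sanity check of the graph, the decoder can be built and fed dummy feature maps with the shapes listed in the docstring (illustrative snippet only, assuming TF_ConvBlock and TF_ReflectPad2D_1 are defined as described above):

# Quick shape check with dummy encoder features (batch size 1, channel-last)
decoder = DepthDecoder_keras()
dummy_features = [
    np.zeros((1, 96, 320, 64), dtype=np.float32),
    np.zeros((1, 48, 160, 64), dtype=np.float32),
    np.zeros((1, 24, 80, 128), dtype=np.float32),
    np.zeros((1, 12, 40, 256), dtype=np.float32),
    np.zeros((1, 6, 20, 512), dtype=np.float32),
]
disp0 = decoder.predict(dummy_features)
print(disp0.shape)  # (1, 192, 640, 1): scale-0 disparity after the final upsample + conv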

The cross-comparison is as follows... I would really appreciate it if anyone could offer some insights. Thanks!!!

Original results: [image]

Original Encoder + Self-built Decoder: [image]

ONNX-converted Enc + Original Dec (texture is good, but the contrast is not enough; the car should be very close, i.e. a very bright color): [image]

ONNX-converted Enc + Self-built Dec: [image]

1 Answer


Solved!

It turns out there's indeed no problem with the implementation (at least no significant one). The problem is with the weight copying.

The original PyTorch conv weight has shape (out_channels, in_channels, 3, 3), but the TF model requires (3, 3, in_channels, out_channels), so I permuted it by [3, 2, 1, 0], overlooking that the (3, 3) kernel dims also have their own order and get transposed by that permutation.

So it should be weights.permute([2,3,1,0]), and all is well!
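
To make it concrete, the per-layer copy looks roughly like this with the corrected permutation (a sketch; the function name is just for illustration, and the bias needs no permutation):

def copy_conv_weights(torch_conv, keras_conv):
    # PyTorch stores conv weights as (out_channels, in_channels, kH, kW);
    # Keras Conv2D expects (kH, kW, in_channels, out_channels), hence permute(2, 3, 1, 0).
    kernel = torch_conv.weight.detach().permute(2, 3, 1, 0).numpy()
    bias = torch_conv.bias.detach().numpy()
    # The Keras layer must already be built (e.g. by calling the model once) before set_weights.
    keras_conv.set_weights([kernel, bias])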
