I’m applying quantization-aware training (QAT) to a YOLOv8n model with the following configuration:

QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
        averaging_constant=0.005,
        reduce_range=False
    ),
    weight=FakeQuantize.with_args(
        observer=PerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        ch_axis=0
    )
)
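As a quick sanity check on the configuration above, the following minimal sketch builds the same QConfig with its imports, instantiates both fake-quantize modules, and inspects what they produce. The names `qconfig`, `act_fq`, `w_fq` and the random 8-channel weight tensor are illustrative assumptions, not part of the original training code.

```python
import torch
from torch.ao.quantization import (
    QConfig,
    FakeQuantize,
    MovingAverageMinMaxObserver,
    PerChannelMinMaxObserver,
)

# Same settings as the configuration above; `qconfig` is a hypothetical name.
qconfig = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
        averaging_constant=0.005,
        reduce_range=False,
    ),
    weight=FakeQuantize.with_args(
        observer=PerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        ch_axis=0,
    ),
)

# QConfig stores factories; calling them creates the actual modules.
act_fq = qconfig.activation()
w_fq = qconfig.weight()

# Assumed stand-in for a conv weight with 8 output channels.
w = torch.randn(8, 3, 3, 3)
w_fq(w)  # one pass lets the observer compute per-channel qparams

print(act_fq.quant_min, act_fq.quant_max)
print(w_fq.qscheme, w_fq.ch_axis)
print(tuple(w_fq.scale.shape))                # one scale per output channel
print(bool(torch.all(w_fq.zero_point == 0)))  # symmetric qint8 keeps zero_point at 0
```

The per-channel scale having one entry per output channel is what `ch_axis=0` is meant to give for convolution weights.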

With backend set to qnnpack:

torch.backends.quantized.engine = "qnnpack"

The target device only supports the ONNX format, so I have to convert the model to ONNX after QAT.

To do so, I am using the following procedure:

    def save(self, quantized_onnx_path: str):
        import torch.ao.quantization as quant

        self.quantized_model.eval()
        torch.backends.quantized.engine = 'qnnpack'
        # Freeze the observer statistics before converting.
        self.quantized_model.apply(quant.disable_observer)
        model_to_export = quant.convert(self.quantized_model.cpu(), inplace=False)

        dummy_input = torch.randn(1, 3, 25, 256).cpu()
        
        torch.onnx.export(
            model_to_export,
            dummy_input,
            quantized_onnx_path,
            opset_version=13,
            input_names=['images'],
            output_names=['output'],
            dynamic_axes={
                'images': {0: 'batch_size'},
                'output': {0: 'batch_size'}
            }
        )

But I keep getting this error during export:

NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend.

This could be because the operator doesn't exist for this backend...

On the official PyTorch documentation page, I’ve seen that this can occur when the input is not quantized, but even after wrapping the model in the suggested class:

class QuantWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.model = model
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)      
        x = self.model(x)       
        x = self.dequant(x) 
        return x

model_to_export = QuantWrapper(model_to_export)

the error remains the same.

How can I solve this?
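One possible explanation for why the wrapper has no effect, offered as an assumption rather than a confirmed diagnosis: in eager-mode quantization a `QuantStub` is an identity module until `prepare_qat` and `convert` replace it, so a stub added after `convert` never quantizes the input. The sketch below illustrates this on a hypothetical `TinyNet` (a stand-in, not YOLOv8n) using PyTorch's default qnnpack QAT qconfig instead of the custom one above.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as quant

# Use qnnpack when this build supports it; otherwise keep the default engine.
if "qnnpack" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "qnnpack"


class TinyNet(nn.Module):
    """Hypothetical stand-in model with the stubs built in from the start."""

    def __init__(self):
        super().__init__()
        self.quant = quant.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.dequant = quant.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))


model = TinyNet().train()
model.qconfig = quant.get_default_qat_qconfig("qnnpack")
prepared = quant.prepare_qat(model)         # stubs get fake-quant modules attached here
prepared(torch.randn(2, 3, 16, 16))         # one pass so the observers see data
converted = quant.convert(prepared.eval())  # QuantStub is swapped for a real Quantize

out = converted(torch.randn(1, 3, 16, 16))  # float in, float out
print(type(converted.quant).__name__, out.dtype, tuple(out.shape))

# A stub created after conversion was never prepared or converted,
# so it simply passes the float tensor through unchanged.
late_stub = quant.QuantStub()
passed_through = late_stub(torch.randn(1, 3, 16, 16))
print(passed_through.is_quantized)
```

If this matches the situation in the question, the wrapping would need to happen before `prepare_qat`, so that the stubs take part in both preparation and conversion.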
