I’m applying QAT to a YOLOv8n model with the following configuration:
QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
        averaging_constant=0.005,
        reduce_range=False
    ),
    weight=FakeQuantize.with_args(
        observer=PerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        ch_axis=0
    )
)
The backend is set to qnnpack:
torch.backends.quantized.engine = "qnnpack"
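Since the engine has to be available in the local PyTorch build, a quick check along these lines can confirm that before assigning it. This is only a minimal sketch; `engines` is a local name introduced here for illustration.

```python
import torch

# List the quantized engines this PyTorch build was compiled with.
engines = torch.backends.quantized.supported_engines
print(engines)

# Only select qnnpack if the build actually provides it.
if "qnnpack" in engines:
    torch.backends.quantized.engine = "qnnpack"

print(torch.backends.quantized.engine)
```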
The target device only supports the ONNX format, so I have to convert the model to ONNX after QAT.
To do so, I am using the following procedure:
def save(self, quantized_onnx_path: str):
    import torch.ao.quantization as quant
    self.quantized_model.eval()
    torch.backends.quantized.engine = 'qnnpack'
    # Freeze the observers so the quantization parameters stop updating
    self.quantized_model.apply(quant.disable_observer)
    model_to_export = quant.convert(self.quantized_model.cpu(), inplace=False)
    dummy_input = torch.randn(1, 3, 25, 256).cpu()
    torch.onnx.export(
        model_to_export,
        dummy_input,
        quantized_onnx_path,
        opset_version=13,
        input_names=['images'],
        output_names=['output'],
        dynamic_axes={
            'images': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        }
    )
But I keep getting this error during export:
NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend.
This could be because the operator doesn't exist for this backend...
On the official PyTorch documentation page, I’ve seen that this can occur when the input is not quantized, but even after wrapping the model in the suggested class:
class QuantWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.model = model
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        x = self.dequant(x)
        return x

model_to_export = QuantWrapper(model_to_export)
the error remains the same.
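For reference, the documented cause can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: `TinyWrapped` is a hypothetical stand-in for the real model, and it uses the default QAT qconfig for the current engine rather than the configuration above. It checks three things: a converted conv rejects a float tensor, a freshly created `QuantStub` passes its input through unchanged, and a stub that is present before `prepare_qat()` and `convert()` gets replaced by a real quantize module. The second point may be why wrapping the already converted model did not change anything in my case.

```python
import torch
import torch.ao.nn.quantized as nnq
import torch.ao.quantization as quant

x = torch.randn(1, 3, 16, 16)

# 1) A quantized conv module called with a float (non-quantized) tensor.
qconv = nnq.Conv2d(3, 8, kernel_size=3)
try:
    qconv(x)
    float_input_rejected = False
except NotImplementedError:
    float_input_rejected = True

# 2) A QuantStub that never went through prepare/convert returns its input as-is.
stub_out = quant.QuantStub()(x)


# 3) Hypothetical toy model with the stubs in place *before* prepare_qat/convert.
class TinyWrapped(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = quant.QuantStub()
        self.conv = torch.nn.Conv2d(3, 4, kernel_size=3)
        self.dequant = quant.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))


model = TinyWrapped().train()
# Assumption: default QAT qconfig for whichever engine is currently active.
model.qconfig = quant.get_default_qat_qconfig(torch.backends.quantized.engine)
prepared = quant.prepare_qat(model)
prepared(x)  # one forward pass so the observers see some data
converted = quant.convert(prepared.eval())
converted_out = converted(x)

print(float_input_rejected)
print(stub_out.is_quantized)
print(type(converted.quant).__name__)
```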
How can I solve this?