Quantization scheme for Convolutional Neural Network 8-bit quantization in tensorflow

Question

In all the papers I have referred to on CNN quantization, the quantization scheme is stated as

step size = range / 255 for 8-bit, where range = xmax - xmin. However, as shown in the image of the TensorFlow quantization code, the TensorFlow implementation computes the range as

range = std::max(std::abs(*min_value), std::abs(*max_value));

Can anyone tell me what the difference is, or what its purpose is?

suharshs · Accepted Answer · 2020-04-01 23:10:51Z


This is because the code you are pointing to is for symmetric quantization where the range needs to be the same on both sides of 0. So the "range" variable in that code really refers to half of the entire floating point range.

For instance, with min_value = -1 and max_value = 2:

range = std::max(abs(-1), abs(2)) = 2

So the entire range in that code will be -2 to 2.

Hope that makes sense!


1 Comment

Akash Bhogar Over a year ago

Thanks. So is that why the one-sided range is used when calculating the step size? In the same code, using your example, the scaling factor is 2/127.
