I'm trying to recreate the cv2.warpAffine() function, taking a tensor input and output rather than a Numpy array. However, gradients calculated from the output tensor produce a Non-None gradient tensor of all 0s. I've spent all night trying to see where I'm going wrong. If someone can spot my mistake, I will be so happy to hear what it is. I'm happy to provide any further information.
def tensor_warp_affine(tensor, M, output_shape, border_value=0.0):
height, width = output_shape
channels = tensor.shape[2]
device = tensor.device
dtype = tensor.dtype
# Convert M to tensor and compute inverse
M_tensor = torch.tensor(M, dtype=torch.float32, device=device)
M_padded = torch.cat([M_tensor, torch.tensor([[0, 0, 1]], dtype=torch.float32, device=device)], dim=0)
M_inv = torch.inverse(M_padded)[:2]
# Create coordinate grids
y_coords, x_coords = torch.meshgrid(torch.arange(height, device=device), torch.arange(width, device=device), indexing='ij')
coords = torch.stack([x_coords.flatten(), y_coords.flatten(), torch.ones_like(x_coords.flatten())], dim=0).float()
# Transform coordinates
src_coords = M_inv @ coords
src_x, src_y = src_coords[0], src_coords[1]
# Compute interpolation weights
x0 = src_x.long()
y0 = src_y.long()
x1 = x0 + 1
y1 = y0 + 1
x0 = torch.clamp(x0, 0, tensor.shape[1] - 1)
x1 = torch.clamp(x1, 0, tensor.shape[1] - 1)
y0 = torch.clamp(y0, 0, tensor.shape[0] - 1)
y1 = torch.clamp(y1, 0, tensor.shape[0] - 1)
dx = src_x - x0.float()
dy = src_y - y0.float()
# Perform bilinear interpolation
wa = (1 - dx) * (1 - dy)
wb = dx * (1 - dy)
wc = (1 - dx) * dy
wd = dx * dy
img_flat = tensor.reshape(-1, channels)
pixel = (wa[:, None] * img_flat[y0 * tensor.shape[1] + x0] +
wb[:, None] * img_flat[y0 * tensor.shape[1] + x1] +
wc[:, None] * img_flat[y1 * tensor.shape[1] + x0] +
wd[:, None] * img_flat[y1 * tensor.shape[1] + x1])
result = pixel.reshape(height, width, channels)
return result
I'm doing this so I can pass the resulting tensor into an encoder model that generates latent vectors. I'm trying to optimise added perturbations to an input image using Stochastic gradient descent. My suspicion is that some of the transformation points are out of bounds, and being rounded to zero, is this non-differentiable?
I've tried printing out out-of-bounds values, but nothing ever flags up. I've tried using pytorche affine_grid but the results seem to be incorrect in my implementation. input tensor:
tensor([[[0.0000, 0.0078, 0.0078],
[0.0039, 0.0118, 0.0118],
[0.0000, 0.0078, 0.0078],
...,
[0.0000, 0.0157, 0.0196],
[0.0000, 0.0157, 0.0196],
[0.0000, 0.0157, 0.0196]],
[[0.0000, 0.0078, 0.0078],
[0.0039, 0.0118, 0.0118],
[0.0000, 0.0078, 0.0078],
...,
[0.0000, 0.0157, 0.0196],
[0.0000, 0.0157, 0.0196],
[0.0000, 0.0157, 0.0196]],
[[0.0000, 0.0078, 0.0078],
[0.0000, 0.0078, 0.0078],
[0.0000, 0.0078, 0.0078],
...,
[0.0000, 0.0157, 0.0196],
[0.0000, 0.0157, 0.0196],
[0.0000, 0.0157, 0.0196]],
...,
[[0.0275, 0.1059, 0.4392],
[0.0196, 0.0980, 0.4314],
[0.0157, 0.0941, 0.4275],
...,
[0.3373, 0.5216, 0.8392],
[0.3333, 0.5176, 0.8353],
[0.3059, 0.4902, 0.8078]],
[[0.0275, 0.1059, 0.4392],
[0.0196, 0.0980, 0.4314],
[0.0196, 0.0980, 0.4314],
...,
[0.3373, 0.5216, 0.8392],
[0.3216, 0.5059, 0.8235],
[0.3059, 0.4902, 0.8078]],
[[0.0275, 0.1059, 0.4392],
[0.0196, 0.0980, 0.4314],
[0.0196, 0.0980, 0.4314],
...,
[0.3176, 0.5020, 0.8196],
[0.3216, 0.5059, 0.8235],
[0.3176, 0.5020, 0.8196]]], grad_fn=<SqueezeBackward1>)
The desired output tensor should look something like this:
tensor([[[8.5652e-03, 4.3859e-02, 9.7611e-02],
[7.6292e-03, 4.6845e-02, 8.6061e-02],
[0.0000e+00, 3.5453e-02, 7.4669e-02],
...,
[3.9216e-03, 1.1765e-02, 5.0980e-02],
[2.5561e-03, 1.7278e-02, 3.6886e-02],
[0.0000e+00, 7.8431e-03, 4.7059e-02]],
[[3.9216e-03, 3.9216e-02, 9.4118e-02],
[7.7203e-04, 3.9988e-02, 7.9203e-02],
[0.0000e+00, 3.7218e-02, 6.9977e-02],
...,
[1.6439e-03, 8.9082e-03, 4.8124e-02],
[2.9788e-03, 1.0822e-02, 5.0038e-02],
[3.9216e-03, 1.1765e-02, 5.0980e-02]],
[[0.0000e+00, 3.9216e-02, 6.6035e-02],
[0.0000e+00, 3.1373e-02, 6.6667e-02],
[0.0000e+00, 3.5294e-02, 3.5833e-02],
...,
[0.0000e+00, 3.9216e-03, 4.3137e-02],
[3.9216e-03, 1.0668e-02, 4.9884e-02],
[4.4216e-04, 5.1212e-03, 4.4337e-02]],
...,
[[1.4902e-01, 2.5882e-01, 3.7647e-01],
[1.1612e-01, 2.2642e-01, 3.4406e-01],
[9.3184e-02, 1.7629e-01, 2.6298e-01],
...,
[8.1880e-02, 1.7951e-01, 3.2182e-01],
[1.0127e-01, 1.7469e-01, 3.4608e-01],
[2.0947e-02, 7.9770e-02, 2.1282e-01]],
[[1.4533e-01, 2.6514e-01, 3.8574e-01],
[1.2677e-01, 2.4847e-01, 3.7004e-01],
[9.8382e-02, 1.9250e-01, 2.9153e-01],
...,
[8.0654e-02, 1.5194e-01, 3.3930e-01],
[1.8802e-01, 2.5493e-01, 4.4500e-01],
[1.8050e-02, 6.2331e-02, 2.3506e-01]],
[[1.4967e-01, 2.6551e-01, 3.8581e-01],
[1.1913e-01, 2.3896e-01, 3.6051e-01],
[9.7532e-02, 1.9963e-01, 2.9095e-01],
...,
[5.5842e-02, 1.2251e-01, 3.1291e-01],
[1.6402e-01, 2.2266e-01, 4.1669e-01],
[8.3060e-02, 1.3796e-01, 3.3012e-01]]], grad_fn=<ViewBackward0>)
gradient calculation:
tensor([[[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
...,
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]]])