My ultimate goal is depth estimation with CNNs, but so far I have only used CNNs for image classification (for example on CIFAR-10, MNIST, and Cats vs Dogs). For depth estimation the network has to output a new image instead of a class label (the NYUv2 dataset provides the labeled images). So I would feed in an image of, say, 256x256x3 and need to get out another image of, say, 228x228x3.
What do I need to do? Can I just apply convolutions for a while, and after that decrease the number of feature maps and increase the spatial dimensions? Thanks.
Note: I'm using TensorFlow 2.0.
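
To make the idea concrete, here is a rough sketch of the shape arithmetic I have in mind, in plain Python with no TensorFlow calls. Everything in it is my own assumption and not something I know to be the right approach: three stride-2 downsampling stages, three stride-2 upsampling stages, and a final center crop to reach 228. The helper names (`conv_same`, `deconv_same`, `center_crop`, `trace_shapes`) are hypothetical. In Keras I would expect these steps to correspond roughly to `Conv2D(strides=2, padding="same")`, `Conv2DTranspose(strides=2, padding="same")`, and `Cropping2D`.

```python
import math


def conv_same(size, stride):
    """Spatial size after a strided convolution with 'same' padding."""
    return math.ceil(size / stride)


def deconv_same(size, stride):
    """Spatial size after a strided transposed convolution with 'same' padding."""
    return size * stride


def center_crop(size, target):
    """Pixels to remove at (start, end) to shrink `size` down to `target`."""
    extra = size - target
    if extra < 0:
        raise ValueError("target is larger than the current size")
    start = extra // 2
    return start, extra - start


def trace_shapes(input_size=256, target_size=228, num_stages=3):
    """Follow one spatial dimension through an assumed encoder-decoder."""
    sizes = [input_size]
    # Encoder: each stage halves the spatial size.
    for _ in range(num_stages):
        sizes.append(conv_same(sizes[-1], 2))
    # Decoder: each stage doubles the spatial size again.
    for _ in range(num_stages):
        sizes.append(deconv_same(sizes[-1], 2))
    # The decoder ends at the input size, so crop down to the target size.
    crop = center_crop(sizes[-1], target_size)
    return sizes, crop


sizes, crop = trace_shapes()
print(sizes)  # [256, 128, 64, 32, 64, 128, 256]
print(crop)   # (14, 14)
```

With these assumptions the network goes 256 → 128 → 64 → 32 and back up to 256, and cropping 14 pixels from each side gives the 228x228 output. Is that a sensible way to structure the model?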