This circuit should be reasonably efficient in size and depth, but with priority on depth.
If depth was not a concern, then I guess I could make a specialized adder for the least significant bit and then modelled the rest of the circuit as a ripple adder with only that initial, possible carry. The carry has to ripple through the first sequence of consecutive bits set to 1. For example:
$inc(0111) = 1000$
$inc(0001) = 0010$
$inc(0100) = 0101$ (no ripple)
But this takes linear time, in the worst case (binary string is $1...1$). How do you optimize for depth? Does the optimal circuit have a $log_2(n)$ depth?
Is perhaps the best strategy to use a parallel prefix circuit? If so, I guess one would make a specialized adder for the least significant bit. Then you have the result of the least significant bit, which is 0 if a carry was generated, 1 otherwise. If a carry was generated, then you need to efficiently ripple it through all the adjacent, consecutive 1-bits. If one is to use a prefix sum, then you need an associative binary operator. It also needs to preserve the value of the bits that are not part of the initial, consecutive 1-bits (from right to left). This might mean that you pass in the carry bit that was (possibly) generated after the increment on the least significant bit, while the rest of the operators gets fed predefined bits which preserve the values of the relevant bits of the number (bit vector).
At this point, I'm stuck.