From /proc/<pid>/stat I add the user CPU, system CPU, child user CPU and child system CPU times (utime, stime, cutime, cstime) together for all processes, then take the delta from the previous sample.
Immediately afterwards I sum the user, nice and system CPU times from /proc/stat, which should cover the entire box. Again I take the delta from the previous sample.
The delta summed over the processes is almost always slightly greater than the system-wide delta, and I can't figure out why.
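
The sampling described above can be sketched roughly as follows in Python. This is an illustration only, not the actual code in question: the function names (parse_pid_stat, parse_total_stat, sample) are hypothetical, and it assumes the field layout documented in proc(5), where utime, stime, cutime and cstime are fields 14 to 17 of /proc/<pid>/stat and the aggregate "cpu" line of /proc/stat starts with user, nice and system. All values are in clock ticks.

```python
import os


def parse_pid_stat(text):
    """Return utime + stime + cutime + cstime from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split only the part that follows the last ')'.
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3].
    # Fields 14..17 are therefore rest[11:15].
    return sum(int(v) for v in rest[11:15])


def parse_total_stat(text):
    """Return user + nice + system from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return sum(int(v) for v in fields[1:4])
    raise ValueError("no aggregate 'cpu' line found")


def sample(proc="/proc"):
    """Take one sample: (sum over all processes, system-wide sum), in ticks."""
    per_process = 0
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                per_process += parse_pid_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited between listdir() and open()/read()
    # Read /proc/stat immediately after the per-process pass, as described.
    with open(os.path.join(proc, "stat")) as f:
        system_wide = parse_total_stat(f.read())
    return per_process, system_wide
```

Calling sample() twice and subtracting the two results element by element gives the two deltas being compared.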