By default, an OpenCL command queue is in-order: kernels enqueued on it execute one after the other. This guarantees that if kernel 2 reads memory that kernel 1 has updated, there is no race condition, as there could be if the two ran concurrently. There may also be some latency before a kernel starts executing.
To run multiple kernels in parallel, you can try enqueuing them on separate command queues. Whether they actually overlap is up to the device and driver.