If I declare a variable on the GPU (e.g., an array alpha with N elements, cudaMalloc((void**)&alpha, N * sizeof(double))) and assign its values in a global function without freeing the memory, that variable should remain available to subsequent global functions, right?
You cannot call cudaMalloc() from a __global__ function (kernel); it is a host-side API. You can use malloc() and new inside kernels (device-side heap allocation), but that is often inefficient.
You can use the same array in multiple kernels, for instance, you might perform multiple calculation steps with different kernels.
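A minimal sketch of that pattern (kernel names and the init/scale operations are illustrative): the array is allocated once on the host with cudaMalloc, and the same device pointer is passed to successive kernels, which all see the same memory until cudaFree is called.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void initKernel(double *alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) alpha[i] = 1.0;            // first kernel writes the array
}

__global__ void scaleKernel(double *alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) alpha[i] *= 2.0;           // second kernel reuses the same allocation
}

int main() {
    const int N = 1024;
    double *alpha = nullptr;
    cudaMalloc((void**)&alpha, N * sizeof(double));   // allocated once, host side

    initKernel<<<(N + 255) / 256, 256>>>(alpha, N);
    scaleKernel<<<(N + 255) / 256, 256>>>(alpha, N);  // sees what initKernel wrote

    double first;
    cudaMemcpy(&first, alpha, sizeof(double), cudaMemcpyDeviceToHost);
    printf("alpha[0] = %f\n", first);     // expect 2.0
    cudaFree(alpha);                      // freed only when no more kernels need it
    return 0;
}
```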
Besides, is it possible (or advisable) to compute a scalar variable on the GPU and make it shared among several global functions on the GPU, or is it better to pass it every time as an argument from the CPU?
If you pass a constant as an argument to a kernel, it is shared very efficiently among all the threads. So it will typically be much more efficient to calculate such parameters on the CPU and pass them to the kernel.
If there is a lot of parallel calculation that goes into creating the scalar, then it would be best to calculate it with a separate kernel, pass it back to the host and then pass it to the next kernel as an argument. Alternatives would just increase the complexity of your code without any performance benefit.
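A hedged sketch of that workflow (the reduction and the scaleByMean step are illustrative, and a production code would use a multi-block reduction or a library such as Thrust or CUB instead of this single-block version): one kernel reduces the array to a scalar, the host copies it back with cudaMemcpy, and the next kernel receives it as a plain argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Single-block shared-memory sum; adequate for a sketch with small N.
__global__ void sumKernel(const double *x, double *out, int n) {
    __shared__ double partial[256];
    int tid = threadIdx.x;
    double s = 0.0;
    for (int i = tid; i < n; i += blockDim.x) s += x[i];
    partial[tid] = s;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) *out = partial[0];
}

__global__ void scaleByMean(double *x, double mean, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] /= mean;              // the scalar arrives as a kernel argument
}

int main() {
    const int N = 1024;
    double h_x[N];
    for (int i = 0; i < N; ++i) h_x[i] = 2.0;

    double *d_x, *d_sum;
    cudaMalloc(&d_x, N * sizeof(double));
    cudaMalloc(&d_sum, sizeof(double));
    cudaMemcpy(d_x, h_x, N * sizeof(double), cudaMemcpyHostToDevice);

    sumKernel<<<1, 256>>>(d_x, d_sum, N);             // scalar computed on the GPU
    double sum;
    cudaMemcpy(&sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);  // back to host
    scaleByMean<<<(N + 255) / 256, 256>>>(d_x, sum / N, N);           // passed as argument

    cudaFree(d_x);
    cudaFree(d_sum);
    return 0;
}
```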
If there is little computation required for the scalar, then it doesn't make sense to calculate it with a kernel. Also, remember that blocks may execute in any order and cannot be globally synchronized within a kernel, so you would have to create a separate code path in the kernel to set up the scalar, then do costly thread-index testing and synchronization to compute it and make it visible to all the threads.