Suppose a process has N threads, and M bytes of memory are allocated for each thread's stack. The total memory allocated for stacks is then N x M bytes.
You can reduce total memory consumed by the stack by reducing the number of threads (N), or reducing the memory allocated for each thread (M).
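As a quick illustration of the N x M arithmetic, here is a minimal sketch. The figures (200 threads, 1 MiB per stack) and the helper name `total_stack_bytes` are made up for the example, not taken from any particular system.

```python
MIB = 1024 * 1024


def total_stack_bytes(num_threads: int, stack_bytes_per_thread: int) -> int:
    """Total stack memory: N threads times M bytes per thread stack."""
    return num_threads * stack_bytes_per_thread


# Hypothetical baseline: N = 200 threads, M = 1 MiB per stack.
baseline = total_stack_bytes(200, 1 * MIB)

# Option 1: halve the number of threads (N).
fewer_threads = total_stack_bytes(100, 1 * MIB)

# Option 2: keep every thread but halve the per-thread stack (M).
smaller_stacks = total_stack_bytes(200, 512 * 1024)

print(baseline // MIB, fewer_threads // MIB, smaller_stacks // MIB)
# prints: 200 100 100
```

Either change saves the same amount here because the total is a simple product; which one is practical depends on whether the application can run with fewer threads or with shallower stacks.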
Often a thread won't use all of its stack. The space is pre-allocated in case it is needed later, but a thread that has no deep call paths and doesn't recurse may never need all of the stack space allocated on its behalf.
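If the threads are known to do shallow work, one way to lower M is to request a smaller stack before creating them. The sketch below uses Python's `threading.stack_size()` purely as an example of the idea. The 256 KiB figure, `shallow_task`, and `run_with_small_stacks` are assumptions chosen for illustration, and some platforms may reject a given size or not support changing it at all.

```python
import threading

# Assumed value for illustration only; the right size depends on the workload.
SMALL_STACK_BYTES = 256 * 1024


def shallow_task(results, index):
    # No recursion and no deep call chain, so little stack is needed.
    results[index] = index * index


def run_with_small_stacks(num_threads):
    # stack_size(n) applies to threads created afterwards and returns
    # the previous setting (0 means the platform default).
    previous = threading.stack_size(SMALL_STACK_BYTES)
    try:
        results = [None] * num_threads
        threads = [
            threading.Thread(target=shallow_task, args=(results, i))
            for i in range(num_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
    finally:
        # Restore the earlier setting so unrelated threads are unaffected.
        threading.stack_size(previous)


print(run_with_small_stacks(4))
```

Other runtimes expose a similar knob under a different name, so treat this as a pattern rather than a recipe.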
Finding the optimal stack size can be an art: too small and a thread risks overflowing its stack, too large and memory is wasted.