
I am looking for a solution to an issue with managing and making full use of the resources I am allocated on a national HPC service.

The service has 2 main queues of relevance: 1) Intel Xeon Gold 6148 Skylake processors (2x20 cores per node), and to a lesser extent 2) Intel Xeon Phi 7210 KNL processors.

When I run my code [which is an ocean physics/biogeochemistry model] on 1 node, I experience no issues. However, when I run on more than 1 node, I get a segmentation error. The only way to avoid this segmentation error in my experience has been to request fewer than the full complement of cores per node. This workaround is, however, prohibitive. For example, 1 node has 40 cores, yet the more nodes I request, the fewer cores per node I can request. The cap I have hit is 6 nodes with 15 cores per node, giving 90 cores overall.
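For concreteness, the workaround amounts to a Slurm request along these lines (a sketch only; the partition and executable names here are placeholders rather than my actual job script):

    #!/bin/bash
    #SBATCH --nodes=6               # the most nodes that run without the segfault
    #SBATCH --ntasks-per-node=15    # leaving 25 of the 40 Skylake cores per node idle
    #SBATCH --cpus-per-task=1
    #SBATCH --partition=skylake     # placeholder partition name

    srun ./ocean_model              # placeholder executable name

That is 90 MPI ranks in total, spread over hardware that could host 240.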

This all naturally means that the speed-up I had hoped for is considerably curtailed.

I understand that the launch node has unlimited stack space, but that every additional node uses a lower default stack size. Consequently, I tried the command ulimit -s unlimited, but no luck.
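In other words, the attempt looked roughly like this (a sketch; the launcher varies between srun and mpirun depending on the run):

    ulimit -s unlimited       # raises the stack limit, but only in the shell on the launch node
    ulimit -s                 # reports "unlimited" locally
    srun ./ocean_model        # ranks on the other nodes may still start with their default stack limit

Whether that raised limit actually reaches the remote ranks seems to depend on how the site propagates resource limits.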

[UPDATE: In response to the first comment below from Gilles, I had already tried this suggestion previously, to no avail.]

I suspect that some site-specific setting on the HPC service I'm using is the limiting factor, as the same model run on similar HPC services in other countries is apparently fine and achieves the desired scale-up.

I would appreciate any suggestions relating to Slurm configuration and HPC more generally. Due to time constraints, I can't engage in extensive rewriting of the MPI aspects of this code, which has been thoroughly developed by multiple research agencies.

I reiterate that this code works fine on other HPC systems with Slurm/PBS scripting.

  • If you are convinced the root cause of the crash is a short stack size on the "other compute nodes", your best bet is to ask your sysadmins how to increase it. Meanwhile, try to wrap your binary in a script that runs ulimit -s unlimited; a.out, and then mpirun <that script> (see the sketch after these comments). If you use mpirun, you might want to give direct launch (e.g. srun a.out) a try (or the other way around). Commented Apr 8, 2022 at 10:57
  • Thanks for your recommendation, which I had tried previously when it was first suggested to me. Unfortunately it didn't work then (I may try again to refresh). Do you have any other suggestions outside of the stack size issue? All suggestions are welcome! Commented Apr 8, 2022 at 14:40
  • If your program is hybrid MPI+OpenMP, you might also have to increase the thread stack size. IIRC, use the KMP_STACKSIZE environment variable with Intel compilers. If you use InfiniBand, you might also have to run ulimit -l unlimited. Commented Apr 8, 2022 at 14:53
  • What do you mean by "no luck"? Are you unable to set the stack size, or can you set it but it does not prevent the crash? Commented Apr 8, 2022 at 14:54
  • My understanding is that the program is either MPI or OpenMP but not both; I have tried each but am focused on MPI. The system I'm using has Omni-Path rather than InfiniBand. Commented Apr 8, 2022 at 16:32
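Pulling the suggestions from the comments together, the wrapper approach looks roughly like this (a sketch only; the script and executable names are placeholders, and whether the raised limits actually take effect depends on the site configuration):

    #!/bin/bash
    # wrapper.sh -- executed by every MPI rank, so the limits are raised on each compute node
    ulimit -s unlimited           # per-process stack
    ulimit -l unlimited           # locked memory, relevant for RDMA interconnects
    export KMP_STACKSIZE=512m     # per-thread stack; only matters for OpenMP builds with Intel compilers
    exec ./ocean_model "$@"       # placeholder executable name

launched as mpirun ./wrapper.sh (or srun ./wrapper.sh), with the script made executable beforehand.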

1 Answer


I added the following three flags to the list of ifort compiler flags, which seems to have resolved the issue: -heap-arrays 1000 -parallel -xSKYLAKE-AVX512

Now I can avail of all the cores on each node. I suspect that adding the array size specification (1000 kB, i.e. 1 MB) to the -heap-arrays flag is the most important addition; I had previously used the flag without specifying a size threshold for arrays to be stored on the heap, and it had made no difference.
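For anyone who lands here with the same problem, the change sits in the compile line roughly as follows (a sketch; the optimisation level, source and output names are placeholders, and the real model's flag list is longer):

    # 1000 = threshold in kB above which automatic/temporary arrays are placed
    # on the heap rather than the stack, which sidesteps the per-process stack limit
    ifort -O2 -heap-arrays 1000 -parallel -xSKYLAKE-AVX512 \
          -o ocean_model model_sources.f90

Note that a binary built with -xSKYLAKE-AVX512 targets the Skylake queue; it is unlikely to run on the KNL nodes, which support a different AVX-512 subset.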
