0 votes
3 replies
39 views

I am using an HPC system in which the folder /usr/ is not NFS-mounted. Therefore, the libraries installed on the master node do not seem to be available on the compute nodes; that is, if I ssh to a compute node ...
mancolric • 111
1 vote
1 answer
107 views

I'm building a SLURM pipeline where each stage is a bash wrapper script that generates and submits SLURM jobs. Currently I'm doing complex job-ID extraction, which feels clunky: # Current approach ...
desert_ranger
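A minimal sketch of the job-ID plumbing such a pipeline needs, assuming `sbatch --parsable` is available (it prints only the job ID, which removes most of the fragile parsing). The `submit` helper and its dependency chaining are illustrative, not the asker's code:

```python
import re
import subprocess

def parse_job_id(sbatch_output: str) -> str:
    """Extract a SLURM job ID from sbatch output.

    With `sbatch --parsable` the output is just `<jobid>` or
    `<jobid>;<cluster>`, so splitting on ';' suffices; otherwise fall
    back to the classic "Submitted batch job <id>" banner.
    """
    first = sbatch_output.strip().splitlines()[0]
    if first.split(";")[0].isdigit():
        return first.split(";")[0]
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in {sbatch_output!r}")
    return match.group(1)

def submit(script: str, *deps: str) -> str:
    """Submit `script`, chaining it after earlier job IDs (hypothetical helper)."""
    cmd = ["sbatch", "--parsable"]
    if deps:
        cmd.append("--dependency=afterok:" + ":".join(deps))
    out = subprocess.run(cmd + [script], capture_output=True,
                        text=True, check=True).stdout
    return parse_job_id(out)
```

`--parsable` and `--dependency=afterok:` are standard sbatch options; only the wrapper structure here is invented.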
1 vote
0 answers
45 views

I'm trying to run the Neo4j Docker container using Singularity on an HPC system. The container starts successfully, but it shuts down automatically when I try to add data to the database (e.g., via ...
prasad • 13
1 vote
1 answer
64 views

I have an mpi4py program, which runs well with mpiexec -np 30 python3 -O myscript.py at 100% CPU usage on each of the 30 CPUs. Now I am launching 8 instances with mpiexec -np 16 python3 -O myscript.py. ...
j13r • 2,731
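The arithmetic behind the slowdown this describes can be made concrete: eight 16-rank launches request far more ranks than there are cores, so the ranks must time-share CPUs. A tiny illustrative helper (not part of the asker's setup):

```python
def oversubscription(cores: int, jobs: int, ranks_per_job: int) -> float:
    """Ratio of MPI ranks to physical cores; values above 1.0 mean the
    ranks time-share CPUs, so per-rank CPU usage drops below 100%."""
    return jobs * ranks_per_job / cores

# One 30-rank job on a 30-core machine fits exactly (ratio 1.0);
# eight 16-rank jobs request 128 ranks and oversubscribe by ~4.3x.
```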
1 vote
0 answers
88 views

I’ve been using salloc to allocate compute nodes without issues before. Recently, after switching to another user account (same .bashrc config, only the conda path changed), salloc stopped working. I ...
Calculus007
0 votes
0 answers
49 views

I need to migrate my geospatial-processing work (mainly QGIS processing and PostGIS functions called from Python scripts) to an HPC cluster. As neither QGIS nor PostGIS is installed on the HPC, I ...
Felix_geospatial
0 votes
1 answer
176 views

I'm using Spack on Linux Mint to manage scientific libraries, including armadillo. I have installed Armadillo and its dependencies via Spack in an environment. Problem: When I run spack load armadillo, ...
jorge isaac rubiano
0 votes
0 answers
56 views

I tried to use the sbatch file from this link (Running WindNinja on an HPC Cluster) to run the WindNinja software (WindNinja introduction) installed on the HPC. However, it always produces the "...
Kaiyuan Zheng
0 votes
0 answers
78 views

When users request 1-2 GPUs via sbatch --gres=gpu:1, Slurm locks the entire 8-GPU node. This fragments our cluster: Multiple small requests spread across nodes (e.g., four 1-GPU jobs occupy four ...
train-server
0 votes
1 answer
58 views

I program in Fortran with the Intel oneAPI compiler ifx and the MKL packages. I want to calculate the scalar product between a sparse matrix of massive dimension and a vector. When the dimension of the sparse matrix could be ...
River Chandler
0 votes
1 answer
79 views

I love snakemake and have used it locally as well as on HPC with SLURM! However, now we have a particular setup where it is not as easy to use snakemake as we have done before: We need to run some ...
Sebastian Beyer
0 votes
0 answers
49 views

I'm learning UCX by creating a basic wrapper for both the client and server. I am using AM communication. When I run my client, I get the error below: [1749297901.816001] [prateek:19822:0] ...
Prateek Joshi
0 votes
0 answers
89 views

I'm trying to read different subsets of non-contiguous data from a file into different processes. I.e., I have a file with the data a b c d e f g h i j and two processes that want to read the data from ...
Subject303
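A pure-Python emulation (not MPI itself) of the round-robin split this question describes; in real MPI-IO, this is the pattern an `MPI_Type_vector`-based file view installed with `MPI_File_set_view` delivers to each rank:

```python
def strided_elements(data: bytes, rank: int, nprocs: int,
                     elem_size: int = 1) -> bytes:
    """Bytes that `rank` would see under a round-robin file view.

    Emulates a file view built from an MPI_Type_vector with stride
    nprocs*elem_size: rank r reads elements r, r+nprocs, r+2*nprocs, ...
    """
    out = bytearray()
    for off in range(rank * elem_size, len(data), nprocs * elem_size):
        out += data[off:off + elem_size]
    return bytes(out)
```

For the file `a b c d e f g h i j` and two processes, rank 0 gets `a c e g i` and rank 1 gets `b d f h j`.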
1 vote
2 answers
95 views

I'm setting up I/O for a large-scale CFD code using the MPI library, and the file I/O is starting to eat into computation time as my problems scale. As far as I can find, the "done" thing in the ...
Subject303
0 votes
0 answers
50 views

I have a single computation node with 32 CPUs. I have defined two different partitions that both use this node. If, for example, I send two jobs to partition A requesting 20 CPUs and 25 CPUs, the second ...
Daniel • 1
0 votes
1 answer
70 views

I want to run a pipeline on a cluster where the job names are of the form smk-{config["simulation"]}-{rule}-{wildcards}. Can I just do: snakemake --profile slurm --configfile ...
Kiffikiffe
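The `{...}` placeholders in such a job-name template expand like ordinary format fields. A sketch of what the rendered names look like; the helper below is hypothetical and not snakemake API:

```python
def job_name(template: str, config: dict, rule: str, wildcards: dict) -> str:
    """Render a SLURM job-name template like the one in the question.

    Mirrors what a cluster profile's job-name format string would produce;
    the template keys (simulation, rule, wildcards) are assumptions.
    """
    wc = ",".join(f"{k}={v}" for k, v in sorted(wildcards.items()))
    return template.format(simulation=config["simulation"],
                           rule=rule, wildcards=wc)
```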
1 vote
1 answer
96 views

When running snakemake on a cluster, if we don't have specific requirements about the number of cores/memory for some rules, then what is the difference between: using the classic way, i.e. calling ...
Kiffikiffe
0 votes
1 answer
89 views

I have a 12-core laptop (6 physical cores with hyperthreading) running Slurm for local job scheduling. When I submit job arrays requesting all 12 cores to be used simultaneously, Slurm consistently ...
desert_ranger
0 votes
0 answers
87 views

I want to automate resource allocation on an HPC server, node forwarding, and opening JupyterLab on the same node. Individually, I have to go through the following steps: user@login1>salloc -A ...
Ep1c1aN • 743
0 votes
0 answers
22 views

I got an error when trying to bind the program with Intel MPI. #define _GNU_SOURCE #include <stdio.h> #include <unistd.h> #include <string.h> #include <sched.h> #include <...
user26958921
0 votes
0 answers
85 views

I'm trying to make R scripts run on an HPC cluster (with the SLURM workload manager), which need a specific package that I installed in a personal directory since I can't install packages in the server-...
legabgob
0 votes
1 answer
40 views

I want to call genetic variants with DeepVariant on an HPC for about 1000 cereal lines. I successfully ran DV for one line with the docker image they provide using Apptainer/Singularity, but for the ...
skranz • 65
6 votes
3 answers
244 views

I want to use the inclusive scan operation in OpenMP to implement an algorithm. What follows is a description of my attempt at doing so, and failing to get more than a tepid speedup. The inclusive ...
smilingbuddha
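For reference, the block decomposition most parallel inclusive-scan implementations rely on, sketched sequentially in Python. Only phase 2 is inherently serial, which is why speedups are capped when the per-block work is small:

```python
from itertools import accumulate

def blocked_inclusive_scan(xs, nblocks):
    """Inclusive prefix sum via the usual three-phase block decomposition.

    Phase 1: each block sums its elements (parallelizable).
    Phase 2: exclusive scan over the nblocks partial sums (serial, cheap).
    Phase 3: each block re-scans locally, seeded with its offset (parallel).
    """
    n = len(xs)
    bounds = [(i * n // nblocks, (i + 1) * n // nblocks)
              for i in range(nblocks)]
    block_sums = [sum(xs[a:b]) for a, b in bounds]           # phase 1
    offsets = [0] + list(accumulate(block_sums))[:-1]        # phase 2
    out = []
    for (a, b), off in zip(bounds, offsets):                 # phase 3
        out.extend(list(accumulate(xs[a:b], initial=off))[1:])
    return out
```

In OpenMP the analogue of phases 1 and 3 would be the parallelized loops, with the phase-2 scan of partial sums done by one thread.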
0 votes
0 answers
143 views

I have Ollama version 0.5.13 installed on my university's HPC cluster. Because of lack of sudo access, I have a custom script that runs ollama for me. I am reproducing it below: # Set the custom ...
Ryan Hendricks
0 votes
0 answers
46 views

I'm creating a complete HPC architecture on AWS using the AWS PCS service. In my CloudFormation template, literally every resource is created successfully except AWS PCS. Cluster: Type: AWS::PCS::Cluster ...
parthraj panchal
0 votes
0 answers
75 views

I have a large .h5 file of high resolution images (~300MB each, 200 images per .h5 file) and need to load samples in python. The current setup uses a separate dataset for each sample. data_group....
gekrone • 179
0 votes
1 answer
155 views

I'm trying to implement parallelization in a flowsolver code for my PhD. I've inherited a subroutine that sends data between predefined subdomains. The subroutine sends data through the ...
Subject303
0 votes
0 answers
95 views

I'm trying to compile and run a .f90 code using the Intel Fortran compiler (ifx) and the Intel MPI library on a Linux HPC. I'm invoking the compiler through a .sh script with the following lines: ...
Subject303
0 votes
0 answers
68 views

I am trying to solve a nonlinear optimization problem in AMPL. It is quite large but not ridiculously so. I solved a similar problem on my home PC (about 1 order of magnitude less in size though). I ...
apg • 101
0 votes
0 answers
52 views

I have some software (AMPL) installed on my home folder on a Grid Engine based HPC cluster at a university. I'm looking just to source AMPL properly when I run my jobscript in the queue. I need to run ...
apg • 101
0 votes
0 answers
42 views

I'm brand new to Linux / Slurm / HPC, so apologies if this seems trivial. I have access to a node of an HPC consisting of 4 GPUs. I have a job that, when running on a single GPU, runs out of memory, so ...
Paul • 41
0 votes
1 answer
54 views

Backstory: We are submitting an HPC job using the Microsoft HPC Pack 2019 SP3 SDK. HPC Pack doesn't natively support Active Directory gMSA accounts, so we obtain the gMSA account password via AD. The MSA ...
Jon Barker • 1,828
0 votes
0 answers
21 views

I have conducted experiments running the MLP (Multi-Layer Perceptron) algorithm on a PC cluster with Apache Spark, with configurations ranging from small data to large ...
Syahel Razaba
0 votes
0 answers
65 views

Without an IDE, I can log in to an HPC interactive node by first sshing into the server using: ssh servername Then I request an interactive node using qrsh # Sun Grid Engine # OR qsub -I # Slurm ...
David LeBauer
0 votes
1 answer
124 views

Question I am trying to develop a clear mental model for using SLURM to request resources on HPC systems for hybrid MPI/OpenMP jobs. In thinking about it more, I realized there are some gaps in my ...
Jared • 714
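One way to make the hybrid MPI/OpenMP resource math concrete: `--nodes`, `--ntasks-per-node`, and `--cpus-per-task` multiply out as below, with `OMP_NUM_THREADS` matching `--cpus-per-task`. This is a sketch of the bookkeeping only, not any SLURM API:

```python
def hybrid_layout(nodes: int, ranks_per_node: int, threads_per_rank: int) -> dict:
    """SLURM-style resource math for a hybrid MPI/OpenMP job.

    Mirrors --nodes / --ntasks-per-node / --cpus-per-task: each MPI rank
    owns `threads_per_rank` cores, and OMP_NUM_THREADS is set to match
    so OpenMP threads stay inside each rank's allocation.
    """
    return {
        "ntasks": nodes * ranks_per_node,
        "cores_total": nodes * ranks_per_node * threads_per_rank,
        "OMP_NUM_THREADS": threads_per_rank,
    }

# e.g. 2 nodes x 4 ranks/node x 8 threads/rank uses 64 cores in total.
```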
0 votes
0 answers
50 views

I am attempting to implement a method in MPI for a well established particle simulation program that involves image processing. The program runs a loop for millions of iterations that performs a ...
William Betancourt
0 votes
1 answer
81 views

Background Let's say I have a complex MPI program with multiple message passing events and computations. The communication pattern is that of bidirectional ring messaging as shown in the figure below. ...
Nitin Malapally
1 vote
0 answers
77 views

The following code example simply calls MPI_Barrier in a loop. On a 2-computer cluster of Intel machines, it runs correctly. When run across an Intel machine and an AMD machine, it completes the first ...
Jeffrey Faust
3 votes
1 answer
299 views

I have a Python script that processes approximately 10,000 FITS files one by one. For each file, the script generates an output in the same directory as the input files and creates a single CSV file ...
Falco Peregrinus
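A common way to spread ~10,000 independent files over a cluster is a SLURM job array, where each array task takes a deterministic slice of the file list and writes its own per-task CSV for a later merge. A sketch of the slicing (the function name is hypothetical):

```python
def chunk_for_array(paths, ntasks, task_id):
    """Files handled by array task `task_id` out of `ntasks` total.

    Striding by the task ID (paths[task_id::ntasks]) gives each task an
    even share with no shared state; in a job script, task_id would come
    from $SLURM_ARRAY_TASK_ID with sbatch --array=0-<ntasks-1>.
    """
    return paths[task_id::ntasks]
```

Having every task write its own CSV and merging afterwards also avoids concurrent writes to a single output file.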
1 vote
2 answers
145 views

I have recently been learning some HPC topics and have learned that modern C/C++ compilers are able to detect places where optimization is warranted and apply corresponding techniques such as SIMD, ...
PkDrew • 2,301
0 votes
0 answers
45 views

While trying to find the reasons for and solve another problem (this one with mpirun saying I have a problem with my current allocation), I tried to find the allocations of my nodes in a multinode ...
KansaiRobot • 10.6k
1 vote
0 answers
93 views

I am facing issues with getting a free port in the DDP setup block of PyTorch for parallelizing my deep learning training job across multiple GPUs on a Linux HPC cluster. I am trying to submit a deep ...
Shataneek Banerjee
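A common workaround for the free-port problem in single-node DDP setups is to let the OS choose one by binding to port 0, then export the result as MASTER_PORT before initializing the process group. A minimal sketch; note the small race between releasing the port and DDP rebinding it, which is usually tolerable on one node:

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an ephemeral port, then release it.

    Binding to port 0 lets the kernel pick a currently free port; the
    caller can set MASTER_PORT to the returned value for the DDP setup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```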
1 vote
0 answers
69 views

I'm trying to use Hypre to solve a system of linear equations: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <math.h> #include "HYPRE_krylov.h" #...
Huy Hoàng Nguyễn
0 votes
0 answers
62 views

I want to use bash to run a batch job on an HPC. The commands to be executed are saved to a text file. Previously, I used the following to run each line of the text file separately as a batch job. ...
Ahmed El-Gabbas
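A sketch of the pattern this question describes: build one `sbatch --wrap` invocation per non-empty line of the command file (`--wrap` turns a shell command into a one-line batch script). Submission itself is left to the caller so the construction is testable, and the partition argument is illustrative:

```python
def wrap_commands(lines, partition=None):
    """Turn each non-empty, non-comment line into an sbatch invocation.

    Returns argument lists suitable for subprocess.run; `sbatch --wrap`
    wraps the shell command in a minimal batch script.
    """
    cmds = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cmd = ["sbatch"]
        if partition:
            cmd.append(f"--partition={partition}")
        cmd += ["--wrap", line]
        cmds.append(cmd)
    return cmds
```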
1 vote
0 answers
36 views

Half of the jobs I submit to my HPC return the following error message in the out file, which ends my job: /sw/rl8/zen/app/NetLogo/6.4.0-64/netlogo-headless.sh: line 34: 111089 Killed "$JAVA" &...
Bart de Bruin
7 votes
3 answers
235 views

I am new to OpenMP programming and have a question regarding task parallelism for recursion. Let's consider this demo C code: #include <stdio.h> #include <stdlib.h> #include <sys/time....
hpc_beginner
0 votes
1 answer
529 views

Very simple question. I have access to a multi-node machine and I have to do some NCCL tests. In the readme it says If CUDA is not installed in /usr/local/cuda, you may specify CUDA_HOME. Similarly, ...
KansaiRobot • 10.6k
0 votes
1 answer
65 views

This question is somewhat similar to this one, Slurm: Use cores from multiple nodes for R parallelization, but mine is for Python. I have a Python program which can use multiple cores on a PC; it does ...
Quantum Monte Carlo
1 vote
0 answers
47 views

I am running an MPI application on 32 processes. The stdout of the rank-0 process gets sent to a separate file for startup error logging, which we will call STARTUP_ERROR, while the stdout of all ...
Defcon97 • 121
1 vote
0 answers
70 views

I have a particle simulation in C which is split over 4 MPI processes and runs fast (compared to serial). However, one region of my implementation has O(N^2) complexity, where I need to compare each ...
Luna Morrow
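The standard fix for such an O(N^2) region is a cell list: bin particles into boxes whose side equals the interaction cutoff, so each particle is only compared against its own and neighbouring boxes. A 2-D Python sketch of the idea (a real simulation would do this in C over the MPI subdomains):

```python
from collections import defaultdict
from itertools import product

def neighbour_pairs(points, cutoff):
    """All index pairs (i, j), i < j, closer than `cutoff`, via a 2-D cell list.

    Binning into cells of side `cutoff` means each point is only tested
    against the 9 surrounding cells: roughly O(N) for uniform densities,
    instead of the naive O(N^2) all-pairs loop.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cutoff), int(y // cutoff))].append(idx)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        (xi, yi), (xj, yj) = points[i], points[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 < cutoff ** 2:
                            pairs.add((i, j))
    return pairs
```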