I am currently porting an algorithm to two GPUs. The hardware has the following setup:

  • Two CPUs forming a NUMA system, so the main memory is split across both NUMA nodes.
  • Each GPU is physically connected to one of the CPUs. (Each PCIe controller has one GPU.)

I created two threads on the host to control the GPUs. Each thread is bound to one NUMA node, i.e. each of the two threads runs on one CPU socket. How can I determine the device number of the directly connected GPU so that I can select it with cudaSetDevice()?

This is called setting CPU/GPU affinity. As far as I know, it is not trivial to do programmatically. You can certainly map your system manually and hard-code the result. To do it automatically, the approaches I'm familiar with take the PCI bus ID of each GPU and then traverse the system PCI device tree to discover which PCIe root complex is in the same tree. Are you running Linux or Windows? Here is one implementation in Linux. Commented Apr 17, 2013 at 14:25
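
On Linux, the PCI bus ID approach described in the comment above can be sketched in a few lines of shell. This is only a minimal sketch, not the linked implementation. It assumes the kernel exposes a numa_node file for each device under /sys/bus/pci/devices, and that you already have the GPU's bus ID, for example from cudaDeviceGetPCIBusId() or nvidia-smi. The function names pci_sysfs_dir and gpu_numa_node are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: turn a PCI bus ID into its sysfs device directory.
# sysfs uses a lowercase ID with a 4-digit domain, so the ID is lowercased
# and only its last 12 characters (dddd:bb:dd.f) are kept.
pci_sysfs_dir() {
  id=$(printf '%s' "$1" | tr 'A-F' 'a-f' | sed 's/^.*\(....:..:..\..\)$/\1/')
  echo "/sys/bus/pci/devices/$id"
}

# Hypothetical helper: print the NUMA node the kernel reports for a device,
# or "unknown" when the sysfs file is not readable on this system.
gpu_numa_node() {
  f="$(pci_sysfs_dir "$1")/numa_node"
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo "unknown"
  fi
}

# Example with a made-up bus ID; substitute the ID of a real GPU.
pci_sysfs_dir "00000000:3B:00.0"
gpu_numa_node "00000000:3B:00.0"
```

A thread bound to NUMA node N would then pick the GPU whose reported node is N. Note that the kernel reports -1 when the firmware provides no locality information, in which case a manual mapping is still needed.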

2 Answers

The nvidia-smi tool can report the topology on a NUMA machine.

% nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PHB     SOC     SOC     0-5
GPU1    PHB      X      SOC     SOC     0-5
GPU2    SOC     SOC      X      PHB     6-11
GPU3    SOC     SOC     PHB      X      6-11

Legend:

  X   = Self
  SOC  = Connection traversing PCIe as well as the SMP link between CPU sockets (e.g. QPI)
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
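
To use this table from a script, the last column can be pulled out with awk. This is a rough sketch that assumes the layout shown above, where CPU Affinity is the final column. Other driver versions may print additional columns, in which case the field index needs adjusting. The helper name topo_affinity is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: reads `nvidia-smi topo -m` output on stdin and prints
# "<gpu index> <cpu affinity>" for every GPU row.
# Assumes the CPU affinity is the last column, as in the listing above.
# The header row starts with whitespace, so the pattern skips it.
topo_affinity() {
  awk '/^GPU[0-9]+/ { print substr($1, 4), $NF }'
}

# Demo on the sample output from this answer. On a real machine use:
#   nvidia-smi topo -m | topo_affinity
topo_affinity <<'EOF'
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PHB     SOC     SOC     0-5
GPU1    PHB      X      SOC     SOC     0-5
GPU2    SOC     SOC      X      PHB     6-11
GPU3    SOC     SOC     PHB      X      6-11
EOF
```

Each output line pairs a GPU index with the range of logical cores that are local to it, which is the mapping a host thread needs before calling cudaSetDevice().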

As I mentioned in the comments, this is a type of CPU/GPU affinity. Here is a bash script that I hacked together. I believe it will give useful results on RHEL/CentOS 6.x. It probably won't work properly on many older or other Linux distros. You can run the script like this:

./gpuaffinity > out.txt

You can then read out.txt in your program to determine which logical CPU cores correspond to which GPUs. For example, on a NUMA Sandy Bridge system with two 6-core processors and 4 GPUs, sample output might look like this:

0     03f
1     03f
2     fc0
3     fc0

This system has 4 GPUs, numbered from 0 to 3. Each GPU number is followed by a "core mask". The core mask identifies the cores which are "close" to that particular GPU, expressed as a hexadecimal bitmask in which bit n stands for logical core n. So for GPUs 0 and 1, the first 6 logical cores in the system (mask 03f, bits 0 to 5) are closest. For GPUs 2 and 3, the second 6 logical cores in the system (mask fc0, bits 6 to 11) are closest.
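
To make the mask format concrete, here is a small sketch that expands a mask into the core numbers it covers. It assumes the mask is a single hexadecimal word without commas that fits into shell arithmetic. The function name mask_to_cores is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: expand a hexadecimal core mask into a list of
# logical core numbers. Assumes a single hex word (no comma-separated groups).
mask_to_cores() {
  mask=$((0x$1))   # interpret the argument as a hexadecimal number
  core=0
  cores=""
  while [ "$mask" -ne 0 ]; do
    # If the lowest bit is set, the current core belongs to the mask.
    if [ $((mask & 1)) -eq 1 ]; then
      cores="$cores${cores:+ }$core"
    fi
    mask=$((mask >> 1))
    core=$((core + 1))
  done
  echo "$cores"
}

mask_to_cores 03f   # prints: 0 1 2 3 4 5
mask_to_cores fc0   # prints: 6 7 8 9 10 11
```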

You can either read the file in your program, or else you can use the logic illustrated in the script to perform the same functions in your program.
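
Reading the file back could look roughly like this. The sketch assumes the two-column format of the sample output above (GPU number, then a single-word hexadecimal mask) and prints every GPU whose mask contains a given logical core. The function name gpus_for_core is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the GPU numbers that are "close" to a core.
#   $1 = logical core number, $2 = file written by ./gpuaffinity
gpus_for_core() {
  core=$1
  file=$2
  while read -r gpu mask; do
    [ -n "$mask" ] || continue   # skip empty or malformed lines
    # Test whether bit number $core is set in the hexadecimal mask.
    if [ $(( (0x$mask >> $core) & 1 )) -eq 1 ]; then
      echo "$gpu"
    fi
  done < "$file"
}

# Demo using the sample output from this answer.
sample=$(mktemp)
printf ' 0     03f\n 1     03f\n 2     fc0\n 3     fc0\n' > "$sample"
gpus_for_core 7 "$sample"   # lists the GPUs whose mask includes core 7
rm -f "$sample"
```

A host thread pinned to a given core can use one of the printed numbers as the argument to cudaSetDevice().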

You can also invoke the script like this:

./gpuaffinity -v

which will give slightly more verbose output.

Here is the bash script:

#!/bin/bash
# This script outputs a listing of each GPU and its CPU core affinity mask.
file="/proc/driver/nvidia/gpus/0/information"
if [ ! -e "$file" ]; then
  echo "Unable to locate any GPUs!"
else
  gpu_num=0
  if [ "-v" == "$1" ]; then echo "GPU:  CPU CORE AFFINITY MASK: PCI:"; fi
  # Walk the GPUs in order until no information file exists for the next index.
  while [ -e "$file" ]
  do
    # Fetch the "Bus Location" line; the unquoted echo collapses repeated
    # whitespace so that the PCI address always starts at offset 14.
    line=$(grep "Bus Location" "$file" | { read -r line; echo $line; })
    pcibdf=${line:14}     # full PCI address (domain:bus:device.function)
    pcibd=${line:14:7}    # domain:bus only, which names the PCI bus in sysfs
    # sysfs publishes the mask of CPU cores local to that PCI bus.
    file2="/sys/class/pci_bus/$pcibd/cpuaffinity"
    read -r line2 < "$file2"
    if [ "-v" == "$1" ]; then
      echo " $gpu_num     $line2                  $pcibdf"
    else
      echo " $gpu_num     $line2 "
    fi
    gpu_num=$((gpu_num + 1))
    file="/proc/driver/nvidia/gpus/$gpu_num/information"
  done
fi
