How to correctly monitor a program’s GPU memory bandwidth utilization and SM utilization? (DCGM DRAM_ACTIVE vs in-program bandwidth differs a lot)

Question

I want to quantitatively measure the memory bandwidth utilization and SM utilization of a CUDA program for performance analysis and regression testing.

My approach so far:

Compute the theoretical memory bandwidth:

BW_theoretical = mem_clock(Hz) * bus_width(bit) / 8 * 2
Inside the program, calculate actual bandwidth as (bytes read + bytes written) / elapsed time.
Use NVIDIA’s monitoring tool DCGM externally to observe memory bandwidth and utilization during the same program run, then compare the two results.
I expect the ratio [bandwidth from program / BW_theoretical] to be close to the DCGM_FI_PROF_DRAM_ACTIVE value reported by DCGM.
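For concreteness, the arithmetic behind this approach can be sketched as below. This is only an illustration of the formula in the question. The function names, the `data_rate` parameter (the trailing `* 2` in the formula), and all numeric inputs are assumptions chosen for the example, not values taken from a real run.

```python
def theoretical_bandwidth(mem_clock_hz: float, bus_width_bits: int, data_rate: int = 2) -> float:
    """Peak DRAM bandwidth in bytes/sec: mem_clock * bus_width / 8 * data_rate."""
    return mem_clock_hz * bus_width_bits / 8 * data_rate


def measured_bandwidth(bytes_read: int, bytes_written: int, elapsed_s: float) -> float:
    """Bandwidth as counted inside the program: (bytes read + bytes written) / time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (bytes_read + bytes_written) / elapsed_s


# Illustrative inputs only (hypothetical clock, bus width, and byte counts).
peak = theoretical_bandwidth(mem_clock_hz=1215e6, bus_width_bits=5120)
achieved = measured_bandwidth(bytes_read=4 * 2**30, bytes_written=4 * 2**30, elapsed_s=0.008)
utilization = achieved / peak  # the ratio compared against DCGM_FI_PROF_DRAM_ACTIVE

print(f"peak={peak / 1e9:.1f} GB/s achieved={achieved / 1e9:.1f} GB/s utilization={utilization:.3f}")
```

Note that the result depends on which memory clock is plugged in (maximum vs. currently applied) and on whether the byte counts reflect traffic that actually reaches DRAM.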

Problem

I am using the DCGM metric DCGM_FI_PROF_DRAM_ACTIVE, but the bandwidth measured inside the program (bytes/time) differs a lot from the value reported by DCGM.

My questions

Does DCGM_FI_PROF_DRAM_ACTIVE really represent memory bandwidth utilization? Or does it only indicate the percentage of cycles the DRAM is active (not equivalent to throughput)?
If I want to obtain bytes/sec throughput that can be compared directly with my in-program measurement, which DCGM metrics should I use instead? Or which tools could I use?

Greg Smith · Accepted Answer · 2025-09-16 19:24:36Z


DCGM_FI_PROF_DRAM_ACTIVE is the percentage of cycles the DRAM is active. On GPUs with HBM memory the memory clock does not change dynamically, so this should be accurate. On GPUs with GDDR memory the memory clock can change dynamically over time to optimize power and performance, so the same active percentage can correspond to different absolute throughput at different times.
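One rough way to turn an active-cycle percentage into a bytes/sec estimate is to weight each sample by the memory clock in effect during that sample. The sketch below assumes that "active fraction × peak bandwidth at the current clock" approximates throughput, which is an approximation rather than a documented DCGM conversion. The function name, the sample tuple layout, and the `data_rate` factor are all hypothetical.

```python
from typing import Iterable, Tuple

# Each sample: (dram_active_fraction in [0, 1], memory clock in Hz, sample interval in seconds).
Sample = Tuple[float, float, float]


def estimate_dram_throughput(samples: Iterable[Sample], bus_width_bits: int, data_rate: int = 2) -> float:
    """Time-weighted estimate of DRAM throughput in bytes/sec from active-fraction samples."""
    total_bytes = 0.0
    total_time = 0.0
    for active, clock_hz, dt in samples:
        # Peak bandwidth at the clock that applied during this interval.
        peak = clock_hz * bus_width_bits / 8 * data_rate
        total_bytes += active * peak * dt
        total_time += dt
    return total_bytes / total_time if total_time > 0 else 0.0


# Hypothetical samples: the memory clock doubles between the two intervals.
samples = [(0.5, 1e9, 1.0), (0.25, 2e9, 1.0)]
print(estimate_dram_throughput(samples, bus_width_bits=256))
```

With a fixed memory clock (the HBM case described above) the clock term is constant and the estimate reduces to the mean active fraction times the peak bandwidth.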

It is not clear how you are counting bytes/sec in the application and adjusting for memory hierarchy.
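To illustrate why the memory hierarchy matters here: bytes counted in the application are bytes the kernel requests, while DRAM only sees the portion not served from cache. The toy model below assumes a single cache level with a known hit rate and ignores write-back, sector granularity, and prefetch effects. The function name and the numbers are hypothetical.

```python
def expected_dram_bytes(requested_bytes: float, cache_hit_rate: float) -> float:
    """Toy estimate of bytes that reach DRAM given a cache hit rate in [0, 1]."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return requested_bytes * (1.0 - cache_hit_rate)


# Hypothetical example: the kernel requests 8 GiB, and 75% of that is served from cache.
requested = 8 * 2**30
dram = expected_dram_bytes(requested, cache_hit_rate=0.75)
print(f"requested={requested} bytes, estimated DRAM traffic={dram:.0f} bytes")
```

Under this simplification, an in-program figure based on requested bytes would overstate DRAM traffic by a factor of four, which by itself could account for a large gap against a DRAM-level metric.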

Nsight Systems and Nsight Compute can both report on data throughput in order to provide a more detailed baseline. Nsight Compute will report in bytes/sec.

