I am using the LightGBM C API in our ML model hosting service, which is written in Go. I've written a cgo wrapper around the C API and link against the “lib_lightgbm.so” library file provided on GitHub.

I am on go1.20.4 in a Linux environment.

I have also raised an issue on the official LightGBM repository here. It contains a more detailed analysis of the situation.

Context:

I load a few LightGBM models in our model hosting service in production and refresh them as soon as new ones are available. The new models are loaded via the method LGBM_BoosterCreateFromModelfile provided by the API, and the older ones are released with the method LGBM_BoosterFree. I am hosting this service on GKE pods, which have a fixed amount of memory.

Issue:

I see a gradual uptick in the RSS (Resident Set Size) of the service as soon as the model is refreshed. To debug the issue, I stripped the problematic piece of code down to the bare minimum; the result is below.

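package main

// #cgo LDFLAGS: -L/home/ayush.goya/minimalExample/ -l_lightgbm
// #include "c_api.h"
// #include <stdio.h>
// #include <stdlib.h>
import "C"
import (
    "runtime/debug"
)

// predictor is the handle of the currently loaded booster.
var predictor C.BoosterHandle

// Load creates a booster from model.txt and stores its handle in predictor.
func Load() {
    outNumIterations := C.int(0)
    // Note: the C string allocated by C.CString is never freed in this minimal example.
    res := int(C.LGBM_BoosterCreateFromModelfile(C.CString("model.txt"), &outNumIterations, &predictor))
    if res != 0 {
        println("Load failed with code", res)
        return
    }
    debug.FreeOSMemory()
    println("Load Success")
}

// Release frees the booster referenced by predictor.
func Release() {
    res := int(C.LGBM_BoosterFree(predictor))
    if res != 0 {
        println("Release failed with code", res)
        return
    }
    debug.FreeOSMemory()
    println("Release Success")
}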

I measure RSS with the following code, and I trust the values because they match htop:

// GetRssMB returns the resident set size of this process in MB.
// Requires the "fmt", "os", "strconv" and "strings" packages.
func GetRssMB() string {
    // Read memory statistics from /proc/self/statm
    data, err := os.ReadFile("/proc/self/statm")
    if err != nil {
        fmt.Println("Error reading /proc/self/statm:", err)
        return "0"
    }

    // Extract resident memory size (in pages); it is the second field
    fields := strings.Fields(string(data))
    if len(fields) < 2 {
        fmt.Println("Unexpected format of /proc/self/statm")
        return "0"
    }
    rssPages, err := strconv.ParseUint(fields[1], 10, 64)
    if err != nil {
        fmt.Println("Error parsing resident memory size:", err)
        return "0"
    }

    // Convert pages to MB using the system page size (typically 4 KB)
    rssMB := rssPages * uint64(os.Getpagesize()) / (1024 * 1024)

    return strconv.FormatUint(rssMB, 10)
}
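
As a cross-check on the statm figure, /proc/self/status reports the resident size split into anonymous and file-backed pages (the RssAnon, RssFile and RssShmem fields, available on Linux 4.5 and later). The following is only a sketch, assuming a Linux host; parseStatusKB is an illustrative helper name and not part of the service described above.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseStatusKB extracts the "<Key>: <n> kB" lines from the text of
// /proc/<pid>/status and returns them as a map of key -> kilobytes.
// Lines that do not have exactly that shape are skipped.
func parseStatusKB(text string) map[string]uint64 {
	out := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 3 || fields[2] != "kB" {
			continue
		}
		n, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		out[strings.TrimSuffix(fields[0], ":")] = n
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("could not read /proc/self/status:", err)
		return
	}
	kb := parseStatusKB(string(data))
	fmt.Printf("VmRSS=%d kB RssAnon=%d kB RssFile=%d kB\n",
		kb["VmRSS"], kb["RssAnon"], kb["RssFile"])
}
```

If most of the growth shows up under RssAnon, that would suggest heap-style allocations rather than file mappings, though this alone does not say which allocator is holding the pages.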

Let's say the starting RSS is x bytes. After calling Load(), when the model has finished loading, the RSS increases to y, which is expected. Now, on trying to free up the memory by calling Release(), the RSS is z.

My expectation is that z should be very close to x. Instead, z is far larger than x (almost 100 times x for my model size).

This happens every time I go through the cycle of Load() and Release(), so the RSS gradually increases. This is causing my GKE pods to get OOM killed.
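
The x, y, z measurement described above can be recorded per cycle with a small harness. This is a sketch under stated assumptions: measureCycles, retainedPerCycle and the Sample type are hypothetical names, and the load, release and rss arguments stand in for the real Load(), Release() and an RSS reader. The main function uses a fake in-memory counter so the example runs without LightGBM.

```go
package main

import "fmt"

// Sample holds the RSS observed around one load/release cycle:
// Before is x, Loaded is y and Released is z.
type Sample struct {
	Before, Loaded, Released uint64
}

// measureCycles runs load and release n times and records the value
// returned by rss before loading, after loading and after releasing.
func measureCycles(n int, load, release func(), rss func() uint64) []Sample {
	samples := make([]Sample, 0, n)
	for i := 0; i < n; i++ {
		var s Sample
		s.Before = rss()
		load()
		s.Loaded = rss()
		release()
		s.Released = rss()
		samples = append(samples, s)
	}
	return samples
}

// retainedPerCycle returns Released - Before for each cycle, that is, how
// much RSS was not given back. It reports 0 when RSS did not grow.
func retainedPerCycle(samples []Sample) []uint64 {
	out := make([]uint64, len(samples))
	for i, s := range samples {
		if s.Released > s.Before {
			out[i] = s.Released - s.Before
		}
	}
	return out
}

func main() {
	// Stand-ins for the real functions: a fake process that gains
	// 100 units on load and gives back only 60 on release.
	var fake uint64 = 10
	load := func() { fake += 100 }
	release := func() { fake -= 60 }
	rss := func() uint64 { return fake }

	for i, r := range retainedPerCycle(measureCycles(3, load, release, rss)) {
		fmt.Printf("cycle %d retained %d\n", i+1, r)
	}
}
```

Plotting the retained amount over many cycles would show whether it stays constant per cycle or levels off after the first few, which are quite different symptoms.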

What is holding on to the memory and not returning it? I profiled the code, and HeapSys, HeapIdle and HeapInuse are all very low.
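
One way to put a number on that observation is to subtract what the Go runtime reports holding from the measured RSS. The sketch below is only a rough estimate: outsideGoRuntime is a hypothetical helper, the RSS value in main is a made-up placeholder, and since runtime.MemStats.Sys counts address space obtained from the OS rather than resident pages, the result is best read as a rough lower bound on memory held outside the Go runtime.

```go
package main

import (
	"fmt"
	"runtime"
)

// outsideGoRuntime estimates how many resident bytes are not accounted for
// by the Go runtime. goSysBytes is MemStats.Sys and goReleasedBytes is
// MemStats.HeapReleased. The result is clamped at zero.
func outsideGoRuntime(rssBytes, goSysBytes, goReleasedBytes uint64) uint64 {
	var goHeld uint64
	if goSysBytes > goReleasedBytes {
		goHeld = goSysBytes - goReleasedBytes
	}
	if rssBytes <= goHeld {
		return 0
	}
	return rssBytes - goHeld
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Placeholder value for illustration only; in practice this would
	// come from /proc/self/statm or a similar source.
	rssBytes := uint64(512 << 20)

	fmt.Printf("Go Sys=%d HeapReleased=%d estimated outside Go=%d\n",
		m.Sys, m.HeapReleased, outsideGoRuntime(rssBytes, m.Sys, m.HeapReleased))
}
```

A large estimate alongside low Go heap figures is consistent with the memory living on the C side of the cgo boundary, which the Go profiler cannot see.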

I am at a loss as to how to figure this out. Is there something about Go memory management that I am missing here, or something about how to handle cgo? Any help would be appreciated.

  • FYI debug.FreeOSMemory has nothing to do with memory allocated in C via cgo, since that memory was not allocated by the Go runtime. You also can't profile memory allocated in C via the Go profiler, again because that is entirely outside of the Go runtime. Commented Apr 17, 2024 at 19:33
  • Yes, agreed. I am running this minimal example via a fastHttpServer to Load() and Release() on demand, by hitting an API call. So I added debug.FreeOSMemory to be sure not to pick up any footprints related to serving. Commented Apr 17, 2024 at 19:37
  • The only thing you could fix in Go would be, for example, if you were forgetting to call an API to release resources, or even just C.free. That all depends on how the C library is designed, of course. If the C code is leaking memory, then you need to fix it from within the C code; it's not something related to Go. Commented Apr 17, 2024 at 19:40
  • You definitely need to C.free() the value returned by C.CString("model"), though I doubt that's the issue causing you to go OOM. Commented Apr 17, 2024 at 22:08
  • @Zyl that is just the file name and not the actual model string. Yes, the name string will need to be freed, but it's insignificant in the broader context. Commented Apr 18, 2024 at 5:12
