1

Go is pretty new to me and I am having some trouble understanding its memory usage:

I want to load a CSV-like file into a slice of rows, each row being a struct composed of a 22-character key and a slice of values (strings).
My code looks like this: https://play.golang.org/p/hJ4SHjVXaG

The problem is that for a 450 MB file it uses around 2.1 GB of memory.
Does anyone have a solution to reduce that memory usage?

Update: using SirDarius's solution (https://play.golang.org/p/DBmOFOkZdx), it still uses around 1.9 GB.

3
  • What problem do you want to solve with your program? Memory-reduction techniques can be very different depending on the class of problem. Commented Apr 26, 2016 at 9:45
  • Read the file line by line. A recipe is here: stackoverflow.com/a/8758113/1975086 Commented Apr 26, 2016 at 9:57
  • 1
    Do you need everything in memory at once? Can't you just process the file line-by-line as @AlexanderTrakhimenok suggested? Commented Apr 26, 2016 at 10:38

4 Answers

7

How many lines and fields are there in the file?

It is plausible that what you are describing is already close to the minimum amount of memory.

Looking at the code, I think it will use 450 MB of memory for the underlying string data.

It will then slice that up into strings. Each string consists of a pointer and a length, which together take 16 bytes on a 64-bit platform.

Subtracting the data leaves roughly 1.5 GB, and 1.5 GB / 16 bytes ≈ 93 million string headers.

So if there are more than 50 million fields in your file, the memory use seems reasonable.

There are other overheads, such as the per-row structs, so this isn't an exact calculation.

EDIT

Given
5 million rows, 10 columns each

That is 50 million string headers of 16 bytes each, which take 800 MB. Add the data itself (450 MB) and the row structs, at 5 words * 8 bytes * 5 million rows = 200 MB, and the total is 1.45 GB.

So I don't think that, even with perfect memory allocation, you'll be able to reduce the usage below roughly 1.5 GB.


1 Comment

5 million rows, 10 columns each
2

This seems pretty inefficient to me:

for _, value := range strings.Split(line[23:], ";") {
    row.Values = append(row.Values, value)
}

You basically obtain a []string by calling the strings.Split function, and then loop over that slice to append every string to another, initially nil, string slice.

Why not just do:

row.Values = strings.Split(line[23:], ";")

instead?

Though I can't guarantee it, it is possible that the loop causes each string to be copied, and therefore makes your program use twice as much memory as needed.
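As a minimal sketch of the direct assignment, here is a line parser. The `Row` type, the `parseLine` helper, and the exact line layout (a 22-character key, one separator character, then `;`-separated values) are assumptions based on the question's description and the `line[23:]` expression; the length check is an addition to avoid a slice-bounds panic on short lines.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Row is an assumed layout: a string key plus a slice of string values.
type Row struct {
	Key    string
	Values []string
}

const keyLen = 22 // key width stated in the question

// parseLine splits one line into a Row without an intermediate append loop.
// Key and Values are substrings that share the memory of line.
func parseLine(line string) (Row, error) {
	if len(line) <= keyLen {
		return Row{}, errors.New("line too short to contain a key and values")
	}
	return Row{
		Key:    line[:keyLen],
		Values: strings.Split(line[keyLen+1:], ";"), // same as line[23:]
	}, nil
}

func main() {
	row, err := parseLine("ABCDEFGHIJKLMNOPQRSTUV;a;b;c")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(row.Key, row.Values)
}
```

Since `strings.Split` already returns a slice sized for the fields it found, assigning it directly avoids the repeated growth of a second slice.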

1 Comment

Indeed, it's pretty useless. I had some validity checks in that loop, but I can postpone them. I just tried it, and it got down to 1.9 GB. Not perfect, but already better! Thanks!
1

You are appending the values obtained in each iteration to a Row struct, which is not a reasonable approach given the huge file size. Why are you not processing the file in batches?

The Split function returns a slice of substrings, so it's not necessary to range over the resulting slice and append each element to row.Values. You can assign the result directly to row.Values, then append the row to the rows slice.

func Split(s, sep string) []string

Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators. If sep is empty, Split splits after each UTF-8 sequence. It is equivalent to SplitN with a count of -1.

row.Values = strings.Split(line[23:], ";")
rows = append(rows, row)


0

It seems to me that this is about the append() function. From the language spec:

If the capacity of s is not large enough to fit the additional values, append allocates a new, sufficiently large underlying array

The newly allocated array can be larger than needed, leaving room for further appends. So to allocate precisely, you should either create the slice with slice := make([]Row, 0, WithExpectedCapacity) and append to it (append does not reallocate while the capacity suffices), or create it with make([]Row, WithExpectedCapacity) and then assign slice[n] = ... instead of calling append(). If you can't do this, you can at least try reflection to trim the capacity:

reflect.ValueOf(&slice).Elem().SetCap(len(slice))

It's a bit tricky, but you can see at https://play.golang.org/p/LslkOBCvII that it works.
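To illustrate the capacity point, here is a small sketch comparing a slice grown by append with one created at its final capacity. `loadRows` and `shrink` are hypothetical helpers, and the `Row` type is assumed from the question. Note that, as far as I can tell, SetCap only lowers the capacity recorded in the slice header; the `shrink` function shown here is a different technique that copies into an exactly sized slice so the larger array can be garbage collected.

```go
package main

import "fmt"

// Row is an assumed layout: a string key plus a slice of string values.
type Row struct {
	Key    string
	Values []string
}

// loadRows appends n rows to a slice created with the given capacity hint.
func loadRows(n, capHint int) []Row {
	rows := make([]Row, 0, capHint)
	for i := 0; i < n; i++ {
		rows = append(rows, Row{Key: fmt.Sprintf("key-%d", i)})
	}
	return rows
}

// shrink copies rows into a new slice whose capacity equals its length.
// The old, possibly larger, backing array becomes collectable once
// nothing else refers to it.
func shrink(rows []Row) []Row {
	exact := make([]Row, len(rows))
	copy(exact, rows)
	return exact
}

func main() {
	grown := loadRows(1000, 0)    // capacity chosen by append's growth policy
	sized := loadRows(1000, 1000) // capacity fixed up front
	trimmed := shrink(grown)

	fmt.Println("grown:  ", len(grown), cap(grown))
	fmt.Println("sized:  ", len(sized), cap(sized))
	fmt.Println("trimmed:", len(trimmed), cap(trimmed))
}
```

The grown slice's capacity depends on the Go version's growth policy, so it is not stated here; the pre-sized and trimmed slices should both report a capacity equal to their length.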

1 Comment

I know the size of the file I use for my test, so I tried to manually set the capacity of each array, but with no big result... So I guess it won't change much, since I wasn't using append, but I'll try!
