
I need to implement a cache in Java with a maximum size, and I would like to bound it by the actual size of the cache in memory rather than by the number of elements. The cache will basically have String keys and String values. I have already implemented the cache using Java's LinkedHashMap, but the question is how to know the actual in-memory size of the cache so that I can adapt the eviction policy to drop an object when the cache grows too big.

I wanted to compute it using getObjectSize() from the instrumentation package, but it doesn't seem to work as desired.

When I call getObjectSize(someString), it returns the same size, 32, whatever the length of the string. I guess it is only measuring the shallow size of the String object (its header and references) and not its contents, so I don't know how to solve this problem efficiently.
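For reference, here is roughly the setup I'm using (a minimal sketch; the agent class name is mine, and it has to be registered via the Premain-Class manifest attribute and started with -javaagent):

```java
import java.lang.instrument.Instrumentation;

// Minimal instrumentation agent: the JVM hands us an Instrumentation
// instance at startup, which exposes getObjectSize().
public class SizeAgent {

    private static volatile Instrumentation instrumentation;

    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    public static long sizeOf(Object o) {
        // Reports the shallow size of the object itself; for a String,
        // the backing char[] is a separate object and is not included.
        return instrumentation.getObjectSize(o);
    }
}
```

Calling SizeAgent.sizeOf(someString) is what returns the constant 32.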

Do you have any ideas?

Thanks a lot!

2 Answers


You might want to consider using Ehcache with memory-based cache sizing.
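For example, with Ehcache 2.5 or later, something along these lines should cap the cache by heap bytes rather than entry count (a rough sketch from memory; the cache name and the 64 MB budget are illustrative):

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.MemoryUnit;

public class EhcacheMemoryBoundExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();

        // Bound the cache by heap usage instead of entry count;
        // Ehcache sizes the keys and values for you.
        Cache cache = new Cache(new CacheConfiguration()
                .name("stringCache")
                .maxBytesLocalHeap(64, MemoryUnit.MEGABYTES));
        manager.addCache(cache);

        cache.put(new Element("someKey", "someValue"));
        Element hit = cache.get("someKey");
        System.out.println(hit == null ? null : hit.getObjectValue());

        manager.shutdown();
    }
}
```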


1 Comment

Thanks, that should do the job, and probably better than my own implementation I guess. What type of cache would you suggest: one based on the number of entries or on the size of the cache? Note that the application is supposed to run on a server and can use all the memory available on the machine.

If your keys and values are both strings, then the calculation is easy: object overhead + 2 bytes per character in the strings. On a 32-bit Sun JVM, 32 bytes for overhead sounds correct.

There are a couple of caveats: first, the Map that you use to hold the cache adds its own overhead, which will depend on the size of the hash table and the number of entries in the map. Personally, I'd just ignore all overheads and base the calculation on the string lengths (see the sketch after these caveats).

Second, unless you track strings by identity, you may over-count because the same string may be stored with multiple keys. Since tracking strings by identity would add yet more overhead, this is probably not worth doing.

And finally: while memory-limited caches seem like a good idea, they rarely are. If you know your application well enough, you should know the average string length, and can control the cache based on number of entries. And if you don't know your application that well, a simple LRU expiration policy is likely to get you into trouble: a large entry can cause many small entries to be expired. And if that happens, unless the cost to rebuild is proportional to the size, you've just made your cache less effective.
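A minimal sketch of the length-based approach, assuming you still want a size-bounded LRU on top of LinkedHashMap (the class name and character budget here are mine; entry size is approximated as key length + value length, i.e. roughly 2 bytes per char with all overhead ignored):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache over Strings: entry "size" is just the total
// character count of key and value, per the approximation above.
public class StringLruCache {

    private final LinkedHashMap<String, String> map =
            new LinkedHashMap<>(16, 0.75f, true); // access order => LRU iteration
    private final long maxChars;
    private long currentChars = 0;

    public StringLruCache(long maxChars) {
        this.maxChars = maxChars;
    }

    public synchronized String get(String key) {
        return map.get(key);
    }

    public synchronized void put(String key, String value) {
        String old = map.put(key, value);
        if (old != null) {
            currentChars -= entrySize(key, old);
        }
        currentChars += entrySize(key, value);

        // Drop least-recently-used entries until we are back under budget.
        Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
        while (currentChars > maxChars && it.hasNext()) {
            Map.Entry<String, String> eldest = it.next();
            currentChars -= entrySize(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    private static long entrySize(String key, String value) {
        return (long) key.length() + value.length();
    }
}
```

Note that this sketch has exactly the failure mode described above: one oversized put can evict many small entries in one go.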

4 Comments

+1... Additional note: the "character" here is actually whatever fits in a Java char. If for some reason the OP is working with a lot of characters only available since Unicode 3.1, then two Java chars are needed per character (a single Java char can only hold Unicode 3.0 codepoints) and, hence, 4 bytes per "character" :) (see the short demo after these comments).
Thx for the answer, that's very interesting; maybe I will opt for the number of entries. In fact the strings usually come in two kinds: really small strings of at most 140 chars, and very big strings that represent a set of values; the small strings should be more frequent. The cost of rebuilding an entry is running a consensus algorithm plus the bandwidth used, so I would say it's not proportional to the size, but larger entries are still harder to rebuild. Big strings are also less likely to be accessed.
@Syntax - Good point -- and I learned something when I checked the docs before telling you that you were wrong :-) I'd always thought that String.length() returned the character count, and that String.codePointCount() was only needed to count supplementary (non-BMP) characters properly. I now know to be more careful.
@Abbadon - in that case, I'd think of using two caches with different expiration policies.
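To make the char-vs-codepoint point above concrete (the string contents here are just illustrative):

```java
// length() counts Java chars (UTF-16 units), which is what matters for the
// 2-bytes-per-char memory estimate; codePointCount() counts actual characters.
public class CodePointDemo {
    public static void main(String[] args) {
        String bmp = "abc";
        String supplementary = "a\uD835\uDD0As"; // contains U+1D50A, a surrogate pair

        System.out.println(bmp.length());                                            // 3
        System.out.println(bmp.codePointCount(0, bmp.length()));                     // 3
        System.out.println(supplementary.length());                                  // 4
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 3
    }
}
```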
