Software. Efficiency. Scalability.

Entia non sunt multiplicanda praeter necessitatem

Understanding GC: Freeing memory (part I)


Let me start by saying that I am a very casual driver. I have never bothered looking under the hood of my Accord. In fact, the reason I drive a Honda is that I don’t have to. I don’t want to debug my car; I want it to just work. Having said that, if I depended on my car, whether for racing or farming, I would make sure that I understood how the machinery worked: what it could and could not do, and what would be the best way to keep it running smoothly. After all, you don’t want your car to fail on you when you really can’t afford it.

For me, Garbage Collection is one of those “under the hood” things. It’s a fuel pump for your applications. It is extremely important to understand how it works.

Memory management does two things: it allocates memory and it releases memory. Garbage Collection, a memory management system, made it into the mainstream in the late 90s and early 2000s, mostly thanks to Java and .NET. It helped resolve a lot of issues that plagued development at the time:

  1. Memory leaks. They were a huge problem then and they are still a problem now. Even companies like Google can’t keep their products free of leaks. GC has made it a lot harder to introduce memory leaks. While the issue remains, it doesn’t happen nearly as often.
  2. Having references to “dead” objects. As a consequence of fighting against #1, you could end up with a reference to a piece of memory that had already been freed. If the memory manager had allocated something else in the same space, you were in a lot of trouble. GC completely resolved this.
  3. Allocating on top. If you programmed against the Win32 API you might remember ERROR_INSUFFICIENT_BUFFER and all kinds of jumping through hoops to figure out how much memory was needed. GC also completely resolved this issue, since your libraries can allocate objects as needed and then return them to the caller.
  4. Concurrency. In multi-threaded systems it wasn’t always clear whether a thread could release an object. One of the most common ways to deal with it was reference counting. Usually it had to be done under a lock, and it scaled poorly with the number of threads. With GC, you can share objects between threads without worrying about which thread releases them.
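
To make item 4 a little more concrete, here is a minimal Python sketch of manual reference counting under a lock. The `RefCounted` class, its `on_free` callback and the `run_demo` helper are hypothetical names made up for this illustration; they are not taken from any real library, and a production implementation would look different.

```python
import threading


class RefCounted:
    """Hypothetical manually reference-counted wrapper around a shared resource."""

    def __init__(self, resource, on_free):
        self._resource = resource
        self._on_free = on_free          # called once, when the count drops to zero
        self._count = 1                  # the creator holds the first reference
        self._lock = threading.Lock()

    def acquire(self):
        # Every thread that wants to keep the object alive must bump the count.
        with self._lock:
            if self._count == 0:
                raise RuntimeError("object already freed")
            self._count += 1
            return self._resource

    def release(self):
        # All threads serialise on this lock, which is why the scheme
        # scales poorly as the number of threads grows.
        with self._lock:
            if self._count == 0:
                raise RuntimeError("released more times than acquired")
            self._count -= 1
            last = self._count == 0
        if last:
            self._on_free(self._resource)


def run_demo(num_threads=8, rounds=1000):
    freed = []
    shared = RefCounted("buffer", freed.append)

    def worker():
        for _ in range(rounds):
            shared.acquire()
            shared.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    freed_while_shared = list(freed)     # snapshot before the creator lets go
    shared.release()                     # the creator drops the last reference
    return freed_while_shared, freed
```

Every `acquire` and `release` has to be paired by hand and every one of them takes the same lock. With a garbage collector, none of this bookkeeping appears in application code.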

All these things come at a price. GC is neither cheap nor simple. In fact, a good GC implementation is an extremely complex task to undertake. Even for such successful projects as Mono it can take years to implement a compacting GC:

“During the last years Paolo and Mark have been developing a copying garbage collector for Mono… The code is still experimental and should not be used in any kind of production environment. It has been checked into Mono’s trunk repository and is available as of revision 61240, it turned on using the --with-gc=sgen, but it is not recommended production use yet (as of July 15th 2009).”

Here is a short list of possible complications:

  1. GC either needs to halt the system or implement read/write barriers to protect the data while the GC is working.
  2. Generational GCs tend to implement write barriers to keep track of references from “old” data to “new” data. Such references are rare, since it’s far more likely that “new” data points to “old” data, but the GC has to know about them to collect the young generation without scanning the old one.
  3. Allocating large objects tends to trash gen0 too fast. That’s why you see things such as the Large Object Heap.
  4. Pinned memory regions. Let’s say you requested an I/O read into a region of memory. This region is now “pinned” and can’t be moved during compaction. This usually causes arenas to break down into smaller pieces.
  5. Applications that don’t fit the generational GC model (in which most young objects die in gen0) incur much greater collection costs.
  6. Collecting becomes more and more expensive as your memory footprint grows. For instance, Cached# starts triggering gen2 collections too often at around 8 GB and ends up spending most of its CPU time in GC by the time it reaches 14 GB (2×4 Xeon X5550 2.66 GHz, 24 GB of RAM). Of course, you can start 7×2 GB instances of Cached# to avoid this issue, but it’s a GC issue nevertheless.
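
The write barrier from item 2 can be sketched with a toy model. This is only an illustration under simplifying assumptions: `Obj`, `ToyHeap`, `write_ref`, `minor_collect` and `demo` are invented names, there are just two generations, and every survivor of a young collection is promoted immediately. Real collectors such as the ones in .NET or the JVM are far more involved.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.old = False      # False: young generation (gen0); True: old generation
        self.refs = []        # outgoing references


class ToyHeap:
    def __init__(self):
        self.young = []
        self.old = []
        self.remembered = set()   # old objects that may point into the young generation

    def allocate(self, name):
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def write_ref(self, source, target):
        """Write barrier: every reference store goes through here."""
        source.refs.append(target)
        if source.old and not target.old:
            self.remembered.add(source)   # record the rare old -> young pointer

    def minor_collect(self, roots):
        """Collect only the young generation and promote the survivors."""
        # Start from the roots plus whatever the remembered old objects reference,
        # so the old generation itself never has to be scanned.
        pending = list(roots)
        for old_obj in self.remembered:
            pending.extend(old_obj.refs)
        live = set()
        while pending:
            obj = pending.pop()
            if obj.old or obj in live:
                continue                  # do not trace through old objects
            live.add(obj)
            pending.extend(obj.refs)
        dead = [o.name for o in self.young if o not in live]
        for obj in self.young:
            if obj in live:
                obj.old = True
                self.old.append(obj)
        self.young = []
        self.remembered.clear()           # all survivors are old now
        return dead


def demo():
    heap = ToyHeap()
    cache = heap.allocate("cache")
    heap.minor_collect(roots=[cache])     # "cache" survives and is promoted
    entry = heap.allocate("entry")
    heap.allocate("temp")                 # never referenced by anything
    heap.write_ref(cache, entry)          # old -> young store, caught by the barrier
    dead = heap.minor_collect(roots=[])   # no roots: only the remembered set keeps "entry"
    return dead, entry.old
```

In `demo`, the second collection runs with no roots at all, and the only thing keeping `entry` alive is the record left by the write barrier. Without the barrier, the collector would either have to scan the whole old generation or risk freeing an object that is still reachable.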

As you can see, there are quite a few trade-offs. For the majority of applications the pros significantly outweigh the cons. However, the cons are still there, and developers need to be aware of them. I’ve noticed that more and more people think that GC is a “silver bullet” with no downsides. That is simply not true.

Written by Mikhail Opletayev

December 28, 2009 at 8:11 pm

Posted in development
