Posts Tagged ‘memory’
Monitoring .NET application performance
If you need to troubleshoot a poorly performing ASP.NET application, here are a couple of tips.
GC Performance Counters
GC is the lungs of the application. It needs to breathe freely to perform well. The rhythm is as important as oxygen intake. First of all, look at the list of the available performance counters for .NET GC:
http://msdn.microsoft.com/en-us/library/x2tyfybc.aspx
The counters to watch are # Gen 1 Collections, # Gen 2 Collections, % Time in GC, # Bytes in all Heaps, and Large Object Heap size.
The best way to monitor a .NET application’s memory performance is to sample the performance counters at the maximum resolution (1 sec) over 2-5 minutes. It’s a bit like getting an EKG: you want a detailed diagram of a short period, not an approximation over a longer period.
Here is what you watch for:
- Normal % Time in GC is around 10-12%. Over 30% is where it starts getting bad. Over 50% is really bad: it means half of the processing time is spent managing memory. By design, the .NET memory manager stops all active threads during a garbage collection (GC), so no requests are processed while the collection runs. All new requests have to wait in the queue.
- # Gen 2 Collections should be about 10 times lower than # Gen 1 Collections. Gen1 collections are relatively cheap. Gen2 collections are expensive. If the ratio of Gen1 to Gen2 collections is much lower than 10, expect latency spikes. A ratio of 1-to-1 or 1-to-2 indicates severe issues with memory management.
- Large Object Heap size. LOH is never compacted, so it can get fragmented easily. It is also collected as part of a Gen2 collection, which is expensive.
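The rules of thumb above can be turned into a small checker for a captured counter log. This is only a sketch: the function names are hypothetical, the "elevated" label for the 12-30% gap is my own (the list above does not name that range), and the thresholds are the rough guidelines from this post rather than hard limits.

```python
def classify_time_in_gc(percent):
    """Map a '% Time in GC' reading onto the rough bands described above."""
    if percent > 50:
        return "really bad"
    if percent > 30:
        return "bad"
    if percent > 12:
        return "elevated"  # assumption: the post does not name this range
    return "normal"


def gen1_to_gen2_ratio(gen1_first, gen1_last, gen2_first, gen2_last):
    """Ratio of Gen 1 to Gen 2 collections inside the sampling window.

    The '# Gen N Collections' counters are cumulative since process start,
    so the ratio is computed from the difference between the first and the
    last sample of the window.
    """
    gen1 = gen1_last - gen1_first
    gen2 = gen2_last - gen2_first
    if gen2 == 0:
        return float("inf")  # no Gen 2 collections happened in the window
    return gen1 / gen2
```

For example, a window in which Gen 1 went from 100 to 300 and Gen 2 from 10 to 30 gives a ratio of 10, which is in the healthy range.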
W3SVC_W3WP counters
Another set of counters to look at is W3SVC_W3WP. These performance counters allow you to monitor IIS worker processes for application pools. Worker process instances are named PID_APPPOOLNAME. You can get all instances for a specific app pool using *_APPPOOLNAME.
The counters to watch for are Active Threads Count, Total Threads Count, and Maximum Threads Count.
Each request requires a thread to execute. The Thread Pool is the .NET mechanism that manages worker threads. It can dynamically grow the number of threads up to the Maximum Threads Count. If GC is the lungs of your app, the Thread Pool is the heart. If it stops beating, your app will freeze.
Total Threads Count indicates the currently utilized number of threads. If this number gets close to Maximum Threads Count, it means that the pool has at some point expanded to its limit. It’s a thing to watch for, but it’s not very critical.
Active Threads Count indicates how many threads are currently busy. If this number is close to or equal to Maximum Threads Count, the Thread Pool is reaching its capacity. The lack of available threads in the Thread Pool can cause requests to be delayed or denied (you will likely see a 500 HTTP error with ‘too busy’ in the details).
Long running requests can cause Thread Pool starvation. It’s a situation where all (or most) worker threads are busy waiting for a response from a web service or a long running database query. The CPU utilization of the server is very low (close to 0%), but no new requests can be processed. All new requests will either get denied or delayed until one of the worker threads becomes available. Imagine all the checkout lanes in a supermarket waiting for a manager at the same time. Everybody’s busy, yet no work gets done.
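Starvation is easy to see in a toy model. The sketch below is not how the .NET Thread Pool is implemented; it is a simplified simulation (the function name and the fixed-size, first-come-first-served pool are assumptions) that computes how long each request waits before a thread picks it up.

```python
import heapq


def simulate_pool(max_threads, requests):
    """Return the queue wait time of each request.

    `requests` is a list of (arrival_time, duration) tuples sorted by
    arrival time. Each request is handed to whichever thread frees up first.
    """
    free_at = [0] * max_threads  # moment at which each thread becomes free
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in requests:
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return waits


# Two threads, both taken by 10-second calls to a slow back end.
# A request arriving one second later has to wait nine seconds,
# even though the CPU is idle the whole time.
waits = simulate_pool(2, [(0, 10), (0, 10), (1, 1)])
```

With more threads than concurrent slow requests, the same third request starts immediately, which is why the wait shows up only once the pool is saturated.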
ASP.NET Counters
The last things I am going to mention are general ASP.NET counters.
Take a look at the ASP.NET\Request Execution Time and ASP.NET\Request Wait Time counters. They show execution time and wait time, respectively, for the last request. A single request is an arbitrary sample, but it still lets you quickly estimate the maximum throughput of your Thread Pool. For instance, if your execution time is 5 seconds and your Maximum Threads Count is 50, each thread completes at most one request every 5 seconds, so you will not be able to process more than 50 / 5 = 10 requests a second.
Separate application pools have separate thread pools. Collecting performance counter data during peak hours can help you identify potential performance bottlenecks in .NET applications.
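The estimate is a one-line calculation. The helper below is a hypothetical name, and it assumes every request occupies one thread for the whole execution time.

```python
def max_requests_per_second(max_threads, execution_time_seconds):
    """Upper bound on throughput when each request holds one thread
    for `execution_time_seconds`."""
    return max_threads / execution_time_seconds


# The example from the text: 50 threads, 5 seconds per request.
limit = max_requests_per_second(50, 5)
```

Cutting the execution time to 1 second with the same 50 threads raises the ceiling to 50 requests a second, which is usually a cheaper fix than adding threads.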
Conclusion
These are by no means the only things you should look at, but they are a good place to start.
If you want to understand how .NET GC works (and it’s a pretty sophisticated GC), Doug Stewart compiled a great index to Maoni Stephens’s GC blog:
http://blogs.msdn.com/b/dougste/archive/2010/02/18/an-index-to-maoni-s-blog-posts-about-the-gc.aspx
Understanding GC: allocating memory (part II)
Previously: Understanding GC: freeing memory (Part I)
While GC resolved a lot of issues around freeing memory, it made the situation a lot worse with allocation. Let’s be honest about it: developers have stopped thinking about allocating memory.
A lot of developers perceive GC allocated memory as, if not free, extremely cheap. If unsure, allocate. If unsure, copy. A deep copy, preferably. As a result, a lot of Java and .NET applications are extremely bloated.
Take a look at Visual Studio 2010 Virtual Memory struggle. Isn’t it fascinating? My Visual Studio 2008 runs projects in 150MB virtual space just fine, with the designer open to boot. The new version sets the acceptable threshold at 1.5GB. Just think about it: the new version requires an order of magnitude more virtual memory than its immediate predecessor!
Of course, GC allocated memory is neither free nor cheap. It can be less or more expensive under different circumstances. Here is a short list of things to keep in mind:
- Do not blow through memory. There is no way around it. The more memory you consume the worse performance is going to be. It increases your memory footprint, takes longer to collect, reduces your CPU cache locality, and so forth. If you don’t need to allocate an object, then don’t.
- Try to avoid write barriers. A write barrier occurs when an “older” object (usually older than gen0) is assigned a reference to a “newer” object from gen0. GC needs to know about such assignments to support partial collections, so it has to record them. While not extremely expensive, it’s not free either. If you are doing a lot of these writes, it can become a significant problem.
- Try to keep your data structures simple. A compacting GC moves memory around, so it needs to update every pointer to a moved object to its after-collection location. The more pointers you have, the more work that is.
- Avoid semi-long living objects. A generational GC promotes objects from gen0 to gen1 and gen2, expecting them to live there for a while. One of the worst things that can happen is a lot of objects making it to gen1 and gen2 and then promptly dying there. This triggers a lot of extra work for a GC.
- Take care with your I/O memory. If you allocate a buffer and request a network read into it, the buffer becomes pinned. GC cannot move it, since the driver would not see the new location. Pinned memory regions split your arenas and make it harder to allocate and compact memory.
- Allocate large memory pieces carefully. When you allocate a large piece of memory (in .NET, 85,000 bytes or more) it goes to the Large Object Heap. LOH follows different laws. For instance, in .NET LOH is not compacted. Therefore, it can become fragmented and hurt your performance and memory footprint.
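Fragmentation of a heap that is never compacted is easier to picture with a toy model. The class below is an illustration only: the name, the first-fit policy, and the sizes are assumptions and do not describe how the .NET LOH is actually implemented. It shows that plenty of free memory in total does not guarantee room for one large object.

```python
class NonCompactingHeap:
    """Toy first-fit heap that never moves live blocks."""

    def __init__(self, size):
        self.size = size
        self.blocks = {}  # start offset -> length of each live block

    def _gaps(self):
        """Return (start, length) for every free gap, in address order."""
        gaps = []
        cursor = 0
        for start in sorted(self.blocks):
            if start > cursor:
                gaps.append((cursor, start - cursor))
            cursor = start + self.blocks[start]
        if cursor < self.size:
            gaps.append((cursor, self.size - cursor))
        return gaps

    def allocate(self, length):
        """Place the block in the first gap that fits, or return None."""
        for start, gap in self._gaps():
            if gap >= length:
                self.blocks[start] = length
                return start
        return None

    def free(self, start):
        del self.blocks[start]

    def free_total(self):
        return sum(gap for _, gap in self._gaps())

    def largest_gap(self):
        return max((gap for _, gap in self._gaps()), default=0)


heap = NonCompactingHeap(1000)
offsets = [heap.allocate(100) for _ in range(10)]  # fill the heap
for offset in offsets[::2]:                        # free every other block
    heap.free(offset)

# Half of the heap is free, but only in 100-unit holes,
# so a 200-unit allocation fails.
result = heap.allocate(200)
```

A compacting collector would slide the five surviving blocks together and leave one 500-unit gap, which is exactly what the LOH does not do.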
If you want your application to work fast, you need to have a full understanding of what is going on with your memory. A garbage collector can make it easier for you not to make common mistakes, but you still need to understand its laws and limitations.
Understanding GC: Freeing memory (part I)
Let me start by saying that I am a very casual driver. I have never bothered looking under the hood of my Accord. In fact, the reason I drive a Honda is that I don’t have to. I don’t want to debug my car; I want it to just work. Having said that, if I were to depend on my car, whether I did racing or farming, I would make sure that I understood how the machinery worked: what it could do, what it could not do, and what would be the best way to keep it running smoothly. After all, you don’t want your car to fail on you when you really can’t afford it.
For me, Garbage Collection is one of those “under the hood” things. It’s a fuel pump for your applications. It is extremely important to understand how it works.
Memory management does two things: it allocates and releases memory. Garbage Collection, one kind of memory management system, made it to the mainstream in the late 90s, mostly thanks to Java and .NET. It helped resolve a lot of issues that plagued development at the time:
- Memory leaks. It was a huge problem then and it is still a problem now. Even companies like Google can’t keep their products without leaks. GC has made it a lot harder to introduce memory leaks. While the issue remains, it doesn’t happen nearly as much.
- Having references to “dead” objects. As a consequence of fighting against #1, you can get a reference to a piece of memory that has already been freed. If the memory manager allocated something else in the same space, you were in a lot of trouble. GC completely resolved this.
- Allocating on top. If you programmed against the Win32 API you might remember ERROR_INSUFFICIENT_BUFFER and all kinds of jumping through hoops to figure out how much memory was needed. GC also completely resolved this issue, since you can allocate objects as you need them in your libraries and then return them to the caller.
- Concurrency. In multi-threaded systems it wasn’t always clear whether a thread could release an object. One of the most common ways to deal with it was reference counting. Usually it had to be done under a lock, and it scaled poorly with the number of threads. With GC, you can share your objects between threads without worrying about which thread releases them.
All these things come at a price. GC is neither cheap nor simple. In fact, a good GC implementation is an extremely complex task to undertake. Even for such successful projects as Mono it can take years to implement a compacting GC:
“During the last years Paolo and Mark have been developing a copying garbage collector for Mono… The code is still experimental and should not be used in any kind of production environment. It has been checked into Mono’s trunk repository and is available as of revision 61240, it turned on using the --with-gc=sgen, but it is not recommended production use yet (as of July 15th 2009).”
Here is a short list of possible complications:
- GC either needs to halt the system or implement read/write barriers to protect the data while GC is working.
- Generational GCs tend to implement write barriers to keep “old” data separated from the “new” data, as it’s more likely that the “new” data points to the “old” data.
- Allocating large objects tends to trash gen0 too fast. That’s why you see things such as the Large Object Heap.
- Pinned memory regions. Let’s say you requested an I/O read into a region of memory. This region is now “pinned” and can’t be compacted. This usually causes arenas to break down into smaller pieces.
- Applications that don’t fit the generational GC model (most of the young objects die in gen0) incur much greater collection costs.
- Collecting becomes more and more expensive as your memory footprint grows. For instance, Cached# starts triggering gen2 collections too often at around 8 GB and ends up spending most of the CPU time in GC by the time it reaches 14 GB (2×4 Xeon X5550 2.66 GHz, 24 GB of RAM). Of course, you can start 7 x 2 GB instances of Cached# to avoid this issue, but it’s a GC issue nevertheless.
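To make the write barrier from the list above less abstract, here is a toy sketch of the bookkeeping a generational collector needs. Everything in it is an assumption made for illustration (the class names, the remembered set stored as a Python set, objects tracking their own generation); real collectors use far more compact structures. The point is only that every reference store has to pass through a check, and that this check is what lets a gen0 collection skip scanning the older generations.

```python
class Obj:
    def __init__(self, name, generation=0):
        self.name = name
        self.generation = generation
        self.refs = []


class ToyHeap:
    def __init__(self):
        self.remembered = set()  # old objects known to point into gen0

    def write_ref(self, holder, target):
        """Store a reference. The check below is the write barrier."""
        holder.refs.append(target)
        if holder.generation > 0 and target.generation == 0:
            self.remembered.add(holder)

    def gen0_survivors(self, roots):
        """Find live gen0 objects without walking the old generations."""
        stack = [r for r in roots if r.generation == 0]
        for old in self.remembered:
            stack.extend(t for t in old.refs if t.generation == 0)
        live = set()
        while stack:
            obj = stack.pop()
            if obj in live:
                continue
            live.add(obj)
            stack.extend(t for t in obj.refs if t.generation == 0)
        return live


heap = ToyHeap()
cache = Obj("cache", generation=2)  # long-lived object
entry = Obj("entry")                # fresh gen0 object stored in the cache
temp = Obj("temp")                  # fresh gen0 object nobody references
heap.write_ref(cache, entry)

survivors = {o.name for o in heap.gen0_survivors(roots=[])}
```

Without the remembered set, the collector would either have to scan the whole old generation to discover that `cache` keeps `entry` alive, or it would wrongly free `entry`.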
As you can see, there are quite a few trade-offs. For the majority of applications the pros significantly outweigh the cons. However, the cons are still there, and developers need to be aware of them. I’ve noticed that more and more people think that GC is a “silver bullet” with no downsides. It is simply not true.

