Software. Efficiency. Scalability.

Entia non sunt multiplicanda praeter necessitatem

Posts Tagged ‘performance’

The Fallacy of Premature Optimization

A great read by Randall Hyde:

http://www.acm.org/ubiquity/views/v7i24_fallacy.html

More by Joe Duffy:

http://www.bluebytesoftware.com/blog/default,date,2010-09-06.aspx

The important part to understand is that performance, just like anything else, is a feature of your software product. Your product must perform at a certain level in order to be successful. Ignoring this simple fact leads to fail whales.

Just being aware of performance while designing a product will help tremendously, much as one of the better ways to diet is to count the calories in every meal.

Writing efficient code doesn’t take much more of your time, if any. In fact, it can easily be argued that writing efficient code actually saves you a lot of time in the long run. Not only will you avoid spending time down the road refactoring your code, you will likely end up with a generally better-designed system, which speeds things up.

The number of clients on the internet has been growing exponentially. Each day hundreds of thousands of internet-connected smartphones are activated. Add to that tablets, netbooks, web services, crawlers, and so on. Ignoring this traffic and hoping to stay below the radar is not something you want to do.

Written by Mikhail Opletayev

September 8, 2010 at 4:12 pm

Monitoring .NET application performance

If you need to troubleshoot a poorly performing ASP.NET application, here are a couple of tips for you.

GC Performance Counters

GC is the lungs of the application. It needs to breathe freely to perform well. The rhythm is as important as oxygen intake. First of all, look at the list of the available performance counters for .NET GC:

http://msdn.microsoft.com/en-us/library/x2tyfybc.aspx

The counters you want to watch are # Gen 1 Collections, # Gen 2 Collections, % Time in GC, # Bytes in all Heaps, and Large Object Heap size.

The best way to monitor a .NET application’s memory performance is to sample the performance counters at the maximum resolution (1 sec) over 2-5 minutes. It’s a bit like getting an EKG: you want a detailed diagram of a short period, not an approximation over a longer period.

Here is what you watch for:

  1. Normal % Time in GC is around 10-12%. Over 30% is where it starts getting bad. Over 50% is really bad: it means more than half of the processing time is spent managing memory. By design, the .NET memory manager stops all active threads for a garbage collection (GC), so no requests are processed while a collection runs. All new requests have to wait in the queue.
  2. # Gen 2 Collections should be 10 times or so lower than # Gen 1 Collections. Gen1 collections are relatively cheap. Gen2 collections are expensive. If the ratio of Gen1 to Gen2 collections is much lower than 10, expect latency spikes. A ratio of 1-to-1 or 1-to-2 indicates severe issues with memory management.
  3. Large Object Heap size. The LOH is not compacted by default, so it can get fragmented easily. It is also collected as a part of a Gen2 collection, which is expensive.
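
To make the thresholds above concrete, here is a minimal Python sketch that checks a series of counter samples against them. The GcSample type and the assess function are hypothetical names made up for this illustration (they are not part of any .NET or Windows API), the samples are assumed to have been exported from Performance Monitor by hand, and the cutoffs are just the rules of thumb from the list.

```python
from dataclasses import dataclass


@dataclass
class GcSample:
    """One 1-second sample of the .NET GC counters (hypothetical container)."""
    time_in_gc_pct: float   # "% Time in GC"
    gen1_collections: int   # "# Gen 1 Collections", assumed cumulative
    gen2_collections: int   # "# Gen 2 Collections", assumed cumulative


def assess(samples):
    """Return a list of warnings based on the rules of thumb in the list above."""
    if not samples:
        raise ValueError("need at least one sample")
    warnings = []

    # Rule 1: average % Time in GC over the sampling window.
    avg_time_in_gc = sum(s.time_in_gc_pct for s in samples) / len(samples)
    if avg_time_in_gc > 50:
        warnings.append(f"% Time in GC is really bad: {avg_time_in_gc:.1f}%")
    elif avg_time_in_gc > 30:
        warnings.append(f"% Time in GC is getting bad: {avg_time_in_gc:.1f}%")

    # Rule 2: Gen1-to-Gen2 ratio, using collections that happened in the window.
    gen1 = samples[-1].gen1_collections - samples[0].gen1_collections
    gen2 = samples[-1].gen2_collections - samples[0].gen2_collections
    if gen2 > 0:
        ratio = gen1 / gen2
        if ratio <= 2:
            warnings.append(f"Gen1:Gen2 ratio {ratio:.1f} points to severe issues")
        elif ratio < 10:
            warnings.append(f"Gen1:Gen2 ratio {ratio:.1f} is below 10")

    return warnings


# Two made-up samples from an unhealthy process.
for line in assess([GcSample(55.0, 100, 90), GcSample(60.0, 110, 98)]):
    print(line)
```

The sketch deliberately looks at the difference between the first and the last sample, so collections that happened before the monitoring window do not skew the ratio.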

W3SVC_W3WP counters

Another set of counters to look at is W3SVC_W3WP. These performance counters allow you to monitor IIS worker processes for application pools. Worker process instance counters are named PID_APPPOOLNAME. You can get all instances using *_APPPOOLNAME for a specific app pool.

The counters to watch for are Active Threads Count, Total Threads Count, and Maximum Threads Count.

Each request requires a thread to execute. Thread Pool is a mechanism in .NET that manages worker threads. It can dynamically grow the number of threads up to the Maximum Threads Count. If GC is the lungs of your app, Thread Pool is the heart. If it stops beating your app will freeze.

Total Threads Count indicates the currently utilized number of threads. If this number gets close to Maximum Threads Count, it means that the pool has at some point expanded to its limit. It’s a thing to watch for, but it’s not very critical.

Active Threads Count indicates how many threads are currently busy. If this number is close to or equal to Maximum Threads Count, the Thread Pool is reaching its capacity. The lack of available threads in the Thread Pool can cause requests to be delayed or denied (you will likely see an HTTP 500 error with ‘too busy’ in the details).

Long-running requests can cause Thread Pool starvation: a situation where all (or most) worker threads are busy waiting for a response from a web service or a long-running database query. CPU utilization of the server is very low (close to 0%), but no new requests can be processed. All new requests are either denied or delayed until one of the worker threads becomes available. Imagine every checkout lane in a supermarket waiting for a manager at the same time: everybody’s busy, yet no work gets done.
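
Starvation is easy to reproduce in miniature. The sketch below uses Python’s ThreadPoolExecutor as a stand-in for the .NET Thread Pool (an assumption made purely for illustration; the two pools are managed differently). Every worker blocks on a pretend backend call, and a trivial request queued behind them cannot run until the backend answers. The function and variable names are made up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def demo_starvation(max_workers=2, wait_s=0.2):
    """Block every worker on a slow backend, then try to serve a fast request."""
    backend_reply = threading.Event()  # stands in for a slow web service or query

    def slow_request():
        backend_reply.wait()  # the thread burns no CPU, but it is occupied
        return "slow done"

    def fast_request():
        return "fast done"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Occupy every worker thread with a request that waits on the backend.
        for _ in range(max_workers):
            pool.submit(slow_request)
        # This request needs almost no work, but there is no free thread for it.
        fast = pool.submit(fast_request)
        time.sleep(wait_s)
        served_while_starved = fast.done()
        # The backend finally answers; workers free up and the queue drains.
        backend_reply.set()
        result = fast.result(timeout=5)
    return served_while_starved, result


print(demo_starvation())  # (False, 'fast done')
```

While the workers are blocked the process sits idle, which matches the "utilization close to 0%, nothing gets processed" symptom described above.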

ASP.NET Counters

The last things I am going to mention are general ASP.NET counters.

Take a look at the ASP.NET\Request Execution Time and ASP.NET\Request Wait Time counters. They show execution time and wait time, respectively, for the last request. While these numbers reflect only a single request, you can use them to quickly estimate the maximum throughput of your Thread Pool: divide the maximum number of threads by the execution time. For instance, if your execution time is 5 seconds and your Maximum Threads Count is 50, you will not be able to process more than 50 / 5 = 10 requests a second.

Separate application pools have separate thread pools. Collecting performance counter data during peak hours can help you understand potential performance bottlenecks in .NET applications.
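
The estimate is simple enough to put into a helper. This is a back-of-the-envelope sketch that assumes every request holds one thread for its whole execution time; the function name is made up.

```python
def max_requests_per_second(max_threads, execution_time_s):
    """Upper bound on throughput when each request occupies one thread."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return max_threads / execution_time_s


# The example from the text: 50 threads, 5 seconds per request.
print(max_requests_per_second(50, 5.0))  # 10.0
```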

Conclusion

These are by no means the only things you should look at, but they are a good place to start.

If you want to understand how .NET GC works (and it’s a pretty sophisticated GC), Doug Stewart compiled a great index to Maoni Stephens’s GC blog:

http://blogs.msdn.com/b/dougste/archive/2010/02/18/an-index-to-maoni-s-blog-posts-about-the-gc.aspx

Written by Mikhail Opletayev

September 2, 2010 at 8:34 pm

Posted in performance

Diagnosis: SQLPhobia

Don’t ORM me, bro!

Not a day passes without people coming up with new ways to avoid SQL, to hide it from their sight, to banish it behind a layer of abstraction. ORMs such as ActiveRecord, Hibernate, and ADO.NET EF came to the rescue. Anything but SQL! Why? Because we all know that SQL is ugly, complicated, not portable, scales poorly, and can cause Bovine Spongiform Encephalopathy if you write more than 2 statements a day.

ORMs, on the other hand, make it all magical. You don’t need to worry about SQL at all. Just define your maps and data will magically appear in your objects or be stored you-know-where.

Just like with anything else, ignoring problems provides little relief in the long run. Relational databases are delicate creatures. They require understanding of how they work. There are tools to make databases behave correctly. Whether it is hinting indexes, sending batches of data, or using prepared statements, there are interfaces for all of those tasks.

There are many little bells and whistles that help your database function better. All these tools are there for a reason. You can’t shrug all these features away with an ORM and expect your project to work smoothly and scale properly when you hit volume.

Examples

Let me give you a quick example. Relational databases are generally bad at processing one record at a time and too many records at a time. Here is what I mean.

1) Say we have a table Customers where we keep customers, each customer has a unique Id column. We would like to fetch 2 customers with Id’s 1 and 2.

Now, we can do it in this way:

select * from Customers where Id = 1
...
select * from Customers where Id = 2

or this way:

select * from Customers where Id = 1 or Id = 2

The difference in performance? Executing the latter query is about 99% faster than executing the two former queries; in other words, one statement fetches both rows in roughly the time a single-row statement takes. If you do not believe me, write a simple test and see how much overhead you incur on issuing a statement (parsing, planning, the round trip to the server) compared to the time it takes to actually execute it. The actual execution time for simple queries is minuscule compared to the total time of the call.
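
Here is one way such a test could look, as a minimal Python sketch against an in-memory SQLite database. SQLite runs in-process, so there is no network round trip and the gap will be smaller than with a client/server database; the table contents and the helper names are made up for the illustration.

```python
import sqlite3
import time


def build_db(rows=1000):
    """Create an in-memory Customers table with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Customers (Id integer primary key, Name text)")
    conn.executemany(
        "insert into Customers values (?, ?)",
        ((i, f"customer {i}") for i in range(1, rows + 1)),
    )
    conn.commit()
    return conn


def fetch_separately(conn):
    """Two statements, one per customer."""
    rows = []
    for customer_id in (1, 2):
        rows += conn.execute(
            "select * from Customers where Id = ?", (customer_id,)
        ).fetchall()
    return rows


def fetch_together(conn):
    """One statement for both customers."""
    return conn.execute(
        "select * from Customers where Id = 1 or Id = 2 order by Id"
    ).fetchall()


def time_it(fetch, conn, repeat=2000):
    """Total wall-clock seconds for `repeat` calls of `fetch`."""
    start = time.perf_counter()
    for _ in range(repeat):
        fetch(conn)
    return time.perf_counter() - start


conn = build_db()
print("two statements:", time_it(fetch_separately, conn))
print("one statement: ", time_it(fetch_together, conn))
```

Pointing the same pair of functions at a database on another machine should make the per-statement overhead far more visible.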

2) Imagine another situation: You have a table PageVisit where you log every page visit. You have a lot of visitors and the table grows quickly. At some point of time you decide to delete all history that is more than 180 days old.

As is quite usual for databases, there is a simple solution:

delete from PageVisit where VisitDate < SYSDATE - 180

The problem here is very simple. If the PageVisit table is huge, the query will take a while to execute. Due to its transactional nature, the database has to keep the pre-delete version of every affected row around until the whole statement commits. This is done so that ongoing selects can still see consistent data while the delete operation is executing, and so that the delete can be rolled back.

If you are not worried about consistency, you can speed up this operation by writing a script that executes a select, returns batches of Id’s for the records that need to be deleted, and then feeds those Id’s to a delete. On a big table such a pump can execute much faster than a single query, even though the Id’s have to travel through a client script.
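
A minimal sketch of such a pump in Python with SQLite is shown below. It assumes PageVisit has an integer Id primary key and stores VisitDate as ISO-8601 text; the function name and the batch size are arbitrary choices for the example.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=500):
    """Delete PageVisit rows older than `cutoff`, one small transaction at a time."""
    total = 0
    while True:
        # Step 1: select the next batch of Id's that need to go.
        ids = [
            row[0]
            for row in conn.execute(
                "select Id from PageVisit where VisitDate < ? limit ?",
                (cutoff, batch_size),
            )
        ]
        if not ids:
            return total
        # Step 2: feed those Id's to a delete and commit right away, so each
        # transaction only ever covers `batch_size` rows.
        placeholders = ",".join("?" * len(ids))
        conn.execute(f"delete from PageVisit where Id in ({placeholders})", ids)
        conn.commit()
        total += len(ids)


# Made-up data: 1200 old visits and 3 recent ones.
conn = sqlite3.connect(":memory:")
conn.execute("create table PageVisit (Id integer primary key, VisitDate text)")
conn.executemany(
    "insert into PageVisit (VisitDate) values (?)",
    [("2009-01-01",)] * 1200 + [("2010-01-15",)] * 3,
)
conn.commit()
print(delete_in_batches(conn, "2009-07-24"))  # 1200
```

Because every batch is committed separately, readers may see the table half-cleaned; that is the consistency you are giving up.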

Conclusion

These are just two quick examples of how understanding SQL and how relational databases work can be important for your project. Even if you use an ORM of some sort, it’s extremely handy to understand what’s going on with your database, to check SQL logs, and to think about how you could optimize the queries. In the long run, it’ll make a big difference for your project.

Written by Mikhail Opletayev

January 20, 2010 at 7:22 pm

Posted in db

Understanding GC: allocating memory (part II)

Previously: Understanding GC: freeing memory (Part I)

While GC resolved a lot of issues around freeing memory, it made the situation a lot worse with allocation. Let’s be honest about it: developers have stopped thinking about allocating memory.

A lot of developers perceive GC allocated memory as, if not free, extremely cheap. If unsure, allocate. If unsure, copy. A deep copy, preferably. As a result, a lot of Java and .NET applications are extremely bloated.

Take a look at Visual Studio 2010 Virtual Memory struggle. Isn’t it fascinating? My Visual Studio 2008 runs projects in 150MB virtual space just fine, with the designer open to boot. The new version sets the acceptable threshold at 1.5GB. Just think about it: the new version requires an order of magnitude more virtual memory than its immediate predecessor!

Of course, GC allocated memory is neither free nor cheap. It can be less or more expensive under different circumstances. Here is a short list of things to keep in mind:

  1. Do not blow through memory. There is no way around it. The more memory you consume the worse performance is going to be. It increases your memory footprint, takes longer to collect, reduces your CPU cache locality, and so forth. If you don’t need to allocate an object, then don’t.
  2. Try to avoid write barriers. A write barrier occurs when an “older” object (usually older than gen0) is assigned a reference to a “newer” object from gen0. The GC needs to know about such assignments to support partial collections, so it has to record them. While not extremely expensive, it’s not free either, and if you are doing a lot of writes it can become a significant problem.
  3. Try to keep your data structures simple. A compacting GC moves memory around, so it needs to update pointers to the proper after-collection locations. The more references your structures hold, the more pointers there are to fix up.
  4. Avoid semi-long living objects. A generational GC promotes objects from gen0 to gen1 and gen2, expecting them to live there for a while. One of the worst things that can happen is when a lot of objects make it to gen1 and gen2 and then promptly die there. This triggers a lot of extra work for a GC.
  5. Take care with your I/O memory. If you allocate a buffer and request a network read into it, the buffer becomes pinned. The GC cannot move it, since the driver writing into it would not know about the new location. Pinning memory regions splits your arenas and makes it harder to allocate and compact memory.
  6. Allocate large memory pieces carefully. When you allocate a large piece of memory (in .NET, 85,000 bytes or more) it goes to the Large Object Heap. The LOH follows different laws. For instance, in .NET the LOH is not compacted by default. Therefore, it can become fragmented and hurt your performance and memory footprint.
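
Several of these points boil down to "allocate once, reuse often". The sketch below shows that pattern in Python: one buffer is allocated up front and reused for every read, instead of allocating a fresh one per chunk. Python’s memory manager works differently from the .NET GC (it is not a compacting collector), so treat this purely as an illustration of the allocation pattern; copy_stream is a made-up helper.

```python
import io


def copy_stream(src, dst, buffer_size=64 * 1024):
    """Copy src to dst through a single reusable buffer; return bytes copied."""
    buf = bytearray(buffer_size)  # the only buffer allocated for the whole copy
    view = memoryview(buf)        # lets us slice the buffer without copying it
    total = 0
    while True:
        n = src.readinto(buf)     # fills the existing buffer in place
        if not n:
            return total
        dst.write(view[:n])
        total += n


src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
print(copy_stream(src, dst))  # 200000
```

In a .NET service the same idea usually takes the form of a small set of long-lived I/O buffers that are handed out and returned, rather than a new buffer per request.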

If you want your application to work fast, you need to have a full understanding of what is going on with your memory. A garbage collector can make it easier for you not to make common mistakes, but you still need to understand its laws and limitations.

Written by Mikhail Opletayev

January 11, 2010 at 4:33 pm

Posted in development

Understanding GC: Freeing memory (part I)

Let me start by saying that I am a very casual driver. I have never bothered looking under the hood of my Accord. In fact, the reason I drive a Honda is because I don’t have to. I don’t want to debug my car; I want it to just work. Having said that, if I were to depend on my car, whether I did racing or farming, I would make sure that I understood how the machinery worked: what it could and could not do, and what would be the best way to keep it running smoothly. After all, you don’t want your car to fail on you when you really can’t afford it.

For me, Garbage Collection is one of those “under the hood” things. It’s a fuel pump for your applications. It is extremely important to understand how it works.

Memory management does two things: allocates and releases memory. Garbage Collection, a memory management system, made it to the mainstream in the late 90s, mostly thanks to Java and .NET. It helped resolve a lot of issues that plagued development at the time:

  1. Memory leaks. It was a huge problem then and it is still a problem now. Even companies like Google can’t keep their products free of leaks. GC has made it a lot harder to introduce memory leaks. While the issue remains, it doesn’t happen nearly as much.
  2. Having references to “dead” objects. As a consequence of fighting against #1, you could end up with a reference to a piece of memory that had already been freed. If the memory manager had allocated something else in the same space, you were in a lot of trouble. GC completely resolved this.
  3. Allocating on top. If you programmed for the Win32 API you might remember ERROR_INSUFFICIENT_BUFFER and all kinds of jumping through hoops to figure out how much memory was needed. GC also completely resolved this issue, since you can allocate objects as you need them in your libraries and then return them to the caller.
  4. Concurrency. In multi-threaded systems it wasn’t always quite clear if a thread could release an object. One of the most common ways to deal with it was reference counting. Usually it had to be done under a lock, and it scaled poorly with the number of threads. With GC, you can share your objects between threads without worrying about which thread is responsible for releasing them.

All these things come at a price. GC is neither cheap nor simple. In fact, a good GC implementation is an extremely complex task to undertake. Even for such successful projects as Mono it can take years to implement a compacting GC:

“During the last years Paolo and Mark have been developing a copying garbage collector for Mono… The code is still experimental and should not be used in any kind of production environment. It has been checked into Mono’s trunk repository and is available as of revision 61240, it turned on using the --with-gc=sgen, but it is not recommended production use yet (as of July 15th 2009).”

Here is a short list of possible complications:

  1. GC either needs to halt the system or implement read/write barriers to protect the data while GC is working.
  2. Generational GCs tend to implement write barriers to keep “old” data separated from the “new” data, as it’s more likely that the “new” data points to the “old” data.
  3. Allocating large objects tends to trash gen0 too fast. That’s why you see things such as the Large Object Heap.
  4. Pinned memory regions. Let’s say you requested an I/O read into a region of memory. This region is now “pinned” and can’t be compacted. This usually causes arenas to break down into smaller pieces.
  5. Applications that don’t fit the generational GC model (most of the young objects die in gen0) incur much greater collection costs.
  6. Collecting becomes more and more expensive as your memory footprint grows. For instance, Cached# starts triggering gen2 collections too often at around 8GB and ends up spending most of its CPU time in GC by the time it reaches 14GB (2×4 Xeon X5550 2.66GHz, 24GB of RAM). Of course, you can start 7x2GB instances of Cached# to avoid this issue, but it’s a GC issue nevertheless.

As you can see, there are quite a few trade-offs. For the majority of applications the pros significantly outweigh the cons. However, the cons are still there, and developers need to be aware of them. I’ve noticed that more and more people think that GC is a “silver bullet” with no downsides. It is simply not true.

Written by Mikhail Opletayev

December 28, 2009 at 8:11 pm

Posted in development

SQLite vs. SqlServerCE

Yesterday, I switched one of my projects from Microsoft SQL Server CE 3.5 to SQLite. The project crawls through a set of files and stores index information in a simple database. I will later use this information to organize the files and do searches.

I went and downloaded SQLite ADO.NET on a whim and plugged it into my project. The results were pretty stunning.

On the test data set (around 50MB worth of files), the initial indexing went around 30% faster. The size of the index file went down from 12MB to 7MB. All this on top of the fact that there is no run-time installation for SQLite!

Kudos to the SQLite team. It’s a very impressive product, a lean and speedy small database.

Written by Mikhail Opletayev

December 16, 2009 at 6:50 pm

Posted in db, software
