Archive for the ‘software’ Category
On software patents
Lodsys story is generating a lot of buzz on the net.
People argue that the whole software patent system is screwed up. Personally, I tend to agree that copyright is enough in software as, generally speaking, the implementation of the idea is much more complex than the idea. Think app store — it’s one thing to come up with an idea of it and quite another thing to create a viable platform, create libraries, attract developers, go through the approval process, etc.
The other part of the problem is that software patents are supposed to protect a small inventor against bigger players that can copy his or her work. Well, that doesn’t work too well because enforcing a perfectly valid patent against a larger corporation can cost millions of dollars in litigation costs. It’s quite cost prohibitive and there are just not so many cases of this. Also, most of the big players on the market have huge patent portfolios and can always counter-sue either increasing the cost of litigation or making it a wash. There are just not so many of the successful examples of where a small guy got protected by the patent system.
Having said that, I don’t believe for a second the government will have the audacity to abolish the software patent system. Even suspension of granting new software patents while the system is kept intact would be too big of a pull to swallow. However, there are certainly ways the government could deal with the issue.
A big part of the problem is granting crazy patents such as MacroSolve’s one. USPTO is swarmed with patent application and has a multi year backlog. It doesn’t have enough of the qualified personal to evaluate all the patents properly and compare them against the prior art. In this case, the solution could be crowd-sourcing. There are a lot of interested parties out there that would be happy to help USPTO with the research and try to invalidate rogue applications before they become full fledged patents.
There are plenty of concerned organizations and people and this process can be designed to make a financial sense for large companies and concerned groups to voluntarily participate in the process and provide USPTO with their research on prior art and validity of patents.
Another thing the government could do is to prohibit several areas of software from being patentable. User interfaces, file formats, protocols, things like that. Explicitly prohibit patenting different data types: if you communicate a piece of data over the network and store it on the server it doesn’t matter if it’s a geo-location or a credit card charge. Has anyone send information from a phone to the server and stored it there? Then it’s a prior art.
Overall, an open process with some adjustments to what’s patentable and what’s not could result in much better patent system. I have a feeling that guys like Lodsys will eventually generate enough will in the community to break the status quo and push for a reform.
Concurrency in new and existing programming languages
New languages
New languages are popping up almost every other day. If you consider languages being under major development (including core syntax and library changes) there is Go, Rust, Scala, F#, Gosu, just to name a few. In the wake of the immense popularity of dynamic languages such as Ruby, Groovy and Python, these new languages are statically typed. It is quite obvious that there are some issues that current languages don’t address. These issues are pressing enough for people to undertake such an expensive exercise as creating a new language.
To understand things better I look at 5 major categories: memory management, libraries, readability, scalability, and deployment.
- Memory management has arguably brought about the biggest productivity gain in the last 2 decades. Major imperative languages, such as Java and C# gained Garbage Collector (GC) and made developers stop about allocating memory.
- Libraries are important for writing meaningful applications. Good libraries make a world of difference. For instance, WinForms is head and shoulders better than Swing. It has nothing to do with the language per say (even though C# support for events makes a huge difference for GUI libraries), but it certainly reflects on the language. One of the major problem for the new languages is lack of libraries.
- Readability of the code is paramount for a language. If developers have hard time reading code written in the language, they will be much less likely to use it. On top of that, languages that are hard to read make support and maintenance more expensive. If you have ever spent hours trying to figure out what a piece of code does, you understand what I mean.
- Scalability implies that software written in your language should be able to scale on modern hardware. For instance, a language with a single threaded model can’t take advantage of multi-core processors. A language without support for concurrency will make writing scalable apps much harder.
- Deployment is important because it’s when the software actually gets to work. There is always cost associated with deployment. Choosing Ruby will likely require more servers. Choosing C# or F# will require Windows licenses when you deploy. C# will enable you to deploy on Windows Azure , while Ruby will give you an option of Heroku.
Applying these 5 categories to the mainstream languages makes it easier to see why there is a need for new languages.
C
- Unmanaged
- Good basic libraries
- Poor readability
- Fast code, no built-in concurrency support
- Runs everywhere
Java
- Managed (GC)
- Great deal of libraries and frameworks
- Poor readability, verbose
- lack of proper generics, closures, lambda expressions, events, etc.
- VM + JIT, no built-in concurrency support
- Runs everywhere
C#
- Managed (GC)
- Has limited number of libraries and frameworks, Microsoft dominated
- Great readability, has support of most modern features (lambda, closures, generics, etc) as well as LINQ
- VM + JIT, no built-in concurrency support
- Only runs on Windows, license costs might be prohibitive for larger projects. Cloud deployment available with Azure.
Ruby
- Managed (GC)
- Has a good deal of libraries and frameworks
- Great readability, expressive and concise syntax
- VM, slow dynamic code, no built-in concurrency support
- Cheap deployment but known to cause problems in larger deployments (e.g. Twitter)
Python
- Managed (GC)
- Has a great deal of libraries and frameworks
- Good readability
- VM, slow dynamic code, no built-in concurrency support
- Cheap linux deployment but can require more servers for large scale sites
Notice a common theme? Check category number 4 for all listed languages. No built-in concurrency support.
When all these languages were designed the requirements for the concurrent requests execution were much lower than what we have now. There has been a tremendous shift in volume of requests.
Concurrency: CGI
Massive need for concurrency first happened when web applications became available. The first answer to running concurrent scripts was Common Gateway Interface (CGI). It made an HTTP requests look like it was executed by a user, mapped requests parameters to the environment variables, passed input to stdin and sent stdout back to the browser. First implementations would start a process for each request. For instance, if you had a python script it would execute Pyton interpreter for every request. The model relied on OS security and access control and was comfortable at the time as many tools operated with stdin/stdout. However, starting a process for each requests quickly becomes prohibitively expensive.
Concurrency: FastCGI
The answer was found in FastCGI, where a process wasn’t closed at the end of the requests but instead re-used to process next request. FastCGI allowed for a pull of processes to serve a lot of incoming requests, allowing for a better scalability. This model is still widely used for many deployments. However, you quickly find out that switching processes is an expensive task and that inter-process communication (IPC) is also expensive. Another problem is sharing of resources. For instance, each process needs its own database connection pool, instead of having a shared one.
Imagine having 100 front end servers each running 5 FastCGI processes. If no SQL queries are executed in parallel you would need 500 connections to the database. Depends on your database, it can be very expensive to have that many simultaneous connections open.
Concurrency: Thread Pool
The answer to the problem was found in using thread pools. Instead of having multiple processes we run one process per application but we create multiple threads withing the process. Threads are much lighter than processes so context switches are much cheaper, you can run many threads per process (normally anywhere between 50 and 250) , and you can have access to shared memory and resources such as DB connection pool. However, for large scale web applications even this model has its limitations. Your code is expected to be synchronously executed on a single thread that is dedicated to processing your requests. The code blocks your thread while executing. If you have a to access a resource with a slow response time (for instance a web service such as google, facebook, twitter, etc.) your thread has to wait until the sub-requests is executed. If underlying resources are slow, your thread pool will quickly becomes starved: every available thread will be waiting for sub-requests, and no new requests will be processed. This is a very common problem.
Concurrency: Fibers
Many people are coming to realize that thread pools are quite ineffective for applications that have massive load. Many people also realize that you don’t need to be next google to experience these issues; you can have a popular iPhone app or mentioned on Reddit/Digg/Slashdot and find our that your services are bogged down.
The problem with scalability is that you need to be able to survive the burst. It can be extremely cost ineffective to scale hardware to that level. For instance, most of the time your web farm will be doing nothing, but you need to maintain it for that moment when you get mentioned on Twitter and the traffic comes your way.
I mentioned iPhone where a simple free application can become available to millions of users instantly and setting up a web farm with many servers in it is simply not an option. One solution would be to utilize the super powerful servers that you have to the max of their potential. You achieve this by following one simple rule: don’t waste resources on waiting if there is something else to work on.
If you want to achieve maximum throughput you have no choice but to be asynchronous. Instead of waiting for that Google API request to come back you can start processing new incoming requests and get back to serving the first requests once the data is available. It requires a primitive lesser than a thread. You might have heard names such as “fiber”, “tasklet”, “channel”, “coroutine”. Let’s call it “fiber”. Fibers are micro-threads that are executed concurrently. When your fiber encounters a long running operation it schedules it to be executed and yields to another fiber. Essentially, you execute cooperative multitasking within preemptive multitasking.
An ideal application will execute on top of a thread pool with N threads where N = number of CPU cores. You don’t need extra threads, as your operations never block. In reality you might want more threads to smooth things out.
Concurrency: Libraries
This is by no means a novel idea. There are plenty of solutions written to address this specific issue. There is Tornado, Node.js, any many others. I personally think the fact that JavaScript in the browser requires developers to only use non-blocking (another name for asynchronous) calls has advanced the cause greatly. The performance is there, so what’s the problem? Readability and control flow. Consider this code:
function updateStatus() {
try {
var status = twitter.getStatus();
displayStatus( status );
}
catch( error ) {
displayError( error );
}
pretty straight forward, huh? Now, imagine writing this code in a non-blocking way:
function updateStatus() {
twitter.getStatusAsync( onUpdateStatusSuccess, onUpdateStatusError );
}
function onUpdateStatusSuccess( status ) {
displayStatus( status );
}
function onUpdateStatusError( error ) {
displayError( error );
}
Not nearly as hot. Now imagine that displayStatus and displayError are non-blocking calls either. You will quickly end up with a very fragmented code.
Languages that support first class functions or lambdas allow for writing success/error functions inline, however it generally leads to the continuation-passing style which has same problem across the languages: non-blocking calls make natural language constructs such as structured exception handling, loops, condition statements, etc. ineffective.
This is a very important thing to understand. Languages designed as they are with a certain control flow in mind. They have control structures in place to assist the developer in accomplishing certain task. For instance, foreach loop can usually be replaced with for and switch/case/default can be replaced with if/elseif/else construct, but they are there for a reason. They serve to increase productivity through providing more concise and expressive code. Non-blocking libraries render you unable to use a lot of the language constructs and produce poorly structured code.
Concurrency: Language support
It’s quite clear that if you want to achieve the best possible scalability and maintain readability, you not only need a non-blocking library, but also a language that supports this specific style of development.
Let me outline this: to enable wide-spread, practical use of non-blocking paradigm a programming language needs to provide built-in support for it.
This is the theme of all new languages. Whether it is co-routines in Go, actors in Scala, tasks in Rust, or new async/await keywords in C#, all these languages are trying to get a crack at the same problem: make a concurrent development easier and more available to developers.
All new languages (with notable exception of C# 5 approach) implement a form of a message exchange. You can send message from one part of your program to another. Unfortunately, I don’t think this address the issue properly and, while a step forward, is far from the desired solution.
C# is taking a different approach that will reduce code fragmentation a lot better. However it will be a limited success due to the need to explicitly define functions as async and use of Task<T> and thread pool.
I think there is a better solution to the concurrency issue, one that could be applied to the existing languages by introducing libraries and compiler magic around those libraries. However, this is a topic for a different conversation.
Windows 7 HDD Mirror Issue, SyncQueue TaskGroup
I run Windows 7 on my workstation with SSD + RAID1 setup. The SSD contains Windows, Visual Studio, and some other tools that I use. The mirrored HDDs contain all data files and a backup copy of the SSD. This setup has been working fairly well for me because it gives an outstanding performance of the SSD drive (Windows 7 boots up in under 8 sec) and a stability of RAID1 which prevents any significant data loss.
The only disappointing issue so far has been Windows mirror fail once in a while. It’s hard to say why, but it seems that sometimes the mirror just breaks and needs to be restored. It takes a LONG time. By that I mean it’s been running for over 30 hours now and it’s 48% complete. No data has been lost, but it’s still rather annoying when it happens. Last time I blamed a power failure but this time there was none. The mirror just broke while my workstation was running. I am not sure what’s the deal with it as the HDDs are new and checkdisk doesn’t seem to find any issues.
As promised,here is a little example of how to use SyncQueue with TaskGroup when you need to queue multiple requests and wait until all of them come back. This is a fairly common situation that is not addressed by the standard BackgroundWorker control.
Here is a piece of working code from one of the projects I’ve been working on. It searches and fetches all health care claims that belong to a specific hospital visit and displays the visit once all the claims are fetched:
private void FetchVisitData() {
// create new SyncTaskGroup in the constructor
Global.TaskQueue.Append( new SyncTask( this, new SyncTaskGroup() ) {
Action = task => {
// find all claims that belong to the visit
var claims = Global.Claims.Search( Data.Claims.ClaimSearchAction.VisitId,
VisitId, 500, Data.Claims.ClaimSearchKind.Equals );
ListData list = new ListData();
foreach( Data.Claims.ClaimSearchResult info in claims.Items ) {
// when we queue a task, pass the group from the parent task as the first argument
Global.TaskQueue.Append( task.Group, new SyncTask( this ) {
Context = info,
Action = task2 => {
// fetch claim data from the server
MapData map = GetClaimData( (Data.Claims.ClaimSearchResult)task2.Context );
// add claim to the list, lock as this happens concurrently
lock( list ) { list.Add( map ); }
},
Error = Global.ReportTaskError,
// completion of the child task notifies the group
Complete = task.Group.TaskComplete
} );
}
task.Result = list;
// in a concurrent environment (especially multi-CPU) the first group task can
// complete BEFORE the 2nd task is queued, therefore triggering a premature
// completion of the whole group. To address this a task group needs to be manually
// released when all child tasks are queued
task.Group.Release();
},
Error = Global.ReportTaskError,
Complete = task => {
// we will only get here when all the claims have been fetched and the group task is completed
Visit = new MapData( "Claims", (ListData)task.Result );
RenderVisit();
}
});
}
SQLite vs. SqlServerCE
Yesterday, I switched one of my projects from Microsoft SQL Server CE 3.5 to SQLite. The project crawls through a set of files and stores index information in a simple database. I will later use this information to organize the files and do searches.
I went and downloaded SQLite ADO.NET on a whim and plugged it into my project. The results were pretty stunning.
On the test data set (around 50MB worth of files), the initial indexing went around 30% faster. The size of the index file went down from 12MB to 7MB. All this on top of the fact that there is no run-time installation for SQLite!
Kudos to SQLite team. It’s a very impressive product, a lean and speedy small database.
Optimize your string operations

Optimization does not always mean optimizing your code logic. In fact, more often than not, it means optimizing your data structures. Here is how a little trick can save a lot of CPU cycles on common string operations.
Strings are usually stored as an array of chars with a #0 at the end. One of the most common optimizations is putting string length in the negative offset. You spend time calculating it once and every time you do something with a string that requires its length you capitalize on using an extra piece data. Another very useful thing is to padding strings with ” at the end. Let me explain what I mean by this:
When you allocate a chunk of memory for a string you always get a memory chunk aligned to the machine word. For example, if you are trying to allocate 3 bytes you will get 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine. What happens to the extra bytes at the end? Usually nothing. They are wasted- but they don’t HAVE to be.
What you can do instead is spend a couple extra cycles and set them all to #0.
Let’s take a look at the string “HELLO”. It takes 8 bytes on both 32- and 64-bit CPUs. Normally it looks like this:
[ 'H', 'E', 'L', 'L', 'O', '#0', '?', '?']
But, you could easily make it look like this:
[ 'H', 'E', 'L', 'L', 'O', '#0', '#0', '#0']
It will take a several CPU cycles but consider the benefits for a 64-bit application:
1) Less cycles spent on string comparison, one of the most common string operations. In this example, it only takes 1 compare to find out if the content of two “HELLO” strings is equal!
2) It is much cheaper to calculate the length of a string. You do one compare per 8 bytes and then do 1 byte scan of the tail:
#include <stdio.h>
__int64 str_len_64( char* str ) {
char *e = str;
// main 64 bit scan
while( *(__int64*)e && 0xFF00000000000000 )
e += sizeof( __int64 );
// tail 8 bit scan
while( *e ) e++;
return e - str;
}
void main( char** argv, int argc ) {
// even for a short string
char* str = "HELLO, WORLD\0\0\0\0";
printf( "string: %s\n", str );
// we can reduce the number of compares from 13 to 6
printf( "length: %d\n", str_len_64( str ) );
}
3) Most of the other operations (including copy) can safely treat a string as an array of __int64, greatly reducing the number of CPU operations needed.
It’s a little trick, but the savings can be very impressive on a large scale system, especially if you go 64 bit. Just think about how many times you compare and copy strings.
Anemic Domain Model
Today I stumbled across a conversation about the Anemic Domain Model.
It sounds like a some sort of terrible disease. Martin Fowler blasted it as an anti-pattern. Wikipedia page states 1 benefit to 7 liabilities.
Even people that humor it tend to say it is limited to only inexperienced teams and small projects:
Jimmy Bogard writes:
Times when you can’t do a full domain model can include situations where you don’t have access to a domain expert, your team isn’t experienced with OO, or time constraints limit the conversations and analysis needed to build a decent domain model. If the domain isn’t complex, or is largely data-in, data-out, then those are contexts where a full domain model likely isn’t needed as well.
Greg Yong writes:
Our domain is not extremely complex…. Our team is not highly functional in object oriented design and analysis. When I say this I mean that team members must be highly functional in order to attempt creating a domain model (all other attempts will be doomed to failure, likely as an anemic domain model).
Sounds pretty bad, to be honest. Still, somehow I am not convinced. The reason I am not convinced is very simple: complexity. Look at the quotes above, they both mention ADM being not half as bad for “not so complex” domains. But do you really want to have your domains that complex?
Complex systems tend to require lots of configuration glue to keep them together. They are hard to learn and require developers to understand the whole thing before they can become fully productive. In the real world, complex projects tend to grow larger and more complex, original designers move on or move up, new features and entities keep being are added, and eventually such projects turn into a Katamari ball.
Anemic Data Model, on the other hand, means that you have very simple data objects (if they grow too big or too complex it’s a sign that you need to split them) and a set of well defined interfaces for processing these objects. Instead of making your model more complex you separate components and make them interact using interfaces, shifting towards a loosely coupled model from a complex monolithic model.
It’s a bit of an old Unix paradigm: make one component do one thing really well, define it’s interface, interact with other components when you need to.
Loosely coupled projects are a lot easier to maintain, evolve, and scale. Simple data structures and simple interfaces allow for just that. If your project is getting too complex, figuring out how to simplify data structures and interfaces might be a better choice than working out a more complex data model.
P.S. This is one of the reasons why protocol buffer messages produced by proto-cs are declared as sealed. They are data messages, there is no need to inherit from them.
