Archive for the ‘development’ Category
Concurrency in new and existing programming languages
New languages
New languages are popping up almost every other day. If you consider languages under major development (including core syntax and library changes), there are Go, Rust, Scala, F#, and Gosu, just to name a few. In the wake of the immense popularity of dynamic languages such as Ruby, Groovy and Python, these new languages are statically typed. It is quite obvious that there are some issues that current languages don’t address. These issues are pressing enough for people to undertake such an expensive exercise as creating a new language.
To understand things better I look at 5 major categories: memory management, libraries, readability, scalability, and deployment.
- Memory management has arguably brought about the biggest productivity gain in the last 2 decades. Major imperative languages such as Java and C# gained a Garbage Collector (GC), which let developers stop worrying about allocating memory.
- Libraries are important for writing meaningful applications. Good libraries make a world of difference. For instance, WinForms is head and shoulders better than Swing. It has nothing to do with the language per se (even though C# support for events makes a huge difference for GUI libraries), but it certainly reflects on the language. One of the major problems for new languages is the lack of libraries.
- Readability of the code is paramount for a language. If developers have a hard time reading code written in the language, they will be much less likely to use it. On top of that, languages that are hard to read make support and maintenance more expensive. If you have ever spent hours trying to figure out what a piece of code does, you understand what I mean.
- Scalability implies that software written in your language should be able to scale on modern hardware. For instance, a language with a single threaded model can’t take advantage of multi-core processors. A language without support for concurrency will make writing scalable apps much harder.
- Deployment is important because it’s when the software actually gets to work. There is always a cost associated with deployment. Choosing Ruby will likely require more servers. Choosing C# or F# will require Windows licenses when you deploy. C# will enable you to deploy on Windows Azure, while Ruby will give you the option of Heroku.
Applying these 5 categories to the mainstream languages makes it easier to see why there is a need for new languages.
C
- Unmanaged
- Good basic libraries
- Poor readability
- Fast code, no built-in concurrency support
- Runs everywhere
Java
- Managed (GC)
- Great deal of libraries and frameworks
- Poor readability, verbose; lacks proper generics, closures, lambda expressions, events, etc.
- VM + JIT, no built-in concurrency support
- Runs everywhere
C#
- Managed (GC)
- Has limited number of libraries and frameworks, Microsoft dominated
- Great readability, has support of most modern features (lambda, closures, generics, etc) as well as LINQ
- VM + JIT, no built-in concurrency support
- Only runs on Windows, license costs might be prohibitive for larger projects. Cloud deployment available with Azure.
Ruby
- Managed (GC)
- Has a good deal of libraries and frameworks
- Great readability, expressive and concise syntax
- VM, slow dynamic code, no built-in concurrency support
- Cheap deployment but known to cause problems in larger deployments (e.g. Twitter)
Python
- Managed (GC)
- Has a great deal of libraries and frameworks
- Good readability
- VM, slow dynamic code, no built-in concurrency support
- Cheap Linux deployment but can require more servers for large scale sites
Notice a common theme? Check category number 4 for all listed languages. No built-in concurrency support.
When all these languages were designed, the requirements for executing requests concurrently were much lower than they are now. There has been a tremendous shift in the volume of requests.
Concurrency: CGI
The massive need for concurrency first arose when web applications became available. The first answer to running concurrent scripts was the Common Gateway Interface (CGI). It made an HTTP request look like a program executed by a user: it mapped request parameters to environment variables, passed the request input to stdin, and sent stdout back to the browser. The first implementations would start a process for each request. For instance, if you had a Python script, the server would launch the Python interpreter for every request. The model relied on OS security and access control and was comfortable at the time, as many tools operated with stdin/stdout. However, starting a process for each request quickly becomes prohibitively expensive.
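To make the CGI contract concrete, here is a minimal sketch of a CGI program written for Node.js. The greeting logic and the function name handleCgiRequest are made up for illustration; QUERY_STRING and GATEWAY_INTERFACE are standard CGI environment variables, and the response is simply header lines, a blank line, and a body written to stdout.

```javascript
// Build a CGI response from the environment variables the web server sets.
// handleCgiRequest is a hypothetical name used only for this sketch.
function handleCgiRequest(env) {
  // The server passes the part of the URL after "?" in QUERY_STRING.
  const query = new URLSearchParams(env.QUERY_STRING || "");
  const name = query.get("name") || "world";
  const body = "Hello, " + name + "!\n";
  // A CGI response is header lines, then a blank line, then the body.
  return "Content-Type: text/plain\n\n" + body;
}

// A CGI-capable server sets GATEWAY_INTERFACE; only then write to stdout.
if (process.env.GATEWAY_INTERFACE) {
  process.stdout.write(handleCgiRequest(process.env));
}
```

Every request pays for starting a fresh interpreter just to run these few lines, which is exactly the cost the paragraph above describes.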
Concurrency: FastCGI
The answer was found in FastCGI, where a process wasn’t closed at the end of the request but was instead re-used to process the next one. FastCGI allowed a pool of processes to serve a lot of incoming requests, allowing for better scalability. This model is still widely used for many deployments. However, you quickly find out that switching between processes is expensive and that inter-process communication (IPC) is also expensive. Another problem is sharing of resources. For instance, each process needs its own database connection pool, instead of having a shared one.
Imagine having 100 front end servers each running 5 FastCGI processes. Even if no process ever runs SQL queries in parallel, so that one connection per process is enough, you would need 100 × 5 = 500 connections to the database. Depending on your database, it can be very expensive to have that many simultaneous connections open.
Concurrency: Thread Pool
The answer to the problem was found in using thread pools. Instead of having multiple processes we run one process per application but create multiple threads within the process. Threads are much lighter than processes, so context switches are much cheaper, you can run many threads per process (normally anywhere between 50 and 250), and you have access to shared memory and resources such as a DB connection pool. However, for large scale web applications even this model has its limitations. Your code is expected to be executed synchronously on a single thread that is dedicated to processing your request. The code blocks your thread while executing. If you have to access a resource with a slow response time (for instance a web service such as Google, Facebook, Twitter, etc.) your thread has to wait until the sub-request is executed. If underlying resources are slow, your thread pool will quickly become starved: every available thread will be waiting for sub-requests, and no new requests will be processed. This is a very common problem.
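A back-of-the-envelope model shows how quickly starvation sets in. The sketch below assumes every request blocks its thread for the full duration of the request, so a pool can never finish more than (number of threads) / (time per request) requests per second. The pool size of 200 and the response times are illustrative numbers, not measurements.

```javascript
// Upper bound on throughput for a pool of blocking threads:
// each thread can finish at most one request every `millisPerRequest` ms.
function maxRequestsPerSecond(poolSize, millisPerRequest) {
  return (poolSize * 1000) / millisPerRequest;
}

// Same pool, different speed of the resource the threads are waiting on.
const fastBackend = maxRequestsPerSecond(200, 50);   // sub-request answers in 50 ms
const slowBackend = maxRequestsPerSecond(200, 2000); // sub-request answers in 2 s

console.log(fastBackend, slowBackend);
```

With these assumed numbers, the same 200 threads go from a ceiling of 4000 requests per second to a ceiling of 100, even though the CPUs are mostly idle in both cases.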
Concurrency: Fibers
Many people are coming to realize that thread pools are quite ineffective for applications that have massive load. Many people also realize that you don’t need to be the next Google to experience these issues; you can have a popular iPhone app or get mentioned on Reddit/Digg/Slashdot and find out that your services are bogged down.
The problem with scalability is that you need to be able to survive the burst. It can be extremely cost-ineffective to scale hardware to that level. For instance, most of the time your web farm will be doing nothing, but you need to maintain it for that moment when you get mentioned on Twitter and the traffic comes your way.
I mentioned iPhone where a simple free application can become available to millions of users instantly and setting up a web farm with many servers in it is simply not an option. One solution would be to utilize the super powerful servers that you have to the max of their potential. You achieve this by following one simple rule: don’t waste resources on waiting if there is something else to work on.
If you want to achieve maximum throughput you have no choice but to be asynchronous. Instead of waiting for that Google API request to come back you can start processing new incoming requests and get back to serving the first request once the data is available. It requires a primitive lighter than a thread. You might have heard names such as “fiber”, “tasklet”, “channel”, “coroutine”. Let’s call it “fiber”. Fibers are micro-threads that are executed concurrently. When your fiber encounters a long running operation it schedules it to be executed and yields to another fiber. Essentially, you execute cooperative multitasking within preemptive multitasking.
An ideal application will execute on top of a thread pool with N threads where N = number of CPU cores. You don’t need extra threads, as your operations never block. In reality you might want more threads to smooth things out.
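Here is a toy illustration of the cooperative part, sketched with JavaScript generators standing in for fibers. This is not how any particular runtime implements fibers; the names fiber and runFibers are invented for the example, and the `yield` marks the point where a real fiber would kick off a slow operation and step aside.

```javascript
// A "fiber" here is a generator: it runs until it reaches a `yield`,
// which is where a real fiber would start a slow operation and step aside.
function* fiber(name, steps, log) {
  for (let step = 1; step <= steps; step++) {
    log.push(name + " step " + step);
    yield; // hand control back to the scheduler
  }
}

// Round-robin scheduler on a single thread: resume each fiber in turn
// until all of them have finished.
function runFibers(fibers) {
  const ready = fibers.slice();
  while (ready.length > 0) {
    const current = ready.shift();
    const result = current.next();
    if (!result.done) {
      ready.push(current); // not finished yet, back of the queue
    }
  }
}

const log = [];
runFibers([fiber("A", 2, log), fiber("B", 3, log)]);
console.log(log.join(", "));
// prints "A step 1, B step 1, A step 2, B step 2, B step 3"
```

One thread interleaves both fibers, and nobody ever sits idle holding a thread, which is why a pool of roughly one thread per core can be enough.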
Concurrency: Libraries
This is by no means a novel idea. There are plenty of solutions written to address this specific issue. There is Tornado, Node.js, and many others. I personally think the fact that JavaScript in the browser requires developers to only use non-blocking (another name for asynchronous) calls has advanced the cause greatly. The performance is there, so what’s the problem? Readability and control flow. Consider this code:
function updateStatus() {
    try {
        var status = twitter.getStatus();
        displayStatus( status );
    }
    catch( error ) {
        displayError( error );
    }
}
Pretty straightforward, huh? Now, imagine writing this code in a non-blocking way:
function updateStatus() {
    twitter.getStatusAsync( onUpdateStatusSuccess, onUpdateStatusError );
}
function onUpdateStatusSuccess( status ) {
    displayStatus( status );
}
function onUpdateStatusError( error ) {
    displayError( error );
}
Not nearly as hot. Now imagine that displayStatus and displayError are non-blocking calls as well. You will quickly end up with very fragmented code.
Languages that support first class functions or lambdas allow for writing success/error functions inline; however, this generally leads to the continuation-passing style, which has the same problem across languages: non-blocking calls make natural language constructs such as structured exception handling, loops, condition statements, etc. ineffective.
This is a very important thing to understand. Languages are designed with a certain control flow in mind. They have control structures in place to assist the developer in accomplishing certain tasks. For instance, a foreach loop can usually be replaced with for, and switch/case/default can be replaced with an if/elseif/else construct, but they are there for a reason. They serve to increase productivity by providing more concise and expressive code. Non-blocking libraries render you unable to use a lot of the language constructs and produce poorly structured code.
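To see what happens to a plain loop, here is a sketch that fetches statuses for a list of users one after another using callbacks only. getStatusAsync is a hypothetical stand-in for a non-blocking API (simulated with setTimeout), not a real Twitter client.

```javascript
// Hypothetical non-blocking API: calls back later instead of returning a value.
function getStatusAsync(user, onSuccess, onError) {
  setTimeout(() => {
    if (user === "") onError(new Error("unknown user"));
    else onSuccess(user + " is online");
  }, 0);
}

// The blocking version would be a one-line for loop. Without language
// support, the loop turns into a function that re-schedules itself.
function fetchAllStatuses(users, onDone, onError) {
  const results = [];
  function next(index) {
    if (index === users.length) {
      onDone(results);
      return;
    }
    getStatusAsync(
      users[index],
      (status) => {
        results.push(status);
        next(index + 1); // "loop" by calling ourselves from the callback
      },
      onError // errors arrive here; a try/catch around next(0) never sees them
    );
  }
  next(0);
}
```

The for statement, the loop variable and the try/catch are all gone, replaced by hand-written plumbing that does the same job.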
Concurrency: Language support
It’s quite clear that if you want to achieve the best possible scalability and maintain readability, you not only need a non-blocking library, but also a language that supports this specific style of development.
Let me outline this: to enable wide-spread, practical use of the non-blocking paradigm, a programming language needs to provide built-in support for it.
This is the theme of all new languages. Whether it is goroutines in Go, actors in Scala, tasks in Rust, or the new async/await keywords in C#, all these languages are trying to take a crack at the same problem: making concurrent development easier and more available to developers.
All new languages (with the notable exception of the C# 5 approach) implement a form of message exchange. You can send a message from one part of your program to another. Unfortunately, I don’t think this addresses the issue properly and, while a step forward, it is far from the desired solution.
C# is taking a different approach that will reduce code fragmentation a lot better. However it will be a limited success due to the need to explicitly define functions as async and use of Task<T> and thread pool.
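To show what this style buys you, here is the updateStatus example from earlier rewritten with async/await. The sketch is in JavaScript rather than C#, since modern JavaScript has async/await keywords in the same spirit. The twitter object below is a fake built on setTimeout, and the shouldFail parameter exists only so the error path can be exercised.

```javascript
// Hypothetical stand-in for the post's twitter object; not a real client.
const twitter = {
  getStatus(shouldFail) {
    return new Promise((resolve, reject) => {
      setTimeout(() => {
        if (shouldFail) reject(new Error("service unavailable"));
        else resolve("having lunch");
      }, 10);
    });
  },
};

// Record what would be displayed, so the example has no UI dependency.
const shown = [];
function displayStatus(status) { shown.push("status: " + status); }
function displayError(error) { shown.push("error: " + error.message); }

// Non-blocking, yet shaped exactly like the original blocking code:
// one function, and an ordinary try/catch that really catches the failure.
async function updateStatus(shouldFail) {
  try {
    const status = await twitter.getStatus(shouldFail);
    displayStatus(status);
  } catch (error) {
    displayError(error);
  }
}
```

Note that the function still has to be marked async and still returns a promise-like object to its caller, which is the same kind of limitation described above for C#.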
I think there is a better solution to the concurrency issue, one that could be applied to the existing languages by introducing libraries and compiler magic around those libraries. However, this is a topic for a different conversation.
Why people don’t switch from WinForms to WPF
I posted this as a comment here:
http://10rem.net/blog/2010/11/16/windows-forms-developers-tell-me-about-your-applications
and figured I’ll re-post it in my blog.
I have a lot of experience designing WinForms apps. Here is a screenshot from the latest app I designed: http://execqview.com/images/facility-age-gender.gif.
This is a very simple screen. There are quite a few more complex screens with hundreds of controls and a lot of data binding/data entry.
I had an opportunity to move to WPF, however I ran into some problems that prevented me from doing so:
1) Lack of business controls. WinForms doesn’t have a lot of good controls built in, but there are mature 3rd party libraries like DevExpress. For instance, take one of the most common controls in business apps: data grid. The grid from my screenshot up above comes with bands, multi sorting, totals, column selection, grouping, reordering, filtering, incremental search, custom cell rendering, cell controls, export to excel, and many other features. It can hold 100,000 records without any consequences to performance whatsoever. This is just one example. Tree view, tree list, scheduling controls, layouts, menu controls, ribbon controls, tab controls, charts, etc. All these things are readily available for WinForms development and provide an amazing productivity boost. The market for WPF controls is a lot weaker. Even established vendors like DevExpress have rather weak libraries for WPF.
2) Incredibly complicated design system. In WinForms you have controls that you can anchor to other controls, a very simple concept. In WPF you have different layouts that are cumbersome and yet inferior to anchoring in terms of real work. On top of it, you have access to a myriad of properties such as gradients, layouts, panes, paths, resources, templates, 3D, timelines, and many other things. It’s just crazy complicated and almost assumes that you have to have a designer on the team. Setting up even a simple application requires a lot more work with WPF, which results in lower productivity.
3) As a developer, you can’t leverage almost anything from your WinForms experience. Literally, you have to throw what you know away, clear your mind, and embrace all kinds of new concepts such as MVVM, new data binding, new control structure and design, resources, etc. You have to learn new tools such as Blend. It takes a while to get productive using all these new concepts and tools. We are talking about a serious learning curve here, a curve that is not easily justified at the moment.
4) Most WPF applications don’t have a good feel to them. I am sorry, but it’s true. They are slow, clunky, and non-native looking. You CAN make them look slick and flashy, but it requires a lot of effort and a set of strong artistic skills. Unfortunately, most GUI applications are written by businesses for businesses. Teams don’t have access to designers and have very strict timelines. Not a lot of mainstream developers have the luxury of turning their apps into a piece of art.
Essentially, it takes a lot more effort and time to write WPF applications and unless you need a very flashy app there are no clear benefits that you gain. There is
Technical tax
In the previous post, I talked about technical debt. It’s an essential part of any software project that has shipped with a timeline in mind. However, at the same time it’s one of the major reasons why projects have to be rewritten from scratch so often.
You do want to manage your technical debt and there are ways to manage it.
All you need is time
As I mentioned before, there are only 2 ways to get rid of debt: repay it and default on it. If you are trying to avoid a default, you don’t have a lot of options. You have to repay it.
Debt is a claim on future labor. That labor needs to be provided, which means someone needs to put the hours in. It is possible that you will need to spend more hours down the road than you would have spent initially. That’s fair; it’s the interest you pay.
The problem becomes getting these hours from the management. It needs to be timed right and explained in a way they can understand.
Feature holiday
One of the most straightforward ways to deal with the problem is to set aside a period of time right after the release. No new features, only bug fixes and re-factoring. Depending on your release cycle it can be a couple of weeks to a month. Either way it needs to be a significant amount of time so that you can actually go and implement important changes.
The upside of it is you can ask for this time before release, while your management is focused on getting there. It won’t seem like such a big deal to do it right after the release, and it might be an easy sell. The downside of this approach is that you don’t know how much time you need. Sometimes you need more time, but sometimes you need less or not much at all.
Technical Tax
Another way to get the time needed is to tax your business people for hours they spend.
Management generally can comprehend the idea of maintenance and doing some re-factoring. Depending on your organization type, you can explain it using different methods. Say, if you are in the Agile camp, you can say it’s in the Bible, er, the Manifesto: “Continuous attention to technical excellence and good design enhances agility.” Just ask them if they want to increase agility, or if they don’t believe in Jesus, er, Martin Fowler.
What it really comes down to is taxing your business people for hours you need to spend on maintenance. They don’t get to have those. Those hours are not theirs. The rate can go up or down, depending on the situation, but the hours need to be taxed until the debt is repaid. Let’s call it Technical Tax.
How to tax hours
Let’s assume that as a developer you work 40 hours a week. The actual number is irrelevant; some companies use points, some allow you to work 1 day a week on your personal project, etc. But say you work 40 hours a week. You have 3 people on your team, which comes to 120 hours a week.
Your team worked 240 hours a week for half a year to get the first release shipped. You worked so hard that your girlfriend left, but you only realized it when you accidentally turned on your phone and listened to your voice mail. It was the only message there, as everybody else pretty much gave up on calling you. However, there is a new day and the project is now shipped. You can relax and work your 40 hours a week, like a normal person.
Of course, you cut a lot of corners trying to ship your software. Now you need to go back and fix it, refactor bad code, and implement all the little optimizations you had in mind. This is where you find out that your product manager insists on implementing a list of new features that sales wants in order to close a big client. You are told that re-factoring something that works is “bad ROI”.
This is where your team needs to tell management that paying debt is NOT an investment and start taxing the hours that business people get. Figure out what refactoring you need to do, create a plan, go to your business, and tell them you need XX hours a week to fix what needs to be fixed. Just like ‘feature holiday’, the best time to implement something like this is right after a release. It may vary from project to project, but it should be around 10-30% of the total hours. At 15% your team will get 18 hours a week, which means one of you can spend 2 days per week repaying the debt, refactoring code, etc. Business people get to play with the rest of the hours, but they can’t touch these hours under any circumstances.
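The arithmetic behind those numbers is simple enough to sketch. The function name weeklyTaxHours is made up for this example, and the 8-hour working day used to convert hours into days is an assumption, not something fixed by the idea itself.

```javascript
// Hours per week reserved for repaying technical debt.
// taxPercent is a whole-number percentage, e.g. 15 for 15%.
function weeklyTaxHours(teamSize, hoursPerPerson, taxPercent) {
  const teamHours = teamSize * hoursPerPerson; // 3 * 40 = 120
  return (teamHours * taxPercent) / 100;
}

const taxed = weeklyTaxHours(3, 40, 15); // 18 hours a week
const days = taxed / 8;                  // 2.25 days, assuming 8-hour days
console.log(taxed, days);
// prints "18 2.25"
```

For the same team, the low end of the range (10%) comes to 12 hours a week and the high end (30%) to 36 hours.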
If you feel it’s not enough — negotiate a rate increase. It’s not going to be easy; nobody likes tax hikes. The trick is, though, once you have the tax system in place, you will be debating how many hours you spend paying off your technical debt, not whether you do it or not!
If you have too much debt, you might need to get more people on your team or improve your practices. If you need to release something fast you can announce a tax holiday, but always set an end date for it and always try to get more hours after it’s done.
How to manage technical tax
If taxes go too high, people will try to avoid them. The same applies to the technical tax. Keep it as low as possible and people will justify it by having a better product. Less technical debt means you will be implementing features faster, your product will perform faster, it will scale better, and it will require less maintenance. Before you know it, business people will fight to get the credit for implementing it and boast about how they outsmarted the developers.
There are a couple things you need to watch out for:
- The tax will only work if you use it to repay the debt. It’s not your ‘do whatever you want’ time. You do what needs to be done and then it will work.
- If you run out of critical things to do, you should reduce the tax. Always remember that it’s a necessity and that the goal is to minimize the debt, not maximize it.
It’s OK if the tax goes to a very low amount, say 2 hours a week for just reviewing that everything looks good. Just make sure it doesn’t go away so you don’t have to have a debate about implementing it again.
Conclusion
Use technical tax to strengthen your project and to keep it in a state where it’s ready to expand, ready to accept new clients, ready to scale, ready to be deployed on more servers. You’ll never want to rewrite your application from scratch ever again.
Technical Debt
You have probably heard the term “Technical Debt”. All-knowing Wikipedia describes it as:
Neologistic metaphors referring to the eventual consequences of slapdash software architecture and hasty software development
Sometimes when you need to ship or make a release you have to cut corners. You know it’s not the best thing to do, but it’s a quick thing to do, and you need to ship your product.
There is nothing wrong with cutting corners to make a release. In fact, if you don’t have to cut corners, you are shipping too late. Virtually every single software project that shipped had to cut some corners. Sometimes more, sometimes less. No matter what, there is technical debt associated with each project.
Understanding technical debt
It’s important to understand what debt is. Debt is a claim on future labor. It’s an obligation to either do something later or provide resources (money, for instance) that are equivalent to labor.
When you owe money, there are 3 ways to get rid of debt: pay up, default, or print money (if you are a government). Now, most of us can’t print money, so it boils down to either paying the debt or defaulting on it.
The last important thing to understand about debt is interest. Because you get something right away and promise to pay for it later, you usually have to pay for it in the form of interest. This is why, when you borrow money from a bank to pay for your house, you have to pay interest to the bank. You get the house right away, and the bank makes money.
Technical debt is not an exception to any of these rules:
- It’s a claim on future labor. You promise to go back and fix some of the issues that you didn’t have time to do right in the current release.
- Down the road you have a choice between working on those items or defaulting on your debt by doing nothing. Just like with normal debt, defaulting on a debt comes with consequences. I will describe them in a moment.
- Fixing something way down the road tends to be more expensive than fixing it right away. Partly because developers move on and don’t remember all the details, partly because there can be new components that expect your code to keep working in its current non-optimal way. This is the interest you pay.
Why no-one repays technical debt
While there are always intentions to fix something down the road, more often than not it doesn’t happen. New issues arise, new features need to be shipped, priorities change, etc. It is a lot more common for businesses to pay technical interest until it becomes unbearable, default on the debt, and then file for bankruptcy in the form of deciding to rewrite the whole damn thing from scratch.
Let’s get this straight: rewriting your software from the ground up is an expensive thing to do. Yes, you lose your technical debt, but you also lose a lot of your assets. When you file for bankruptcy, you are likely to lose some of your assets, usually a very significant portion of them. The same rule applies to a software project: you will lose all the good code that you’ve written, tested, and debugged over the years. It represents a big investment. While some of it might be salvaged, a lot of it will be gone. It will have to be re-written, re-tested, re-debugged. New deadlines will have to be met and guess what? New technical debt will be accumulated!
We see this cycle over and over again. Companies abandon old projects and announce moving to the “new and better” projects all the time. Why?
There are several major factors contributing to it and they explain why it happens so often:
- Business people don’t see technical debt. If you think they understand your blabbering about re-factoring a piece of code to work faster you are wrong. They can humor you by pretending they understand but they don’t. If it works – it’s done. They want new features for new customers. New customers bring new revenues and bigger bonuses. System being sluggish and crashing often is not their problem, it’s the fault of developers and hosting people.
- Perceived low ROI. This is a direct consequence of #1. When talking to business people you can often hear “It doesn’t generate us new revenue so it’s low ROI”. The problem is that ROI stands for “Return On Investment” and we are not talking about investment! We are talking about paying off debt. Debt is not an investment, it’s something you owe. When you buy a house you borrow money from the bank, creating debt. Then you invest borrowed money into the house. If your house goes up in value you made a successful investment, but it doesn’t change the fact that you still owe the bank. Now, think about it as saying that repaying your mortgage has a low ROI since, you know, you already have the house.
- Developers don’t like repaying technical debts. Well, let me correct myself. They despise it! Writing new code is more fun than fixing old code. Especially if it’s not written by you. Developers might not like creating technical debt (gah, have to cut corners again!) but fixing old code is not something developers crave to do either. Let’s be honest about it: developers would rather bitch about how the whole thing stinks than fix it. Oh, it’d smell like roses if only they had a chance to rewrite the whole damn thing.
- Development processes don’t focus on it. Modern development processes gravitate towards features and bugs and tend to ignore technical debt. There are no user stories for products requiring a lot more servers to scale, for having to struggle with strange joins because database schema is designed badly, for developers spending extra hours working around legacy code quirks, or for having a much steeper learning curve for the new people. The list goes on and on. Unless it gets unbearable and produces acute disruptions to the business process (outages, slow beyond reasonable, etc.), modern development methodologies tend to discourage long running maintenance projects.
All these things contribute to not repaying technical debts. Nobody likes it, but that’s just the way it works.
Conclusion
When working on complex software projects, technical debt is a fact of life. It’s created often and willingly and it’s extremely hard to repay. Once enough debt is accumulated the project is deemed ‘old and crappy’ and a ‘new and shiny’ project emerges. While it’s a terrible business practice and comes with a huge cost in time and money, it’s more of a norm than an exception in modern software engineering.
In the next post I will show one way to repay your technical debt on time and have a stable technical economy for your project.
Windows 7 HDD Mirror Issue, SyncQueue TaskGroup
I run Windows 7 on my workstation with an SSD + RAID1 setup. The SSD contains Windows, Visual Studio, and some other tools that I use. The mirrored HDDs contain all data files and a backup copy of the SSD. This setup has been working fairly well for me because it gives me the outstanding performance of the SSD (Windows 7 boots up in under 8 sec) and the stability of RAID1, which prevents any significant data loss.
The only disappointing issue so far has been the Windows mirror failing once in a while. It’s hard to say why, but it seems that sometimes the mirror just breaks and needs to be restored. It takes a LONG time. By that I mean it’s been running for over 30 hours now and it’s 48% complete. No data has been lost, but it’s still rather annoying when it happens. Last time I blamed a power failure but this time there was none. The mirror just broke while my workstation was running. I am not sure what the deal with it is, as the HDDs are new and checkdisk doesn’t seem to find any issues.
As promised, here is a little example of how to use SyncQueue with TaskGroup when you need to queue multiple requests and wait until all of them come back. This is a fairly common situation that is not addressed by the standard BackgroundWorker control.
Here is a piece of working code from one of the projects I’ve been working on. It searches and fetches all health care claims that belong to a specific hospital visit and displays the visit once all the claims are fetched:
private void FetchVisitData() {
    // create new SyncTaskGroup in the constructor
    Global.TaskQueue.Append( new SyncTask( this, new SyncTaskGroup() ) {
        Action = task => {
            // find all claims that belong to the visit
            var claims = Global.Claims.Search( Data.Claims.ClaimSearchAction.VisitId,
                VisitId, 500, Data.Claims.ClaimSearchKind.Equals );
            ListData list = new ListData();
            foreach( Data.Claims.ClaimSearchResult info in claims.Items ) {
                // when we queue a task, pass the group from the parent task as the first argument
                Global.TaskQueue.Append( task.Group, new SyncTask( this ) {
                    Context = info,
                    Action = task2 => {
                        // fetch claim data from the server
                        MapData map = GetClaimData( (Data.Claims.ClaimSearchResult)task2.Context );
                        // add claim to the list, lock as this happens concurrently
                        lock( list ) { list.Add( map ); }
                    },
                    Error = Global.ReportTaskError,
                    // completion of the child task notifies the group
                    Complete = task.Group.TaskComplete
                } );
            }
            task.Result = list;
            // in a concurrent environment (especially multi-CPU) the first group task can
            // complete BEFORE the 2nd task is queued, therefore triggering a premature
            // completion of the whole group. To address this a task group needs to be manually
            // released when all child tasks are queued
            task.Group.Release();
        },
        Error = Global.ReportTaskError,
        Complete = task => {
            // we will only get here when all the claims have been fetched and the group task is completed
            Visit = new MapData( "Claims", (ListData)task.Result );
            RenderVisit();
        }
    });
}
SyncQueue
As promised, here is a little library that allows for background queuing of long running synchronous operations. Unlike the standard BackgroundWorker component, it’s not a visual component. Additionally, it allows a little bit more control over what’s going on. Also, I personally prefer lambdas over events, because it keeps all the code relevant to the background process in the same place.
Here is the code:
http://code.google.com/p/syncqueue/source/browse/syncqueue.cs
Here is how you can use it:
// create a new queue with 4 worker threads
TaskQueue queue = new TaskQueue( 4 );
// 'this' must inherit from System.Windows.Forms.Control
// (the local is named syncTask so it does not clash with the lambda parameter 'task')
SyncTask syncTask = new SyncTask( this ) {
    Action = task => {
        // task action, execute your background task here
        // executed on a worker thread
        // use task.Result to return values
    },
    Complete = task => {
        // task completed without errors, called on the main thread
    },
    Error = task => {
        // an error occurred during execution, called on the main thread
        // exception is stored in task.Exception
        // if an exception is thrown in this method it will be ignored
    },
    Finally = task => {
        // if exists, always called at the end on the main thread
    }
};
// enqueue the task for execution
queue.Append( syncTask );
The library also allows for spawning sub-tasks using the SyncTaskGroup class. I will illustrate how it can be used in the next post.
Elvis is back
After an extended period of silence I am back. A lot has happened, but I am back in the business of blogging.
To compensate for the missing posts, I will publish a small but handy C# library for proper handling of background operations. The built-in BackgroundWorker component is rather weak. I just need to add proper comments and figure out a free license for its distribution.
Stay tuned!
Flexible vs. structured data for visual binding
Overview
If you have ever written a visual application, you know how painful it can be to bind all the textboxes, comboboxes, buttons, lists, trees, grids, action bars, etc. to your data model. I know some people may think that the problem has been successfully solved in .NET with System.Windows.Forms.BindingSource, strongly typed datasets, and other tools that Microsoft provides. I beg to differ. All these tools work great in small examples. However, if you try to scale them to a large application visualizing complex data structures, they leave a lot to be desired.
Data binding can be extremely tedious and time consuming. It also limits the changes you can make to your data model.
Direct binding
The devil, as usual, is in the details. Oftentimes the optimal way to visualize the data differs significantly from the optimal way to store the data. This creates an impedance mismatch that needs to be overcome.
If you are working with structured data (classes or strongly typed data sets), you will have to accommodate the difference at the data binding level. You will need to configure the rules your visual components will use to extract and present the appropriate pieces of data from your data model.
Some rules can be relatively simple. Say, you want to display a date in a specific format. A visual control might let you configure the date format. If you display a numeric value in a grid, you might be able to configure the column to show numbers as currency.
Some rules can be more complicated. Imagine a class like this:
public class Occurrence {
public string Code;
public DateTime Begins;
public DateTime Ends;
}
Now, here comes trouble. In .NET DateTime is a value type and cannot represent a null. A common practice is to set the value to DateTime.MinValue. Of course, we don’t want to see 1/1/0001 when we are displaying a list of Occurrence objects in a grid, so normally we’d have to go and update our Occurrence class to look something like this:
public class Occurrence {
public string Code;
public DateTime Begins;
public DateTime Ends;
public string BeginsString {
get {
return Begins == DateTime.MinValue?
string.Empty: Begins.ToString();
}
}
public string EndsString {
get {
return Ends == DateTime.MinValue?
string.Empty: Ends.ToString();
}
}
}
Now we can display BeginsString and EndsString in a grid, instead of Begins and Ends. It will properly show empty cells when displaying a record. Essentially, we introduced two properties to overcome the mismatch I mentioned.
Control Pads
Consider the two bad things that happened in the example above:
- We had to go and change the model for the view. It is a generally bad practice and it’s not always possible to change the model like this.
- Our model now dictates to the view how dates are supposed to be displayed, via the ToString() method. We really don’t want that. The view is supposed to dictate visual formats and such, not the model.
A solution to that can be found in introducing a third party instead of modifying the model. I call classes like that control pads. A control pad is an entity that connects the model to the view, acting like a broker.
For our example above, a control pad will direct how Occurrence records are displayed in a grid and will display empty values for the cells containing DateTime.MinValue.
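Here is a rough sketch of what such a control pad could look like for the grid case. The OccurrenceGridPad name, its Columns property, and the GetCellText method are made up for illustration; a real one would plug into whatever grid control you use.

```csharp
using System;
using System.Globalization;

// Same shape as the Occurrence class above: plain fields, no display logic.
public class Occurrence {
    public string Code;
    public DateTime Begins;
    public DateTime Ends;
}

// Hypothetical control pad: sits between the model and a grid and decides
// how each cell is rendered. The model itself stays untouched.
public class OccurrenceGridPad {
    private readonly string dateFormat;
    private readonly CultureInfo culture;

    // The view, not the model, picks the date format and culture.
    public OccurrenceGridPad( string dateFormat, CultureInfo culture ) {
        this.dateFormat = dateFormat;
        this.culture = culture;
    }

    // Column names a grid would show, in display order.
    public string[] Columns {
        get { return new[] { "Code", "Begins", "Ends" }; }
    }

    // Text for one cell; DateTime.MinValue is treated as "no value".
    public string GetCellText( Occurrence row, string column ) {
        switch( column ) {
            case "Code": return row.Code ?? string.Empty;
            case "Begins": return FormatDate( row.Begins );
            case "Ends": return FormatDate( row.Ends );
            default: throw new ArgumentException( "Unknown column: " + column );
        }
    }

    private string FormatDate( DateTime value ) {
        return value == DateTime.MinValue
            ? string.Empty
            : value.ToString( dateFormat, culture );
    }
}
```

The point of the sketch is that both bad things from the list above go away: Occurrence is not modified, and the date format is handed in by the view.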
This is by no means a new idea; it has been around for years. The limitations of this approach are well known. For complex model structures it either creates a lot of different control pads (more entities = bad) or it forces developers to generalize control pads and use configuration (things get hairy). At some point control pads require their own event routing, and things get even more complicated.
Model-View-ViewModel
Control pads are for controls, but imagine creating one big fat control pad for the whole page, screen, form, or whatever view you have. This will get you a ViewModel.
Microsoft calls this approach MVVM (the Model-View-ViewModel design pattern). It’s somewhat based on Martin “UML” Fowler’s Presentation Model (note the work-in-progress remark at the top of the page in conjunction with the May 2004 publishing date). It is presented as a new and revolutionary approach, which, of course, it is neither. It wasn’t called a pattern or MVVM, but the idea has been around for years.
The idea is simple. The difference between the view and the model is accommodated by creating a ViewModel. The number of view models is limited to the number of views in your application. It allows any form of data transformation and it’s very friendly to unit testing.
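To make that concrete, here is a minimal sketch of a ViewModel for a single "occurrences" screen. OccurrenceRow, OccurrenceListViewModel, and the Title property are names I made up for the example; nothing here depends on WPF or WinForms, which is exactly why it is easy to unit test.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class Occurrence {
    public string Code;
    public DateTime Begins;
    public DateTime Ends;
}

// One display-ready row; the view binds to these strings directly.
public class OccurrenceRow {
    public string Code { get; set; }
    public string Begins { get; set; }
    public string Ends { get; set; }
}

// Hypothetical ViewModel: one per screen, holding everything the screen shows.
public class OccurrenceListViewModel {
    private readonly List<OccurrenceRow> rows = new List<OccurrenceRow>();

    public OccurrenceListViewModel( IEnumerable<Occurrence> model ) {
        foreach( Occurrence o in model ) {
            rows.Add( new OccurrenceRow {
                Code = o.Code,
                Begins = Format( o.Begins ),
                Ends = Format( o.Ends )
            } );
        }
    }

    public IList<OccurrenceRow> Rows { get { return rows; } }

    // Screen-level state lives here as well, not in the model.
    public string Title {
        get { return "Occurrences (" + rows.Count + ")"; }
    }

    // DateTime.MinValue stands for "no date" and is shown as an empty cell.
    private static string Format( DateTime value ) {
        return value == DateTime.MinValue
            ? string.Empty
            : value.ToString( "yyyy-MM-dd", CultureInfo.InvariantCulture );
    }
}
```

A test can build the ViewModel from a couple of Occurrence objects and assert on Rows and Title without ever creating a window.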
It’s a decent approach; I have used it for some of my apps. Its downsides are the low reusability of the ViewModel code and the number of extra entities you might need to create.
Flexible Data
Another way of dealing with the data can seem quite obvious but is actually not used as much in visual applications, outside of web apps. If binding to the original data model can get complicated and you don’t want to use MVVM, then you can use flexible data.
Flexible data is a fancy name for map/list data, sort of what Google uses in its Closure Templates. Think of JavaScript arrays and objects, where an object is essentially a hash map of properties and an array is just a list of objects.
This is how it works:
- You get your structured data. You do want to keep your data structured for storage and processing out of performance considerations.
- When you need to visualize your data, you map it into a flexible map-list model that is tailored for your view.
- You display the data out of your flexible model. It’s easy because the model is compatible with the view.
- (Optional) You collect the changes and apply them back to your structured model, then save them.
Yes, it encourages creating a copy of the data. But, for a visual application, it’s totally worth it. Consider this code for the example above:
class MyView {
public ListData GetOccurrenceList( Occurrence[] source ) {
ListData list = new ListData();
foreach( Occurrence o in source ) {
MapData data = new MapData();
data.Add( "Code", o.Code );
if( o.Begins != DateTime.MinValue )
data.Add( "Begins", o.Begins );
if( o.Ends != DateTime.MinValue )
data.Add( "Ends", o.Ends );
data.Add( "Source", o );
list.Add( data );
}
// hand the flexible list back to the caller
return list;
}
}
Now all you need is a single control pad for a grid that takes ListData and displays the maps inside it as records. If you don’t like the way the data is displayed, you can add formatting, combining, totals, averages, etc. to the resulting map. Note how I also assigned the original record to the “Source” key. You don’t have to do this, but, if you want to, it allows for the link to the original object. You can store changes in the same map and then apply them back to the original record, if you need it.
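The optional write-back step could look something like the sketch below. MapData itself is not shown in this post, so I use a plain IDictionary<string, object> as a stand-in; OccurrenceWriteBack and Apply are hypothetical names, and the sketch assumes the map was built the way GetOccurrenceList builds it (a missing "Begins" or "Ends" key means "no date").

```csharp
using System;
using System.Collections.Generic;

public class Occurrence {
    public string Code;
    public DateTime Begins;
    public DateTime Ends;
}

public static class OccurrenceWriteBack {
    // Copies the (possibly edited) values from the map back onto the
    // original record that was stored under the "Source" key.
    public static Occurrence Apply( IDictionary<string, object> data ) {
        Occurrence target = (Occurrence)data["Source"];
        object value;

        if( data.TryGetValue( "Code", out value ) )
            target.Code = (string)value;

        // A missing key means "no date", mirroring how the map was built.
        target.Begins = data.TryGetValue( "Begins", out value )
            ? (DateTime)value
            : DateTime.MinValue;
        target.Ends = data.TryGetValue( "Ends", out value )
            ? (DateTime)value
            : DateTime.MinValue;

        return target;
    }
}
```

Saving the updated record is then up to whatever storage code you already have.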
Conclusion
Flexible data allows you to avoid creating entities where they are not needed. It works great for web applications because essentially this is JSON (JavaScript Object Notation), the native way JavaScript stores data. This approach works for visual applications as well, be it WinForms or WPF. Give it a try, you might just like the flexibility and simplicity of it.
Yet another “Code to Look At”
In the last post I mentioned some code samples from Interesting Finds by Jason Haley. Specifically, the Code to Look At section of the blog.
Here is yet another example of an interesting piece of code I found in that section: The .NET Asynchronous I/O Design Pattern.
The name implies that this is the way to get async done in .NET. After all, that’s what “pattern” means — a customary way to solve a particular problem.
Recently, the term has lost its meaning, since literally everything is a pattern now. Small solutions, big solutions, even custom solutions are patterns now. Sometimes, when I hear developers talk I can’t help but think of the scene from Being John Malkovich where John Malkovich gets inside his own head.
Let’s get back to the article. There are two big things that are wrong with it.
1) The whole point of async is to avoid thread blocking. It’s in the first words of the Wikipedia page: “Asynchronous I/O, or non-blocking I/O…”
Non-blocking IO. Instead of blocking your thread, you queue an operation (or a set of operations) and allow the thread to do more work. You don’t sit around and wait, you go and start processing other requests. When the async operation is completed, you will be called via a callback. That’s the whole point. Fire and forget, until it’s done.
It is easy to tell whether you are doing async right: a pure async application does not need to start more OS threads than there are physical CPU cores. While you can start more threads to increase responsiveness in certain scenarios (long calculations, for instance), it will not improve your requests per second (RPS).
2) The code the author uses to illustrate his “pattern” is a damn good example of how not to write code.
Take a look at this code from MultiHostLookup method (this method queues multiple async requests and waits for the completion):
lock (addressList)
{
// ensure all lookups have returned, otherwise wait
while (addressList.Count != hosts.Count)
{
Monitor.Wait(addressList);
}
}
So we lock addressList and we wait under the lock. Well, guess what happens when an operation is completed? According to the “pattern” this is what happens in GetHostAddressesCallback method:
// we need to ensure updates to the address list are threadsafe
lock (addressList)
{
addressList.Add(address);
// notify listeners that another address has been added
Monitor.PulseAll(addressList);
}
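The callback thread will try to acquire the lock on addressList. Which, if you remember, was taken by the thread sitting in the while loop in MultiHostLookup.
The only reason this does not deadlock is that Monitor.Wait releases the lock while it waits and reacquires it before returning. Swap it for any other blocking call (Thread.Sleep, WaitOne, Join) and you get a wonderful example of the “how to deadlock your threads” pattern. And even as written, the calling thread is parked until every lookup returns, which is exactly the blocking that async is supposed to avoid.
Here is a simple rule for writing multi-threaded apps: NEVER EVER BLOCK WHILE HOLDING A LOCK. Monitor.Wait is the one exception, because it gives the lock up; anything else is a good way to get your application deadlocked.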
It is extremely frustrating to see such async 101 mistakes. IO Completion Ports have been around since NT 3.5, more than 15 years! If you are interested in how to actually write high-performance async applications, that’s a good place to start learning the concept.
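For contrast, here is a minimal sketch of how several async results can be gathered without parking a thread at all. ResultCollector is a hypothetical helper of mine, not part of .NET or of the article; it assumes Add is called exactly once per queued operation, and it leaves error handling out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: collects results from N async callbacks and fires a
// single completion callback when the last one arrives. No thread waits.
public class ResultCollector<T> {
    private readonly List<T> results = new List<T>();
    private readonly Action<IList<T>> allDone;
    private int pending;

    public ResultCollector( int expected, Action<IList<T>> allDone ) {
        if( expected <= 0 ) throw new ArgumentOutOfRangeException( "expected" );
        if( allDone == null ) throw new ArgumentNullException( "allDone" );
        this.pending = expected;
        this.allDone = allDone;
    }

    // Call from each operation's completion callback, on any thread.
    public void Add( T result ) {
        // The lock is held only for the list update, never across a wait.
        lock( results ) { results.Add( result ); }

        // Exactly one caller sees the counter reach zero and reports completion.
        if( Interlocked.Decrement( ref pending ) == 0 )
            allDone( results );
    }
}
```

With the DNS example from the article, each callback passed to Dns.BeginGetHostAddresses would call Add with the result of Dns.EndGetHostAddresses, and the method that queued the lookups would simply return instead of waiting.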
Interesting finds. Code to look at.
Lately, my news feed and I have been out of touch. Let me elaborate.
Most of the links I’ve seen recently could be put into two large categories: “mobile phones” and “I wonder what people smoke”. I am not a mobile developer and while I keep an eye on the mobile world, it’s a bit outside of my area of expertise. Most of the other stuff … well, I am not even sure what to tell you. It’s illegal to have weed this strong here.
For instance, one of the blogs I follow is Jason Haley’s. He posts links to other blogs and calls it “Interesting Finds”. One of the subsections is called “Code to look at”. Well, take a look at today’s catch:
1) Fluent.Xml.Linq – Exploring the limits of C# syntax.
Not sure why it has XML and LINQ in the title. The article is about 5 different ways to include HTML in your C# code. Maybe it’s just me, but I have never ever needed to add HTML to my C# code. Sometimes I need to add code to my HTML (to control its generation), but never the other way around.
The brightest part of the article is a quote by Martin “UML” Fowler: “One of the problems of methods in a fluent interface is that they don’t make much sense on their own.” Isn’t it shocking? Fowler says there is a problem with something that doesn’t make much sense! Who would have thought.
2) Yet Another Singleton Implementation in .NET 2.0
In the very first sentence the author asks the most important question: “Many of you aware what is Singleton Pattern is. For whom don’t know what is Singleton pattern is?”
Luckily, Bill Clinton has already answered this deeply philosophical question back in his 1998 Grand Jury testimony: “It depends upon what the meaning of the word ‘is’ is.”
3) JavaScript code to determine when DayLight Savings Time (DST) occurs.
This article gets you one Google question closer to the solution for a problem you are unlikely to have.
4) FileSystemWatcher – Pure Chaos (Part 1 of 2).
Yet another background thread that monitors file system changes. It’s been in WIN32 API since, well, WIN32 API. Thrilling.
5) User Story Source Code Layout with MSpec.
public class I_transfer_an_amount_less_than_my_savings_account_balance
Naming_classes_like_this_is_bad_for_your_Karma.
Seriously. Never ever name classes like this. No matter what the reason is.
6) Diagnostic Trace Display Using WPF.
A 10 line System.Diagnostics.TraceListener implementation.
Have a nice day!
