This is a little old, but there is a good article on optimization in Queue this month.
I do want to take issue with a couple of points here, though...
Software Layering
Many software developers become fond of using layering to provide various levels of abstraction in their software. While layering is useful to some extent, its incautious use significantly increases the stack data cache footprint, TLB (translation look-aside buffer) misses, and function call overhead. Furthermore, the data hiding often forces either the addition of too many arguments to function calls or the creation of new structures to hold sets of arguments. Once there are multiple users of a particular layer, modifications become more difficult and the performance trade-offs accumulate over time. A classic example of this problem is a portable application such as Mozilla using various window system toolkits; the various abstraction layers in both the application and the toolkits lead to rather spectacularly deep call stacks with even minor exercising of functionality. While this does produce a portable application, the performance implications are significant; this tension between abstraction and implementation efficiencies forces us to reevaluate our implementations periodically. In general, layers are for cakes, not for software.
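Here is a rough illustration of one part of layering cost, the function-call overhead. It covers only that part: stack footprint and TLB effects in kernel C code are a different matter. The minimal Python sketch below times a trivial function called directly and then through ten pass-through wrapper layers. The names `compute` and `make_layered` and the depth of 10 are arbitrary choices for illustration. Absolute timings will vary by interpreter and machine.

```python
import timeit


def compute(x: int) -> int:
    """The actual work: trivially cheap, so call overhead dominates."""
    return x * 2 + 1


def make_layered(fn, depth: int):
    """Wrap fn in `depth` pass-through layers that add nothing but a call."""
    for _ in range(depth):
        inner = fn

        # Bind the current inner function as a default argument so each
        # layer calls the one directly beneath it.
        def layer(x: int, _inner=inner) -> int:
            return _inner(x)

        fn = layer
    return fn


direct = compute
layered = make_layered(compute, depth=10)

if __name__ == "__main__":
    n = 200_000
    t_direct = timeit.timeit(lambda: direct(21), number=n)
    t_layered = timeit.timeit(lambda: layered(21), number=n)
    print(f"direct:  {t_direct:.3f}s")
    print(f"layered: {t_layered:.3f}s ({t_layered / t_direct:.1f}x)")
```

Both versions compute the same result. The layered version simply pays for ten extra calls each time.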
vs...
Algorithmic Antipathy
For many software developers, algorithms are something they studied back during their college days, and thankfully not something with a lot of relevance to their day jobs. During Solaris 10 development, Solaris engineers fixed a long list of performance problems across the kernel and user libraries. Toward the end of the release, we spent some time reviewing just what had been improved and by how much—and what was the underlying cause of the performance problem. Interestingly enough, all the really big improvements (above, say, 200 percent) resulted from changes in algorithms. Over and over again, all the other performance fixes—using specialized SIMD processor instructions such as SSE2 or VIS, inserting memory prefetch instructions, cycle shaving—paled in significance compared with simply going back and rethinking the locking algorithms and/or data structures.
A key part of algorithm selection is having a realistic benchmark or workload in hand to support making decisions based on actual results rather than intuition or folklore. This means the most effective time to do performance and scalability work is in the earlier phases of the project, perhaps the exact opposite of what usually happens. All the clever compilation options are pretty useless when dealing with O(n^2) algorithms for large values of n. Poor algorithms are the number 1 (and probably numbers 2 and 3 as well) cause of poor software system performance.
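The minimal sketch below makes the O(n^2) point concrete. It uses Python with hypothetical function names. It answers one question, "does this list contain a duplicate?", in two ways: by comparing every pair, and by tracking seen values in a hash set. `timeit` stands in for the realistic benchmark the article calls for. With n = 3,000 distinct values, the pairwise version does about 4.5 million comparisons while the set version does about 3,000 lookups, and the gap grows quadratically with n.

```python
import random
import timeit


def has_duplicates_quadratic(items: list[int]) -> bool:
    """Compare every pair: O(n^2) comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_hashed(items: list[int]) -> bool:
    """Track values already seen in a set: O(n) expected time."""
    seen: set[int] = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    # Worst case for both: all values distinct, so neither can exit early.
    data = random.sample(range(10_000_000), 3_000)
    for fn in (has_duplicates_quadratic, has_duplicates_hashed):
        t = timeit.timeit(lambda: fn(data), number=1)
        print(f"{fn.__name__}: {t:.4f}s")
```

No compiler flag or micro-optimization inside the inner loop of the first function can close a gap that comes from the choice of data structure.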
I guess this gets at one of those "inexpressibles" in software design. Layers are good as long as the layers are replaceable. Abstraction needs to be near total in each layer of the software -- as much as possible -- so that after-the-fact algorithm changes don't require a complete restructuring of the application. Granted, the software here (SunOS) is vastly different from what "we" typically work on. However, if the abstraction is so flawed that you end up passing around too much seemingly unrelated data and replicating structures, then maybe the layers at which you are abstracting were improperly drawn.
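One way to read "layers are good as long as the layers are replaceable" is sketched below. It assumes a simple word-count application. The application code depends only on a small interface, so a linear-scan table can later be swapped for a hash table without restructuring anything above it. `SymbolTable`, `ListTable`, `HashTable`, and `count_words` are hypothetical names invented for this example.

```python
from __future__ import annotations

from typing import Protocol


class SymbolTable(Protocol):
    """The layer boundary: callers depend only on these two methods."""

    def put(self, key: str, value: int) -> None: ...

    def get(self, key: str) -> int | None: ...


class ListTable:
    """First implementation: linear scan, O(n) per lookup."""

    def __init__(self) -> None:
        self._pairs: list[tuple[str, int]] = []

    def put(self, key: str, value: int) -> None:
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def get(self, key: str) -> int | None:
        for k, v in self._pairs:
            if k == key:
                return v
        return None


class HashTable:
    """Drop-in replacement: hash map, O(1) expected per lookup."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        self._data[key] = value

    def get(self, key: str) -> int | None:
        return self._data.get(key)


def count_words(words: list[str], table: SymbolTable) -> SymbolTable:
    """Application code: written against the interface, not an implementation."""
    for w in words:
        table.put(w, (table.get(w) or 0) + 1)
    return table
```

Calling `count_words(words, ListTable())` or `count_words(words, HashTable())` yields the same counts. Only the lookup cost changes, which is the kind of after-the-fact algorithm change the paragraph above has in mind.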
Comments
RE: Optimization AntiPatterns
OK, let's go over this again, remember:
1. Loose Coupling, Tight Cohesion
2. Premature optimization is the root of all evil.
Or maybe Grady Booch, Donald Knuth, and David Cutler are idiots.
I have experienced many instances of prejudice regarding architectural choices by the uninitiated. Very little can be done to convince these people that optimization should put its emphasis on code paths that are actually experiencing performance issues. Optimize code that runs most often, and leave the rest alone.
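Here is a minimal Python sketch of "optimize code that runs most often". It profiles a representative workload with the standard-library `cProfile` and `pstats` modules, then looks at the top of the cumulative-time listing before touching anything. The functions `rarely_called`, `hot_path`, and `workload` are hypothetical stand-ins for real application code.

```python
import cProfile
import pstats


def rarely_called(n: int) -> int:
    """Runs once; not worth optimizing even if it looks inelegant."""
    return sum(range(n))


def hot_path(n: int) -> int:
    """Runs many times; this is where optimization effort pays off."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> int:
    result = rarely_called(1_000)
    for _ in range(200):
        result += hot_path(10_000)
    return result


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    # Sort by cumulative time and show the top five entries.
    stats = pstats.Stats(profiler).sort_stats(pstats.SortKey.CUMULATIVE)
    stats.print_stats(5)
```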
In fact it is tightly coupled architectures like that promulgated by Microsoft that are the real issue. Consider, for example, Bill Gates's insistence that the graphics drivers be placed in the lowest levels of the NT kernel. Note: this is distinctly different from the approach that some recent Linux builds have used. See Linus Torvalds for comments on this.
And layers are for software, e.g. the 7-layer (OK, 4-5) OSI network stack. It seems that the author of the original article may have been high when they wrote it.
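The layered-stack argument can be illustrated with a toy sketch that uses simple text headers rather than any real protocol format. Each layer adds its own header on the way down and strips it on the way up, knowing nothing about the layers above or below it. The layer names and bracketed headers are illustrative only. They loosely follow the 4-5 layer TCP/IP model rather than the full seven OSI layers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One layer in a toy protocol stack; it only knows its own header."""

    name: str

    def encapsulate(self, payload: bytes) -> bytes:
        return f"[{self.name}]".encode() + payload

    def decapsulate(self, frame: bytes) -> bytes:
        header = f"[{self.name}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"{self.name}: missing header")
        return frame[len(header):]


# Ordered top to bottom (hypothetical layer names).
STACK = [Layer("app"), Layer("tcp"), Layer("ip"), Layer("eth")]


def send(data: bytes) -> bytes:
    for layer in STACK:  # each layer wraps the output of the one above it
        data = layer.encapsulate(data)
    return data


def receive(frame: bytes) -> bytes:
    for layer in reversed(STACK):  # strip the outermost header first
        frame = layer.decapsulate(frame)
    return frame


if __name__ == "__main__":
    wire = send(b"hello")
    print(wire)           # b'[eth][ip][tcp][app]hello'
    print(receive(wire))  # b'hello'
```

Because each layer touches only its own header, any one of them can be replaced without the others noticing. That is the property the commenter is defending, and it is also where the per-layer overhead the article complains about comes from.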
RE: Optimization AntiPatterns
Yeah, I guess the question is whether you know what your goal is. Here the author is doing low-level kernel work inside SunOS, and I can understand the "too many layers" issue there. However, he SERIOUSLY jumps the rails when he starts talking about the NSPR stuff. Maybe you do end up with deep stacks in Mozilla, and that will affect performance in the aggregate. However -- and this goes back to the "Optimize the common, not the bad" rule -- Mozilla tweaks its performance by making sure it doesn't have to do anything much heavier than O n^log3 routines above those deep stack calls.
However, it is exactly that layering that gives you both code longevity and developer productivity during the actual project.
RE: Optimization AntiPatterns
Ok, I was about to jump all over that "layers" stuff from the excerpt, then I read the comments; nuff said. Layers most certainly DO matter, though yes, the overhead, and changes made to each interface solely for the sake of the layers, should be considered. And that famous premature optimization quote is still very relevant, though it's important to note that Tony Hoare first said it; Knuth repeated it a lot.