State coverage

November 19th, 2007

A widespread conception is that programs really have two sides: their source code and their runtime.

With most languages, the distinction is made quite clear by the role of the compiler: source code cannot be executed without being first compiled. But in some languages, LISP for example, both sides tend to merge. The trend in computer languages nowadays is to become more and more dynamic, implying that the distinction between source code and runtime execution becomes less and less clear. One often sees programs written in dynamic languages generating and compiling parts of themselves during runtime.

That affects the way software should be tested.

Historically, software testing has focused on testing the source code. Back at the dawn of software development, when developer time was way cheaper than computer time, most testing was done by reviewing the sources by hand. Source code review is still a widespread way of testing code, but it is now accepted as only one technique among many others. Some programmers test their programs by running them a number of times with various inputs. More serious software teams write automated regression tests, have some form of periodic test build or practice some variant of test-driven development. Some have dedicated test teams. Some even use code coverage.

Code coverage is a technique that shows how much of a program's source code is really executed when the program runs. A coverage report typically looks like a copy of the program's source code with annotations (number of executions, boolean states, etc.) beside every line of code and every logical branch. Code coverage is a powerful tool when trying to write regression tests that cover as much as possible of a program's source. Still, code coverage, as the name states, mostly focuses on the source code.

So source code can be tested and verified in an almost provable way. But source code, except in the hypothetical case of a perfect program, almost never covers all possible situations that can occur during runtime. That leaves us with a huge and ever-growing gap of untested program behavior: things that happen at runtime, such as code that recompiles itself, internal caches that self-alter, IO errors, interrupts, hamsters gnawing at network cables, solar storms and other butterflies cruising on the other side of the earth.

What software testing should focus on is not so much the source code, but rather the states of a program. A running program can get into a huge number of different states. Some are very likely to happen, while most are extremely unlikely.

Some states are data induced: a given input leads to a given sequence of states in the program which leads to a given output. That is quite easy to test: just write a regression test that feeds your program some data and checks the program's reaction. Kind of.

But what about all the other states?

Let's take an example: let us assume you want to test what happens when your program tries to open a file while the operating system runs out of filehandles. That is tricky, but doable. In fact, your test will look very much like a classic exploit out of a black hat's toolbox. Now multiply this by the number of places in your code at which files are opened. It gets harder. And still, we are assuming in the first place that you actually got the idea of testing that particular, rather unlikely situation. And honestly, how often have you seen a development team writing a test for filehandle exhaustion? I haven't. Yet, this specific situation has been used to gain local root privilege in a number of well-documented exploits.
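Such a test could be sketched as follows in Python. This is only an illustration under several assumptions: it relies on the Unix-only resource module, the limit of 64 descriptors is arbitrary, and read_config and run_with_exhausted_filehandles are hypothetical names, not part of any real program.

```python
import errno
import os
import resource  # Unix-only


def read_config(path):
    """Hypothetical function under test: returns the file content,
    or None when no filehandle is available."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as exc:
        if exc.errno in (errno.EMFILE, errno.ENFILE):
            return None
        raise


def run_with_exhausted_filehandles(func, *args):
    """Call func(*args) while the process has no free file descriptor left."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Assumption: 64 descriptors is an arbitrary, small per-process limit.
    target = 64 if soft == resource.RLIM_INFINITY else min(soft, 64)
    hogs = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        # Grab descriptors until the operating system refuses to give more.
        while True:
            try:
                hogs.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break
        return func(*args)
    finally:
        # Release everything and restore the original limit.
        for fd in hogs:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


outcome = run_with_exhausted_filehandles(read_config, os.devnull)
```

Here outcome is None because read_config handles the failure. A function that does not handle it would raise instead, which is exactly what the test is meant to reveal.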

Which brings us to the following point: some states of a program can be really hard to test, but the hardest part is to know which states are relevant to test. In practice, a program can take such a large number of states that it is simply impossible to enumerate them all. This implies that you will never be able to reach 100% state coverage. Which in turn means that you will never be able to prove a program completely bug free.

Full state coverage is a utopia.
Unlike code coverage, you can't even measure state coverage, since the number of states is possibly infinite.

But you can still strive to increase state coverage. And you should.

Starting with improving code coverage sounds right, since code coverage covers a relevant subset of all possible program states. For the other states, it is up to you and the other developers in your team to judge which states should be tested. It quickly becomes a matter of knowledge and experience: you need to have wrestled with some particular unlikely states in order to think of testing them in the future. Which brings us back to the long discussion of developer's craftsmanship...

Notice too that although dynamic languages make our life harder by increasing the number of states that should be tested, they also provide us with efficient tools to test them. Imagine testing a program's reaction to network failure. With a non-dynamic language you will end up running your program in a scripted virtual machine. With a dynamic language you will just redefine the network API at runtime. Much easier.
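As an illustration of redefining the network API at runtime, here is a minimal Python sketch using unittest.mock from the standard library. The function fetch_greeting and the host name are hypothetical; only socket.create_connection and mock.patch are real library calls.

```python
import socket
from unittest import mock


def fetch_greeting(host, port=80, timeout=1.0):
    """Hypothetical client code: returns the first bytes sent by the server,
    or None when the network is unavailable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(64)
    except OSError:
        return None


def test_network_failure():
    # Replace the real network call with one that always fails.
    with mock.patch("socket.create_connection",
                    side_effect=OSError("network is down")) as fake:
        assert fetch_greeting("example.invalid") is None
        # Make sure the code really went through the redefined API.
        assert fake.called


test_network_failure()
```

No virtual machine and no unplugged cable are needed: the failure state is injected directly where the program meets the network, and the original function is restored when the with block ends.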

Open source as a way to improve code quality

July 11th, 2007

A couple of years ago, I started releasing code that I was developing during working hours as open source. That was the first step on a journey that helped me clear my mind on a number of issues surrounding open source.

There are still people out there who look down on open source with a shade of fear and disbelief in their gaze, so I wanted to share my modest experience on the matter.

I am a corporate developer. Mostly. My free time is filled enough with real life activities that I have neither the luxury nor the drive to continue developing at home. Yet I love coding. So I do it at work instead. I mention those private details so you understand that I am not one of those many half-blinded Linux-worshiping Microsoft-hating slightly fanatic open source zealots.

Yet I do believe in open source.

My conviction has grown out of one simple fact: open sourcing code makes code better.

That simple statement is the result of a rather complex mechanism, blending psychology, group dynamics and efficient knowledge sharing. But through the years, I have had multiple occasions to verify this statement.

There are a number of things happening when you prepare yourself to release code as open source.

First, you are going to release code with your name on it. And the internet being what it is, what is released does not disappear. It stays available for everyone to see. Everyone: people living 12000km under your feet, people you will never meet, but also people who will start communicating with you, sharing ideas with you, building a social and professional network around you. People you will meet at courses and conferences. People who will become your colleagues. Head hunters. Your boss. Recruiters who will consider hiring you, or not, depending on what you just released.

To make a long story short, releasing open source code is not just about writing code. It's an important social activity with significant consequences.

So you don't want to mess it up.

This leads to what may be the single most important effect of releasing open source code: you are going to be extra careful with what you publish, trying to write elegant, well designed, well documented code. You will try to follow standards to avoid getting flamed publicly. You will write tests to feel more confident. You will react quickly to bug reports so your stats look good. You will refactor your code when you discover ways of improving it.

And in the end, your code will reach a higher level of quality, for everyone to see.

Now, compare that with what usually happens with code that is released only internally, as closed source. If you write ugly corporate code, Google won't know anything about it. In most cases, your colleagues won't even notice: they are too busy trying to meet their own deadlines to take time to look at your code. That is, until you resign, move on to a better paid job and leave them with the joy of deciphering your production.

Only your closest colleagues use your code, so you will cut back on documentation, since they just have to raise their heads and ask you when they wonder about something. They probably wouldn't even read the documentation if there were any: asking is so much easier...

Your future may be influenced by the quality of what you write, but only as long as you work for the same employer. And according to statistics, you are most likely to get another job within a couple of years anyway.

So while the social aspects of code release tend to improve code quality in an open source context, they can have almost the opposite effect in closed source environments.

But enough with the social aspect of open sourcing. There is one more consequence to open sourcing that has a significant effect on code quality: your code gets used.

At your office, you may (if you are lucky) have a team of testers dedicated to hunting bugs in your code. But most of the time, the only tools you can rely on to improve your code are your own regression tests, your profiler, your code coverage tools and your own imagination. Also, you will run your code on a couple of computers running one operating system. And that's it.

When you release code as open source, it gets tested, reviewed, commented on, discussed. People will send you patches to fix bugs they found. People will run your code on their fridge, running operating systems they have hacked together by themselves. Your code will be twisted, ported, executed down to the very last instruction in the darkest area of the least used branch condition from the most forgotten subroutine. And every time a bug is found and fixed, your code gets better.

I could go on a while like this, but I suppose you got the point anyway.

Now to the skeptics around us:

Yes, developing open source code does take more time and resources than developing closed source code. That's because quality requires time. And therefore, this time is not wasted.

Yes, some of this time will go to development that your company won't benefit from. On the other hand, you will get free testing and sometimes even bugfixes and new features. It's a choice to make.

Yes, your pointy-haired boss will have difficulties understanding why you should start publishing code as open source, since he himself never did that when writing Fortran for Big Inc International 30 years ago. Not to mention the fact that he just read an article in 'computer weekly' stating that open-source development may cause fingertip cancer. Be patient and explain the gain in quality. Or change jobs. Or wait until your pointy-haired boss is replaced by another one and try again.

Testing mathematical calculations

April 20th, 2007

Over the past few months I have been working quite intensively on implementing calculation algorithms for my employer. It has mainly been about computing rates (internal rates of return, time and capital weighted return rates) on cashflows and returns on investments. The kind of things you want to know when monitoring a financial investment...

Mathematically speaking, those numbers are rather well defined. You start with a clean formula, usually taken directly from a finance textbook. Implementing those formulas may look like a piece of cake, but it is not.

So far, I have encountered 2 major issues with implementing this kind of calculation:

First: formulas have variables, or in developer terms, input data. This data usually has to be extracted from a database. So you have to define it clearly in terms of what is available from the database. That can be tricky. Take an internal rate of return. In order to approximate it, you have to define a cashflow, meaning a list of transactions where each transaction has a date and an amount. Now, how much time does a financial transaction take in the real world? It depends... Stocks can be purchased within a day. Fund shares however may require more than a day to purchase, if you use an intermediary for example. There is the gap: the formula requires a cashflow per day, but your business does not support it because transactions are spread over multiple days. You end up having to redefine your notion of transaction in order to use the given formula. That's where the ground gets slippery...

Second: formulas are tricky to test. That's probably the biggest problem for a developer. A mistake in the implementation of a mathematical formula won't cause runtime exceptions. It won't stop your code from compiling. It may even produce good-looking numbers, and still be wrong.

Errors in implementations of formulas can be of 2 kinds: definition errors (see the first point) or implementation errors (you wrote a - sign instead of a + somewhere. Oops).

Implementation errors are quite easy to catch with a few techniques:
  • Write a second, different implementation of the formula and compare the results. Better: write this other implementation in a specialised language, such as Maple or MATLAB, or (ouch, it hurts) Excel. Even better, let someone else write this second implementation. Then take a wide population of cashflows/input data and compare the results of the 2 methods.

  • Write a third implementation. And a fourth. And a fifth. And go take some antidepressants.

  • Define a number of heuristics to recognize calculation results that are unreasonable. For example, a rate of return should not be less than -100% or more than +1000%. Or a rate of return for a cashflow should not be positive when the return on investment on that cashflow is negative (and vice versa). Then write a small program that calculates your formula on a large number of cashflows and looks for unreasonable results. Once you have identified the black sheep, look at them closely and see if you caught a bug.

  • Make a continuity analysis. If your result is a rate of return calculated on a given date, it should not differ much from the same rate calculated the day before or the day after, or a week later. So take your cashflow/input data and calculate your rate on it over a 3-year period and see how it behaves. Does it jump suddenly? Is that normal? How does it behave when switching months? And on New Year's Day?
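The first technique can be sketched in Python as two independent implementations of an internal rate of return, one using bisection and one using Newton's method, compared on the same cashflows. Everything here is an assumption made for the illustration: the function names, the 365-day year convention, the search bracket and the sample cashflows are not taken from any real system.

```python
from datetime import date


def npv(rate, cashflow):
    """Net present value of (date, amount) transactions, discounted to the
    first date. Assumption: a year is counted as 365 days."""
    start = cashflow[0][0]
    return sum(amount / (1.0 + rate) ** ((day - start).days / 365.0)
               for day, amount in cashflow)


def irr_bisection(cashflow, low=-0.99, high=10.0, tol=1e-10):
    """First implementation: halve the bracket until NPV changes sign in it."""
    f_low = npv(low, cashflow)
    if f_low * npv(high, cashflow) > 0:
        raise ValueError("NPV does not change sign in the bracket")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = npv(mid, cashflow)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


def irr_newton(cashflow, guess=0.1, tol=1e-12, max_iter=100):
    """Second implementation: Newton's method with an analytic derivative."""
    start = cashflow[0][0]
    rate = guess
    for _ in range(max_iter):
        value = 0.0
        slope = 0.0
        for day, amount in cashflow:
            years = (day - start).days / 365.0
            value += amount * (1.0 + rate) ** (-years)
            slope += -years * amount * (1.0 + rate) ** (-years - 1.0)
        step = value / slope
        rate -= step
        if abs(step) < tol:
            return rate
    raise ArithmeticError("Newton iteration did not converge")


# Made-up sample cashflows: negative amounts are investments.
one_year = [(date(2006, 1, 1), -1000.0), (date(2007, 1, 1), 1100.0)]
two_buys = [(date(2006, 1, 1), -1000.0), (date(2006, 7, 1), -500.0),
            (date(2007, 6, 30), 1650.0)]

# Cashflows on which the two implementations disagree deserve a closer look.
disagreements = [cf for cf in (one_year, two_buys)
                 if abs(irr_bisection(cf) - irr_newton(cf)) > 1e-8]
```

With a real population of cashflows, any entry in disagreements points to a bug in at least one of the two implementations. The one_year cashflow also has a result that can be checked by hand: 1000 invested and 1100 returned exactly one year later gives a rate of 10%.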


And definition errors then? How do you catch them?

If you are unlucky, you can't use the first technique listed above: a second implementation of the formula will use the same wrongly defined data as your own implementation and make the same mistakes.

But you can still look for unreasonable results using the 2 other techniques.
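Those two techniques, the heuristics and the continuity analysis, could be sketched like this in Python. The thresholds come from the examples given earlier (-100% and +1000%), except for max_step, which is an arbitrary assumption, as are the function names and the sample numbers.

```python
def suspicious(rate, gain):
    """Heuristic checks on one result. 'rate' is the computed rate of return
    (0.10 means 10%), 'gain' the return on investment as an amount."""
    reasons = []
    if rate < -1.0 or rate > 10.0:
        reasons.append("rate outside [-100%, +1000%]")
    if rate * gain < 0:
        reasons.append("rate and return on investment have opposite signs")
    return reasons


def jumps(daily_rates, max_step=0.05):
    """Continuity check: positions where the rate moved by more than max_step
    from one day to the next. Assumption: 5 points per day is too much."""
    return [i for i in range(1, len(daily_rates))
            if abs(daily_rates[i] - daily_rates[i - 1]) > max_step]


# Made-up numbers: a sane result, a black sheep, and a series with one spike.
ok = suspicious(0.08, 150.0)
black_sheep = suspicious(12.0, -5.0)
spikes = jumps([0.050, 0.051, 0.300, 0.052])
```

Neither check proves anything is wrong. They only narrow a large population of results down to the few that are worth looking at by hand.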

You should definitely talk with others, especially with people who are more qualified in finance or mathematics and can see your problem from a different perspective, bring you fresh ideas...

But the fact is, there seems to be no way of being sure...