Quirksand

SICP 2.4.2 and more on testing

May 3, 2015 11:09

The subject of this section is the creation of a looser abstract system for complex numbers that can accommodate both implementations previously used. While the addition of tags requires a bit more computation time, it allows for greater flexibility. We’ll be getting a lot of use out of type tagging, as it’s an easy approach to use in Scheme: determining the type is just a matter of taking the car of the tagged datum, no matter what the remainder of the structure is.
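As a reminder of how little machinery tagging needs, here is a minimal sketch of the helpers in the style of SICP 2.4.2. The names attach-tag, type-tag, contents, rectangular? and polar? follow the book; the error messages and the sample value z are only illustrative.

```racket
#lang racket
;; Tagging helpers in the style of SICP 2.4.2.
;; A tagged datum is a pair: the tag in the car, the payload in the cdr.
(define (attach-tag tag contents)
  (cons tag contents))

(define (type-tag datum)
  (if (pair? datum)
      (car datum)
      (error "Bad tagged datum -- TYPE-TAG" datum)))

(define (contents datum)
  (if (pair? datum)
      (cdr datum)
      (error "Bad tagged datum -- CONTENTS" datum)))

(define (rectangular? z) (eq? (type-tag z) 'rectangular))
(define (polar? z) (eq? (type-tag z) 'polar))

;; Illustrative value: 3 + 4i stored as a tagged (real . imag) pair.
(define z (attach-tag 'rectangular (cons 3 4)))

(type-tag z)      ; the tag, regardless of what the payload looks like
(contents z)      ; the untagged payload
(rectangular? z)
```

The payload could be a pair, a list, or anything else; type-tag never needs to look at it.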

With regard to testing, we find the same thing as was noted for the math operations: no changes are required. The tests continue to use the same interface as before, so the addition of tags has no effect on how the results are calculated or compared.

However, something that has changed, compared to the previous section, is that the error that occurred with zero values has apparently gone away. The error hasn’t really been fixed, though. This can be confirmed by changing the test to use make-from-real-imag when setting up the ‘zero’ vector, at which point the error reappears. What’s been introduced is rather subtle with respect to number systems: the zero value created by (make-from-real-imag 0 0) does not behave the same as a zero value created by make-from-mag-ang, even though the two satisfy our definition of equality.
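To make the difference concrete, here is a small sketch. It rests on assumptions: the post does not say which operation fails, so the angle of a rectangular zero is used as one plausible candidate, and complex-=? and safe-angle are hypothetical helpers written only for this illustration.

```racket
#lang racket
;; Minimal tagged representation, loosely following SICP 2.4.2.
;; These definitions deliberately shadow Racket's built-in accessors.
(define (make-from-real-imag x y) (cons 'rectangular (cons x y)))
(define (make-from-mag-ang r a) (cons 'polar (cons r a)))

(define (rectangular? z) (eq? (car z) 'rectangular))

(define (real-part z)
  (if (rectangular? z) (cadr z) (* (cadr z) (cos (cddr z)))))
(define (imag-part z)
  (if (rectangular? z) (cddr z) (* (cadr z) (sin (cddr z)))))
(define (angle z)
  (if (rectangular? z)
      (atan (cddr z) (cadr z))   ; atan of imaginary part over real part
      (cddr z)))                 ; polar form stores the angle directly

;; Hypothetical equality: compares real and imaginary parts only.
(define (complex-=? z1 z2)
  (and (= (real-part z1) (real-part z2))
       (= (imag-part z1) (imag-part z2))))

(define zero-rect  (make-from-real-imag 0 0))
(define zero-polar (make-from-mag-ang 0 0))

(complex-=? zero-rect zero-polar)   ; the two zeros compare as equal
(angle zero-polar)                  ; the stored angle comes straight back

;; The rectangular zero has to compute (atan 0 0). As far as I know Racket
;; rejects that for exact zeros, so any failure is trapped here rather than
;; allowed to stop the program.
(define (safe-angle z)
  (with-handlers ([exn:fail? (lambda (e) 'angle-failed)])
    (angle z)))

(safe-angle zero-rect)
```

Two values that pass the equality check can still respond differently to the same accessor, which is why the choice of constructor in the test setup matters.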

A very rational approach is to say that since there are multiple methods of constructing apparently identical values, we need to have tests that cover them separately, regardless of the underlying implementation. We can create another set of tests to do this, or even a function that generates tests based on a template. However, even with just two constructors, the number of tests required rapidly grows, and in a larger system might grow to an unmanageable size. While there are many circumstances in which it is critical to get full coverage of all cases, it is not always feasible or practical to do so. The best thing is to write the best tests you can, and revisit them and modify them to ensure they aren’t completely missing the important cases.
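One way the template idea could look is sketched below. Everything specific here is an assumption made for illustration: the compact stand-in for the complex-number interface, the names zero-constructors, additive-identity-holds? and failing-constructors, and the tolerance in close?. It does not use RackUnit, only plain Racket.

```racket
#lang racket
;; Compact stand-in for the tagged complex-number interface (assumed).
(define (make-from-real-imag x y) (cons 'rectangular (cons x y)))
(define (make-from-mag-ang r a) (cons 'polar (cons r a)))
(define (real-part z)
  (if (eq? (car z) 'rectangular) (cadr z) (* (cadr z) (cos (cddr z)))))
(define (imag-part z)
  (if (eq? (car z) 'rectangular) (cddr z) (* (cadr z) (sin (cddr z)))))
(define (add-complex z1 z2)
  (make-from-real-imag (+ (real-part z1) (real-part z2))
                       (+ (imag-part z1) (imag-part z2))))

;; Every way of building 'zero' that we want covered, labelled by name.
(define zero-constructors
  (list (cons "rectangular zero" (lambda () (make-from-real-imag 0 0)))
        (cons "polar zero"       (lambda () (make-from-mag-ang 0 0)))))

(define (close? a b) (< (abs (- a b)) 1e-9))

;; The template: one property, parameterized by how zero gets made.
(define (additive-identity-holds? make-zero)
  (let* ([one (make-from-real-imag 1 0)]
         [sum (add-complex (make-zero) one)])
    (and (close? (real-part sum) 1)
         (close? (imag-part sum) 0))))

;; Run the property once per constructor and collect the names for which it
;; is false or raises an error.
(define (failing-constructors constructors property)
  (for/list ([entry (in-list constructors)]
             #:unless (with-handlers ([exn:fail? (lambda (e) #f)])
                        (property (cdr entry))))
    (car entry)))

(failing-constructors zero-constructors additive-identity-holds?)
```

Adding a constructor means adding one entry to the list, but each new property multiplies against every entry, which is where the growth in test count comes from.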

I don’t want to give the impression that testing isn’t worth it. It will still catch an awful lot of the obvious bugs that can crop up even in simple systems. But unless you really have proven that your tests cover every possibility, they are most likely incomplete and imperfect. And as we just saw, even changes that appear to ‘fix’ bugs can simply hide them because of flaws in the test. Programmatic testing is almost never a guarantee of working code; it’s a tool for narrowing down the flaws in it. As more flaws are revealed, tests can be added to help avoid a recurrence of the same problem.

As another demonstration of what testing can reveal, I created an additional complex number implementation. This one is very simple, but it is also very wrong by most standards: its constructors completely ignore the values given to them and store nonsense. Even that does not matter, as the accessors always return the same values. Effectively, there is only a single number in this system.

So what happens when we run the tests? The test suite that checks the interface reports failures, as expected. However, at the level of our math operators, all the tests pass just fine! Apparently, this single-number system satisfies whatever properties we were checking for in the calculations. We might want to then add another check to the calculation tests for this system (ensuring that ‘zero’ is not equal to ‘one’). The thing to remember is that there can be hidden assumptions when dealing with an abstraction, and the tests need to be sure they can find them. It’s not an easy matter.
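A rough reconstruction of such a system shows why the calculation tests are fooled. This is a sketch under assumptions, not the actual implementation: the stored symbol 'nonsense, the choice of 0 as the constant every accessor returns, and the helper complex-=? are all made up for the example.

```racket
#lang racket
;; Degenerate implementation (assumed): constructors ignore their arguments,
;; and every accessor returns the same constant.
(define (make-from-real-imag x y) 'nonsense)
(define (make-from-mag-ang r a) 'nonsense)
(define (real-part z) 0)
(define (imag-part z) 0)
(define (magnitude z) 0)
(define (angle z) 0)

;; The math operators are written against the interface, as in SICP.
(define (add-complex z1 z2)
  (make-from-real-imag (+ (real-part z1) (real-part z2))
                       (+ (imag-part z1) (imag-part z2))))
(define (mul-complex z1 z2)
  (make-from-mag-ang (* (magnitude z1) (magnitude z2))
                     (+ (angle z1) (angle z2))))

;; Hypothetical equality check, also written against the interface.
(define (complex-=? z1 z2)
  (and (= (real-part z1) (real-part z2))
       (= (imag-part z1) (imag-part z2))))

(define zero (make-from-real-imag 0 0))
(define one  (make-from-real-imag 1 0))

;; The usual identities hold, because the 'expected' values come from the
;; same broken interface as the results.
(complex-=? (add-complex zero one) one)   ; #t
(complex-=? (mul-complex one one) one)    ; #t

;; The extra check is the one that exposes the problem: this should be #f in
;; any sensible system, but here it is #t.
(complex-=? zero one)                     ; #t
```

A calculation test asserting that zero and one are not equal would fail here while still going through nothing but the abstract interface.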

Another approach to testing this bad implementation might be to alter how we check the computation results. In most cases, the reason they do not fail is that the ‘expected’ values are all flawed, since they are using the same abstract interface as the system. We could change the tests to look directly at the value returned, or create our own correct answer directly, like so:

(check-complex-= (add-complex zero one) '(rectangular 1 . 0))

Then we can’t be fooled by errors in the implementation.

Hopefully it is clear why this is not the best approach to take here. We’ve now tied our tests very closely to one particular implementation. Such tests could not be re-used for the previous section, for instance. Make the change to remove tags, and the tests have to be rewritten.

While sometimes there are circumstances where peeking behind the abstraction does end up being necessary, or at least a lot more feasible, that tendency should be avoided. In the interests of requiring less rewriting, it’s better to adhere to the interface. It also becomes easier to modify the underlying system if it’s not necessary to expend extra effort to change the tests yet again; otherwise we risk either abandoning sufficient testing or putting off potentially valuable modifications.

The next section won’t strictly require the RackUnit tests. It returns to the style of testing/verification we’ve been using so far. However, testing using some sort of framework is something we’ll return to in the future. As our knowledge of constructing programs grows, so too will the complexity of the programs, and we’ll want something that allows us to more easily test a portion or all of a system even when it is not fully working.