Quirksand

SICP 2.5 and testing

June 28, 2015 10:25

This is an extra post in which I’ll discuss some more advanced testing procedures, which will be used in the next section. It requires one special type of function not handled in the book, but otherwise only uses concepts we’ve already covered. While this testing system is not as full-featured or as elegant as something like rackunit, it is a definite improvement over a simple list of check functions. Understanding how it works is entirely optional; the tests can be applied without much explanation merely by having the files present, subject to some modification depending on your Scheme implementation.

The first part of the test-functions file is a set of check functions. These work the same as those used in older exercises. The only difference is that the underlying procedures check for failure instead of a true state. These check functions take the opposite of that result, and thus work by ensuring that the false outcome does not hold. The reason for this will be clearer in a moment, but for now let’s compare a few of the failure and check functions.

(define (fails-equ? expected observed)
  (if (equ? expected observed)
      false
      (format "FAILURE: ~a is not equ? to ~a." observed expected)
      )
  )

(define (fails-=? expected observed)
  (if (= observed expected)
      false
      (format "Check = FAILED.  Expected ~a but was ~a." expected observed)
      )
  )

(define (fails-equal? expected observed)
  (if (equal? expected observed)
      false
      (format "Equality FAILED. Results were : ~a." observed)
      )
  )
  
(define (check-equal expected observed) (not (fails-equal? expected observed)))
(define (check-= expected observed) (not (fails-=? expected observed)))
(define (check-equ expected observed) (not (fails-equ? expected observed)))

These three functions check similar operations, and consequently they follow a similar pattern. Each has an if statement that uses the particular function for the check, and works with an ‘expected’ and an ‘observed’ argument. They do all vary in the way results are reported. This is deliberate; you can decide what the merits and failings of each style are. (Note that the check functions don’t use this report; they only return true or false. The reports will be used elsewhere.)
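To make the difference concrete, here is a small sketch of what one failure function and its check wrapper return. It assumes Racket (for format), and writes #f where the post's code uses false; the two definitions are otherwise the ones above, and only the sample calls are new.

```racket
#lang racket

;; From the post: returns #f on success, a report string on failure.
(define (fails-equal? expected observed)
  (if (equal? expected observed)
      #f
      (format "Equality FAILED. Results were : ~a." observed)))

;; The check function flips that result into a plain true/false.
(define (check-equal expected observed)
  (not (fails-equal? expected observed)))

(fails-equal? '(1 2) '(1 2))   ; => #f (nothing failed)
(fails-equal? '(1 2) '(1 3))   ; => "Equality FAILED. Results were : (1 3)."
(check-equal '(1 2) '(1 2))    ; => #t
(check-equal '(1 2) '(1 3))    ; => #f
```

Any string counts as true in Scheme, which is why not turns a failure report into a simple false.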

Using true-false check functions is the same thing we’ve done before. They determine whether we have the correct answer or the wrong one. However, very often the problem in our code causes an error to occur, stopping execution immediately. That means it’s only possible to look at one problem at a time, and each issue must be fixed in sequence before continuing. That can make it tougher to figure out exactly what is going wrong, especially in a more complex system. To get around that problem, we need a different sort of function. I’ve given this new type of function names that begin with test, to distinguish them from the check functions.

(define (test-equal observed expected . nameargs)
  (let ((testname (if (null? nameargs) "test-equal" (car nameargs)))
        )
    (exec-test fails-equal? observed (list expected) testname)
    )
  )
  

This is a test function for equality. The first line assigns testname from nameargs, or uses a default name of test-equal if nameargs is empty. This allows the test name to be optional. We then use exec-test to actually perform the test. The second argument, the expected value, is passed via a list, and we use the same failure-checking function for equal? that check-equal had.
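The optional name relies on Scheme's dotted parameter list: everything after the dot is collected into a list, which is empty when no extra argument is given. Here is a minimal sketch of just that mechanism; pick-name is a hypothetical helper for illustration, not something from the test file.

```racket
#lang racket

;; Hypothetical helper: `nameargs` collects any arguments after `default`.
(define (pick-name default . nameargs)
  (if (null? nameargs)
      default            ; no name supplied, fall back to the default
      (car nameargs)))   ; otherwise use the first extra argument

(pick-name "test-equal")                    ; => "test-equal"
(pick-name "test-equal" "addition works")   ; => "addition works"
```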

To really understand the system, we need to know what that call to exec-test does. Moving on to that procedure, we see this:

(define (exec-test test-predicate expression-under-test other-args testname)
  (with-handler
    exc-display
    (lambda ()
      ; the fails- predicates take (expected observed), so the observed value goes last
      (let ((failure (apply test-predicate (append other-args (list (expression-under-test)))))
            )
        (if (true? failure)
            (begin
              (if (string? failure)
                  (display failure)
                  (display "FAILED")
                  )
              (display "-")
              )
            (begin
              (display "pass...")
            )
            )
        )
      )
    )
  (display testname) ; optional
  (newline)
  )
  
(define (exc-display exn)
  (display "ERROR: ")
  (display (exception-message exn))
  (display "-")
  )
  

There’s a special kind of procedure here. It’s called with-handler, and it takes a handler function and a procedure of no arguments that wraps the expression to run. What this does is execute the given expression within a sort of bubble. Inside this bubble, errors do not lead directly to the program aborting. Program control is instead passed to the handler function when an error occurs. Once the handler is done, the with-handler block can exit normally and the program can proceed instead of exiting.

This is generally known as exception handling (or ‘error handling’) and is a feature of many programming languages. When an error occurs, the normal flow of the program is skipped in some way. A special response is created, usually containing information about the error. This allows either the interpreter or some programmer-provided mechanism to decide what to do about the problem. All the errors you’ve encountered in Scheme are in fact exceptions, being handled with the ‘default’ handler that will just end execution entirely, after reporting where it stopped. While Scheme at the time SICP was written didn’t really have a formal specification for exceptions, most variations on the language have had something like this with-handler procedure (there’s a slight tweak for each implementation in the files). Without getting too far into the implementation details, we can go through the procedures as they’re used here as a demonstration.

We’ll start with how with-handler works. The first argument to with-handler is the handler, which needs to be a procedure to identify the type of exception that occurred and what to do with it. We have defined our handler to simply be exc-display, and that is what gets executed once an exception occurs inside our test block and we have something to handle. In our case we want to report the error and then continue from after the failed test. The function exception-message lets us get the information associated with the exception. That means the ‘exception’ is some sort of data structure that can give us a message about what happened, using this procedure as an interface. That information is then something we can use with display (in general, it will be a string).
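As a rough illustration, here is one way the two implementation-specific pieces could be written in Racket, using its built-in with-handlers form and exn-message accessor. This is a sketch under that assumption, not the contents of the handler file supplied with the tests, and describe is a hypothetical handler that returns a string rather than printing.

```racket
#lang racket

;; Assumed Racket versions; the post's own handler file may differ.
(define (with-handler handler thunk)
  (with-handlers ([exn:fail? handler])   ; catch ordinary run-time errors
    (thunk)))                            ; run the body inside the "bubble"

(define (exception-message exn)
  (exn-message exn))                     ; the message string stored in the exn

;; Hypothetical handler in the style of exc-display.
(define (describe exn)
  (string-append "ERROR: " (exception-message exn)))

(with-handler describe (lambda () (+ 1 2)))     ; => 3, the handler never runs
(with-handler describe (lambda () (car '())))   ; => a string starting "ERROR: "
```

The value returned by the handler becomes the value of the whole with-handler call, which is what lets the program carry on afterwards.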

With our handlers in place, we can get on with how to execute a test so it can be handled specially when errors occur. This is done by assigning to failure the result when we apply the test procedure using the arguments given. There’s also something important that is done with the ‘expression under test’ as it is passed to apply: it is executed as a procedure. Looking back at our test functions, we see that this is what ‘observed’ was, and therefore we know it must be a procedure. The reason for doing this is so that the observed value is only computed within the with-handler block. If it were simply passed as an argument, the expression would be evaluated as an argument, prior to entering the bubble. We would not be able to use our own handler for it and go on to the next test, and the error would instead be handled by whatever was in place at the higher level. (You may note here that exception handlers can conceivably be nested.)
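The difference between passing a procedure and passing a value can be seen in a short Racket sketch. Both helpers here are hypothetical and use Racket's with-handlers directly; the outer handler in the last expression stands in for ‘whatever was in place at the higher level’.

```racket
#lang racket

;; Hypothetical helpers for illustration only.
(define (guard-thunk thunk)
  (with-handlers ([exn:fail? (lambda (e) 'caught-inside)])
    (thunk)))    ; the expression is evaluated here, inside the handler

(define (guard-value value)
  (with-handlers ([exn:fail? (lambda (e) 'caught-inside)])
    value))      ; the caller has already evaluated the expression

(guard-thunk (lambda () (vector-ref (vector 1 2) 5)))   ; => 'caught-inside

;; The argument is evaluated before guard-value is entered, so only the
;; outer (nested) handler gets a chance to deal with the error.
(with-handlers ([exn:fail? (lambda (e) 'caught-outside)])
  (guard-value (vector-ref (vector 1 2) 5)))            ; => 'caught-outside
```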

This special treatment to ensure execution inside the exception-handling bubble is only used on the ‘observed’ expression. That does make the observed argument unique in the tests. While this was done here merely as a matter of convenience, there could be some value in treating the tests in this fashion. It would enforce the condition that all computations that might result in an error are confined to the ‘observed’ section, not the ‘expected’ answer. However, it also makes testing slightly less flexible, as there are situations where it’s more natural and preferable to use computed values from the system under test for the expected results as well.

Whatever test-predicate is, it is supposed to return false if nothing went wrong, and may return anything else if failure occurs. This way, newly-written test functions can report failure in any format desired. Success is merely indicated with a ‘pass’ message. It is a convention in testing that success should be quiet, and report nothing or nearly nothing, since it’s only the failures that require attention. Tests typically are run repeatedly (and continuously if possible) and generating a lot of noisy ‘success’ messages can make it harder to find the parts that actually need fixing.
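As an example of that convention, here is a sketch of a new failure predicate that tolerates small floating-point differences. The names make-fails-close? and fails-close? are hypothetical and are not part of the test-functions file; Racket's format is assumed for building the report.

```racket
#lang racket

;; Hypothetical: builds a failure predicate that accepts small differences.
(define (make-fails-close? tolerance)
  (lambda (expected observed)
    (let ((diff (abs (- expected observed))))
      (if (<= diff tolerance)
          #f                                   ; false means nothing went wrong
          (format "Expected ~a within ~a, but was ~a."
                  expected tolerance observed)))))

(define fails-close? (make-fails-close? 0.01))

(fails-close? 11.47 11.4700001)   ; => #f
(fails-close? 11.47 12.0)         ; => "Expected 11.47 within 0.01, but was 12.0."
```

Because anything other than false counts as failure, the report could just as well be a list or a symbol instead of a string.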

Exception handling also allows us to add another type of test: one to ensure that an expected error actually does occur. This can be quite useful, as there are exercises that require an error to happen on certain improper input.
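A failure predicate for this kind of test might look like the following sketch. The name fails-to-raise? is hypothetical, and Racket's with-handlers is assumed; it follows the same convention as the other failure functions, returning false when things went as hoped (here, when an error was raised).

```racket
#lang racket

;; Hypothetical: #f when the thunk raises an error (the desired outcome),
;; a report string when it returns normally.
(define (fails-to-raise? thunk)
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (let ((result (thunk)))
      (format "Expected an error, but got ~a." result))))

(fails-to-raise? (lambda () (error "bad input")))   ; => #f
(fails-to-raise? (lambda () (+ 1 2)))               ; => "Expected an error, but got 3."
```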

Testing Example

To see how this works in action, here are some examples from the tests used for the ‘Generic Arithmetic’ system:

(test-equ (lambda () (mul n1 one)) n1
          "multiplicative identity works"
          )
(test-equ (lambda () (mul n1 n2)) (mul n2 n1)   ; * computed 'expected' value *
          "multiplication commutes"
          )

; From Scheme number tests
(test-equ (lambda () (add s2 s1)) 11.47) 
(test-false (lambda () (equ? (add s3 one) s3)))

We see that each test requires its first argument to be a procedure, and this is accomplished using a lambda expression with no arguments. (A similar approach was used when measuring execution time in Racket.) The first two tests also provide the optional name, which is only displayed if a failure occurs. Note that if errors occur we cannot display the test name, since that isn’t provided as part of the exception data.

The second test shown here highlights the potential for problems when only one ‘observed’ value is allowed. If an error occurs when evaluating the ‘expected’ result of (mul n2 n1), the normal program flow will still be halted. One possible way around that is to use something like test-true and put all computation inside the lambda ‘bubble’, similar to the way the final test shown here uses equ? inside a test-false statement.
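Here is a sketch of that alternative. To keep it self-contained, test-true is a simplified stand-in that returns a symbol instead of printing, and mul and equ? are stand-ins defined on plain numbers; none of these are the definitions from the actual files.

```racket
#lang racket

;; Simplified stand-in for test-true: reports the outcome as a symbol.
(define (test-true thunk)
  (with-handlers ([exn:fail? (lambda (e) 'error)])
    (if (thunk) 'pass 'fail)))

;; Stand-ins for the generic operations, defined on plain numbers only.
(define (mul a b) (* a b))
(define (equ? a b) (= a b))

;; Both products are computed inside the lambda, so an error in either
;; one is caught by the handler.
(test-true (lambda () (equ? (mul 3 4) (mul 4 3))))     ; => 'pass
(test-true (lambda () (equ? (mul 3 4) (mul 4 2))))     ; => 'fail
(test-true (lambda () (equ? (mul 3 'x) (mul 'x 3))))   ; => 'error
```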

Which format to use may also depend on the purpose of the test. What is important to test when checking for commutativity? Only that the two expressions yield identical results, not that either is the ‘correct’ value. Since neither is really being tested more than the other, using ‘observed’ and ‘expected’ in this manner is arguably inaccurate. On the other hand, adding a test-true wrapper is adding extra words to the expression and perhaps obfuscating the purpose of the test a bit. I prefer the more concise expression here, but feel free to modify the tests if you disagree. Note that in the future, we’ll also find a way around the need for these lambda ‘wrappers’: special forms will be used to avoid the issue altogether.

The first file given below is just the test function framework. The second one contains tests used for the next set of exercises. Note that the ‘test-functions’ definition file will need modification, depending on your implementation (see the comments for what to do). It will also require the appropriate exception handler file.

Test Functions

Generic Arithmetic Tests