Quirksand

SICP 2.5 and testing

June 28, 2015 10:25

This is an extra post in which I’ll discuss some more advanced testing procedures that will be used in the next section. It requires one special type of function not handled in the book, but otherwise only uses concepts we’ve already covered. While this testing system is not as full-featured or as elegant as something like rackunit, it is a definite improvement over a simple list of check functions. Understanding how it works is entirely optional; the tests can be applied without much explanation merely by having the files present.

The first part of the test-functions file is a set of check functions. These work the same as those used in older exercises. The only difference is that the underlying procedures check for failure instead of a true state. These check functions take the opposite of that result, and thus work by ensuring that the false outcome does not hold. The reason for this will be clearer in a moment, but for now let’s compare a few of the failure and check functions.

; Each failure function takes the computed (observed) value first,
; which is the order in which exec-test applies it.
(define (fails-equ? observed expected)
  (if (equ? expected observed)
      false
      (format "FAILURE: ~a is not equ? to ~a." observed expected)
      )
  )

(define (fails-=? observed expected)
  (if (= observed expected)
      false
      (format "Check = FAILED.  Expected ~a but was ~a." expected observed)
      )
  )

(define (fails-equal? observed expected)
  (if (equal? expected observed)
      false
      (format "Equality FAILED. Results were : ~a." observed)
      )
  )
  
(define (check-equal expected observed) (not (fails-equal? observed expected)))
(define (check-= expected observed) (not (fails-=? observed expected)))
(define (check-equ expected observed) (not (fails-equ? observed expected)))

These three functions check similar operations, and consequently they have a similar format. Each is an if expression that uses the particular comparison for the check, and works with an ‘expected’ and an ‘observed’ argument. They vary only in the way that failure is reported; you can decide what the merits and failings of each style are. (Note that the check functions don’t use this report; they only return true or false. The reports will be used elsewhere.)
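
To make the return convention concrete, here is a minimal sketch using a one-argument pair. The names fails-true? and check-true are illustrative assumptions and may not match what the test-functions file defines; the point is only that the failure function returns false or a report, and the check function negates it.

```racket
#lang racket

; Hypothetical failure function: false on success, a message string on failure.
(define (fails-true? observed)
  (if observed
      false
      (format "FAILURE: expected a true value but got ~a." observed)))

; The matching check function simply negates the failure result.
(define (check-true observed) (not (fails-true? observed)))

(fails-true? (= 2 2))   ; false, since nothing went wrong
(fails-true? (= 2 3))   ; a string describing the failure
(check-true (= 2 2))    ; true
(check-true (= 2 3))    ; false
```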

Using true-false check functions is the same thing we’ve done before. They determine whether we have the correct answer or the wrong one. However, very often the problem in our code causes an error to occur, stopping execution immediately. That means it’s only possible to look at one problem at a time, and each issue must be fixed in sequence before continuing. That can make it tougher to figure out exactly what is going wrong, especially in a more complex system. To get around that problem, we need a different sort of function. I’ve given these new functions names beginning with test to distinguish them from the check functions.

(define (test-equal observed expected . nameargs)
  (let ((testname (if (null? nameargs) "test-equal" (car nameargs)))
        )
    (exec-test fails-equal? observed (list expected) testname)
    )
  )
  

This is a test function for equality. The first line assigns testname from nameargs, or uses a default name of test-equal if nameargs is empty. This allows the test name to be optional. We then call a function exec-test to actually perform the test. The second argument, the expected value, is passed via a list, and we use the same failure-checking function for equal? that check-equal had.
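
The optional name relies on the dotted parameter list, which gathers any extra arguments into a list. A small sketch of just that mechanism follows; pick-name is a hypothetical helper written for illustration and is not part of the test files.

```racket
#lang racket

; Everything after the dot is collected into the list nameargs.
(define (pick-name default . nameargs)
  (if (null? nameargs)
      default            ; no extra argument: fall back to the default
      (car nameargs)))   ; otherwise use the first extra argument

(pick-name "test-equal")             ; "test-equal"
(pick-name "test-equal" "my test")   ; "my test"
```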

To really understand the system, we need to know what that call to exec-test does. Moving on to that procedure, we see this:

(define (exec-test test-procedure expression-under-test other-args testname)
  (with-handlers 
      ([catch-all exc-display])  ; list of handlers
      (let ((failure (apply test-procedure (cons (expression-under-test) other-args)))
            )
        (if failure  ; any value other than false signals a failure
            (begin
              (display testname)
              (display ": ")
              (if (string? failure)
                  (display failure)
                  (display "FAILED")
                  )
              )
            (display "pass")
            )
        (newline)
        )
      )
    )
	
(define (catch-all exc)
  (exn:fail? exc)
  )

(define (exc-display exc)
  (display "ERROR: ")
  (display (exn-message exc))
  (newline)
  )
  

The with-handlers special form takes a list of handlers and an expression. It executes the given expression within a sort of bubble. Inside this bubble, errors do not lead directly to the program exiting. Program control is instead passed to the handler functions when an error occurs in that expression. Once the handler is done, the with-handlers block can exit normally and the program can proceed.
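
As a minimal sketch of that bubble (separate from the test framework, with an error message invented for the example), the following raises an error inside with-handlers and then carries on:

```racket
#lang racket

; The handler receives the exception; its return value becomes the value
; of the whole with-handlers expression.
(define result
  (with-handlers ([exn:fail? (lambda (exc) (exn-message exc))])
    (error "something went wrong")))

(display result)                      ; the message carried by the exception
(newline)
(display "execution continues here")
(newline)
```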

This is generally known as exception handling (or ‘error handling’) and is a feature of many programming languages. When an error occurs, an object known as an exception is created, which may contain some information about what went wrong. This allows either the interpreter or some other mechanism to decide what to do with it. All the errors you’ve encountered if you use Racket are in fact exceptions, being handled with the ‘default’ handler. The Scheme used in the book has no exception mechanism, and older Scheme standards left it unspecified (R6RS and R7RS later added their own), but most implementations offer something like Racket’s with-handlers form. Without getting into the details, we can see how it works by using these procedures as examples.

The first argument to with-handlers is the list of handler pairs, which are the procedures used to identify and handle the exceptions. (The use of square brackets here is Racket-specific and not worth going into; it’s effectively still a pair). The first element of the pair is an identification procedure that will be given the exception, and return true if we are willing to handle it. The second element is the actual procedure that will be used to handle it. This approach allows different kinds of exceptions to be processed by different handlers.
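
Here is a short sketch of that dispatch, using two of Racket’s built-in exception predicates. The helper describe-division is hypothetical and exists only to show that the first matching pair is the one used.

```racket
#lang racket

; Handlers are tried in order, so the more specific predicate is listed
; before the general one.
(define (describe-division a b)
  (with-handlers ([exn:fail:contract:divide-by-zero?
                   (lambda (exc) "division by zero")]
                  [exn:fail?
                   (lambda (exc) "some other error")])
    (/ a b)))

(describe-division 6 3)    ; 2
(describe-division 6 0)    ; "division by zero"
(describe-division 6 'x)   ; "some other error"
```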

Our handler id function catch-all is set up to handle every kind of exception, aside from explicit breaks (so that if an infinite loop occurs, you can still terminate execution). Then we have the actual handler exc-display, which is what gets executed once an exception occurs that we are willing to handle. In our case we want to report the error for our test and continue. The built-in function exn-message lets us get the message associated with the exception, and that’s what we can output to indicate an error in the test.

With our handlers in place, we can get on with the actual execution of a test. This is done by assigning to failure the result when we apply the test procedure using the arguments given. There’s also something special done with the ‘expression under test’ as it is passed to apply: it is executed as a procedure. Looking back at our test functions, we see that this is what ‘observed’ was, and therefore we know it must be a procedure. The reason for doing this is so that the observed value is computed within the with-handlers block. If it were simply passed as an argument, the expression would be evaluated as an argument, outside of the bubble, and we would not gain the benefits of error handling.
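
The difference between passing a procedure and passing a value can be seen in a small sketch. The helper run-protected is a hypothetical stand-in for exec-test, reduced to the one detail that matters here.

```racket
#lang racket

; Hypothetical helper: call the thunk inside the handler "bubble".
(define (run-protected thunk)
  (with-handlers ([exn:fail? (lambda (exc) 'caught-inside)])
    (thunk)))

(define zero 0)

; The division happens only when the thunk is called, inside with-handlers.
(define delayed (run-protected (lambda () (/ 1 zero))))

; Here the division is evaluated while building the argument, before
; run-protected is ever entered, so only an outer handler can catch it.
(define eager
  (with-handlers ([exn:fail? (lambda (exc) 'escaped-to-outer)])
    (run-protected (/ 1 zero))))

delayed   ; 'caught-inside
eager     ; 'escaped-to-outer
```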

This special treatment to ensure execution inside the exception-handling bubble is only used on the one expression. That does make the observed argument unique in the tests. While this was done here merely as a matter of convenience, there could be some value in treating the tests in this fashion. It would enforce the condition that all computations that might result in error are confined to the ‘observed’ section, not the ‘expected’ answer. However, it also makes testing slightly less flexible, as there are situations where it’s more natural and preferable to use computed values for the expected results.

Whatever test-procedure is, it is supposed to return false if nothing went wrong, and may return anything else if failure occurs. This allows for newly-written test functions to report failure in any format desired. Success is merely indicated with a ‘pass’ message. It’s a convention in testing that success should be quiet, and report nothing or nearly nothing, since it’s only the failures that require attention. Tests typically are run repeatedly (even continuously if possible) and generating a lot of ‘success’ messages can make it harder to find the parts that actually need fixing.

Exception handling also allows us to add another type of test: one to ensure that an expected error actually does occur. This can be quite useful, as there are occasionally exercises that require an error on certain improper input.
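
One way such a test might be written is sketched below. The name test-raises-error and its messages are assumptions made for illustration; the version in the test-functions file may be named and structured differently. It returns true or false in addition to printing, purely so the result is easy to inspect.

```racket
#lang racket

; Hypothetical test: passes only if calling the thunk raises an error.
(define (test-raises-error thunk . nameargs)
  (let ((testname (if (null? nameargs) "test-raises-error" (car nameargs))))
    (with-handlers ([exn:fail? (lambda (exc)
                                 (display "pass")
                                 (newline)
                                 #t)])
      (thunk)
      ; Reaching this point means no error was raised.
      (display testname)
      (display ": FAILED. Expected an error, but none occurred.")
      (newline)
      #f)))

(test-raises-error (lambda () (car '())) "car of empty list is an error")
(test-raises-error (lambda () (+ 1 2)) "addition should not be an error")
```

The first call prints pass; the second prints its name along with the failure report.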

Testing Example

To see how this works in action, here are some examples from the tests used for the ‘Generic Arithmetic’ system:

(test-equ (lambda () (mul n1 one)) n1
	"multiplicative identity works"
	)                                 
(test-equ (lambda () (mul n1 n2)) (mul n2 n1)   ; * computed 'expected' value *     
	"multiplication commutes"
    )                         

; From Scheme number tests
(test-equ (lambda () (add s2 s1)) 11.47) 
(test-false (lambda () (equ? (add s3 one) s3)))

We see that each test requires its first argument to be a procedure, and this is accomplished using a lambda expression with no arguments. (A similar approach was used when measuring execution time in Racket). The first two tests also provide the optional name, which is only displayed if a failure occurs. Note that if errors occur we cannot display the test name, since that isn’t provided as part of the exception data.

The second test shown here highlights the potential for problems when only the ‘observed’ expression is protected. The ‘expected’ value (mul n2 n1) is evaluated as an ordinary argument, outside the handlers, so if an error occurs there, program execution will be halted. A possible way around that is to make it similar to the format in the last test, which uses test-false and only requires one argument.

What’s important to test is that the two expressions yield identical results. Neither is really being tested more than the other, so using ‘observed’ and ‘expected’ in this manner is arguably inaccurate. On the other hand, adding a test-true wrapper is like adding extra words to the expression, making it slightly harder to see what’s happening. I prefer the expression as given, since it’s more concise. Feel free to modify it if you disagree.
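
For comparison, the wrapped form might look like the sketch below. To keep it runnable on its own, mul, equ?, n1 and n2 are simple stand-ins for the generic arithmetic system, and test-true is a reduced, hypothetical version rather than the one from the test-functions file.

```racket
#lang racket

; Stand-ins so the sketch runs by itself; the real system supplies these.
(define (mul a b) (* a b))
(define (equ? a b) (= a b))
(define n1 3)
(define n2 4)

; Reduced, hypothetical test-true: the whole comparison runs in the handler.
(define (test-true thunk)
  (with-handlers ([exn:fail? (lambda (exc)
                               (display "ERROR: ")
                               (display (exn-message exc))
                               (newline)
                               #f)])
    (let ((passed (if (thunk) #t #f)))
      (display (if passed "pass" "FAILED"))
      (newline)
      passed)))

; Both products are now computed inside the protected thunk.
(test-true (lambda () (equ? (mul n1 n2) (mul n2 n1))))
```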

The first file given below is just the test function framework. The second one contains tests used for the next set of exercises. If you are not using Racket, it should be possible to modify the test-functions file and leave the remaining tests as they are (as long as your version of Scheme can handle exceptions).

Test Functions

Generic Arithmetic Tests