Testing in Undergraduate Computer Science: Test Cases Need Formal Specifications

Allen Parrish
Department of Computer Science
The University of Alabama
Tuscaloosa, AL 35487-0290

parrish@cs.ua.edu
Phone: 205-348-3749
Fax: 205-348-0219
URL: http://cs.ua.edu/~parrish

Abstract

Software testing has typically received scant attention in most computer science undergraduate curricula.  Even when testing is taught, we have had little success in getting students to appreciate the motivation for expending effort to validate their programs.  There are a number of reasons for this lack of success.  One is that students rarely see the bigger picture in the relatively simple programs typical of introductory courses.  We believe that one approach to this problem is to concede the motivation problem and address issues of form first.  Formal notations eliminate much of the ambiguity that surrounds introductory testing discussions, which tend not to be well motivated to begin with.  In this paper, we discuss some simple approaches to writing formal specifications for test cases.

Keywords

Software testing, formal specifications, computer science education

Paper Category: Position Paper
Emphasis: Education

1. Introduction

Most introductory students don't do well with testing because they fail to appreciate the real need for it.  This is not helped by the lack of attention paid to testing in most introductory textbooks.   While there usually is a short discussion devoted to testing, it is typically vague and ill-motivated.  It is hard for students to appreciate the need for testing boundary or extreme conditions, for example, until they have seen a number of situations where defects have arisen that are consequences of such conditions.

We claim that this lack of motivation is truly inherent.  Rather than addressing it directly, a better approach might be to teach students the "form" of testing in an unambiguous way, without trying to emphasize the substance.  The idea is that if students understand the form first, the substance of actual test design can follow on top of a solid foundation.

As such, we propose the idea of formally specifying test cases using (mostly) simple notations.  An important idea here is to capture the concept that test cases are (in their simplest form) discrete events.  Students should understand that testing consists of N runs of the program, where each run executes one test case.  Thus, a test set is a set of N test cases.  Each test case is either revealing or unrevealing of a defect.  A revealing test case should be followed by some kind of corrective action, which should in turn be followed by re-running the entire test set.  The process concludes when all test cases in the test set are unrevealing.
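As a minimal sketch of this process (the names TestCase, revealsDefect and correctDefect below are purely illustrative assumptions, not part of any particular framework), the testing loop can be pictured as follows:

    // Sketch only: run every test case; if any case is revealing, take
    // corrective action and re-run the entire test set; stop when all
    // N test cases are unrevealing.
    boolean allUnrevealing;
    do {
        allUnrevealing = true;
        for (int i = 0; i < testSet.length; i++) {      // one run per test case
            if (testSet[i].revealsDefect(program)) {     // this case is "revealing"
                correctDefect(program);                  // some corrective action
                allUnrevealing = false;
                break;                                   // then re-run the whole set
            }
        }
    } while (!allUnrevealing);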

At the introductory level, there are really two types of testing: "value testing," which involves providing actual input values to programs, and "function testing," which involves executing actual sequences of program functions (frequently method sequences) in various ways to test classes and other reusable components.  We find that virtually all discussion of testing in introductory textbooks concerns value testing.  Function testing is largely ignored, which we believe is a product of the lack of clarity in the software engineering literature about how to discuss testing classes and reusable components in general.

We propose formalizing the notion of a "value test case" as an (input vector, output vector) pair and a "function test case" as either a single function sequence or a pair of sequences.  These two ideas are discussed individually in Sections 2 and 3 below.  Section 4 contains a brief summary and conclusion.

2. Formalizing Value Test Cases

As an example of a typical introductory textbook treatment, we consider the testing discussion from [3].  The principal discussion is a case study of a simple application that computes weekly pay from three inputs: employee type (full-time or part-time), hourly rate, and hours worked.  The discussion then motivates the idea of equivalence classes from the fact that overtime is treated differently for full-time employees; thus, 0 to 40 hours constitutes one equivalence class, while over 40 hours constitutes a second.  It is then argued that test data for this program should exercise the program's behavior under boundary and extreme conditions relative to these equivalence classes, and the discussion concludes with a characterization of good test data for this program.

We feel that this discussion would benefit from a simple formal characterization of the testing process along the lines described above.  In particular, the program should be viewed as a function from an input vector (employee type, hourly rate, hours worked) to an output vector (weekly pay).  Each test case is then an (input vector, output vector) pair, and the characterization of "good test data" can be represented as a set of such test cases.
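As a sketch of what such a test set might look like in code, consider the following.  The rates, hours and expected pays are illustrative assumptions only (in particular, we assume full-time employees earn time-and-a-half beyond 40 hours); they are not the values recommended in [3].

    // Sketch only: value test cases for the weekly-pay program, each an
    // (input vector, output vector) pair.  All numbers are assumptions
    // made for illustration.
    class PayTestCase {
        String employeeType;   // input: "full-time" or "part-time"
        double hourlyRate;     // input
        double hoursWorked;    // input
        double expectedPay;    // output: the correct weekly pay

        PayTestCase(String type, double rate, double hours, double pay) {
            employeeType = type; hourlyRate = rate;
            hoursWorked = hours; expectedPay = pay;
        }
    }

    PayTestCase[] testSet = {
        new PayTestCase("full-time", 10.00,  0.0,   0.00),  // extreme: no hours worked
        new PayTestCase("full-time", 10.00, 39.5, 395.00),  // just under the boundary
        new PayTestCase("full-time", 10.00, 40.0, 400.00),  // on the boundary
        new PayTestCase("full-time", 10.00, 41.0, 415.00),  // just over: 1 hour at 1.5x
        new PayTestCase("part-time", 10.00, 45.0, 450.00)   // overtime rule does not apply
    };

Each element of testSet corresponds to exactly one run of the program, with the expected output recorded alongside the inputs.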

Representing the test data as a set of nine discrete test cases reinforces the idea that the program must be run nine separate times (provided it is also reinforced that each test case corresponds to one execution of the program).  Each test case in this formalism explicitly includes the correct output, so the idea of running the program and checking the output against the test case on every run is explicitly factored into the model.  Because the model makes the notion of a test case as a discrete entity clear, it also clarifies what it means for a test case to be "revealing" or "unrevealing" of a defect.  The idea of repeating test cases until no test case reveals a defect can likewise be explained unambiguously in this model.  None of these concepts is discussed in [3].

While the idea of presenting discrete test cases can be adopted without formal notation, the notation is important because input and output vectors are complex even in trivial programs.  We once gave an assignment to compute sales commissions early in an introductory programming course.  The program was to read sales amounts and generate commissions based on some simple rules, repeatedly, until the appearance of a sentinel value, at which point the program was to print some simple totals (total commissions, average commission, number of salespeople).

We instructed the students to develop a test plan consisting of test cases of the form ({<s1, c1>, <s2, c2>,...<sn, cn>}, {t, a, n}), where <si, ci> is the input sales amount and output commission for salesperson i, and {t, a, n} are the three totals values.  Each such tuple is one test case.  This makes explicit the idea that a single run of the program is insufficient, even if it tests many different sales amounts, because it does not vary the number of salespeople processed in a run.  Formalizing the test cases forced the students to realize that the test design for this problem has two dimensions; it seems unlikely that an informal discussion would ever have made this point convincingly.  Once the students realized that they had to design more than one tuple of the form above, they recognized these multiple dimensions of the testing process.
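For instance, assuming purely for illustration a flat 10% commission rate (the actual rules in the assignment do not matter here), two test cases of different lengths might be written:

    ({<100.00, 10.00>, <600.00, 60.00>}, {70.00, 35.00, 2})
    ({<250.00, 25.00>},                  {25.00, 25.00, 1})

The fact that the two tuples contain different numbers of <si, ci> pairs is precisely what exposes the second dimension of the test design.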

In short, we feel that specifying the formal tuple for each test case causes the students to think more concretely about testing than simply having an abstract discussion about equivalence classes and boundary conditions.  Of course, the formalism itself does not provide any semantic content about how to choose test cases.  However, boundary conditions can be expressed more concisely and unambiguously in text using the formalism.

One current limitation is that this concept seems to only fit well with traditional imperative programs that run to termination.  A somewhat different notational paradigm may be more appropriate for capturing inputs for event-driven programs.   This could be a potentially interesting topic for discussion.

3. Testing Classes

Introductory textbooks seem to completely ignore the problem of unit testing.  Such an omission might make sense in traditional procedural programming where programs do not get more complex than a few procedures.  But with object-oriented programming, classes are the principal program unit, and classes are typically complex enough that unit testing is necessary.  This is even more important in cases where a class is produced as a reusable component that is independent of any particular application.

We deal with this problem by specifying class test cases as method sequences.  Moreover, we define two types of test cases: state inspection test cases and state comparison test cases.  For simplicity, assume that any class has three types of methods in its interface: constructors, modifiers and observers, with the obvious meaning of each.  State inspection test cases are of the form CM*O+ (i.e., a constructor, followed by zero or more modifiers, followed by one or more observers).  These sequences simply produce the values returned by observer methods, values that can be observed directly (they are often of built-in types).  State comparison test cases are of the form (CM*, CM*, comp) or (CM*O, CM*O, comp), where "comp" is a comparison operator.  The idea is to use the comparison operator to compare the single results produced by the two method sequences; these results are either objects of the class under test (produced by the sequence of modifier methods in CM*) or values returned by an observer method (i.e., by the method O in CM*O).

These templates give students a syntactic notion of a test case, just as the input/output vectors do in the previous section.  Students are then asked to write down each test case as a pair (method sequences, result), with sequences written in standard "dot" notation (e.g., Stack().push(10).push(20).top).  So a state inspection test case might be written (Stack().push(10).push(20).top, 20), while a state comparison test case might be written ((Stack().push(10), Stack().push(10).push(20).pop(), equal), true).

The JUnit [1,2] unit testing framework for Java has also helped to address this problem, in that it provides a way to easily build and execute unit tests on Java classes.  Each test case is captured as a method within this framework, which gives a reasonably formal way to capture tests as discrete cases.  JUnit has been used in undergraduate computer science education [4], and we feel that it holds great promise in this regard.
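As a sketch of how the two templates from this section map onto JUnit test methods (we assume a hypothetical Stack class with push, pop, top and equals; the test names are illustrative, and JUnit 3.x style is used):

    import junit.framework.TestCase;

    // Sketch only: one state inspection test and one state comparison test
    // for a hypothetical Stack class, each captured as a JUnit test method.
    public class StackTest extends TestCase {

        // State inspection test case, CM*O+:
        // (Stack().push(10).push(20).top, 20)
        public void testTopAfterTwoPushes() {
            Stack s = new Stack();
            s.push(10);
            s.push(20);
            assertEquals(20, s.top());
        }

        // State comparison test case, (CM*, CM*, comp):
        // ((Stack().push(10), Stack().push(10).push(20).pop(), equal), true)
        public void testPushThenPopComparesEqual() {
            Stack s1 = new Stack();
            s1.push(10);

            Stack s2 = new Stack();
            s2.push(10);
            s2.push(20);
            s2.pop();

            assertTrue(s1.equals(s2));
        }
    }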

4. Conclusion

Little progress has been made over the past 20 years in teaching software testing concepts early in the undergraduate computer science curriculum.  Introductory computer science textbooks do not seem to have changed significantly in this regard.  We feel that one of the problems is the vagueness with which testing concepts are typically discussed.   In particular, the notion of a test case as a discrete entity corresponding to a run of the program or an execution of a sequence of methods is not emphasized to students.  We feel that one solution to this problem is the use of a formal framework as a basis for pedagogical discussion, much in the way that research discussions are aided by the use of such frameworks [5].  Such frameworks make the concepts associated with testing more precise, and do not typically require the use of mathematics more complex than basic sequences, sets, functions and tuples.

References

[1] Beck, K. and E. Gamma, JUnit Open-Source Testing Framework, www.junit.org.

[2] Beck, K. and E. Gamma, "Test Infected: Programmers Love Writing Tests," Java Report, vol. 3, no. 7, 1998.

[3] Lambert, K. and M. Osborne, Java: A Framework for Programming and Problem Solving, Brooks/Cole, 2002.

[4] Noonan, R. and R. Prosl, "Unit Testing Frameworks," Proceedings of the 33rd SIGCSE Technical Symposium on Computer Science Education, 2002, pp. 232-236.

[5] Parrish, A. and S. Zweben, "Clarifying Some Fundamental Properties in Software Testing," IEEE Transactions on Software Engineering, vol. 19, no. 7, July 1993, pp. 742-746.