Mathematical Foundations for Reusable Software

Joan Krone
Dept. of Mathematics and Computer Science
Denison University
Granville, OH 43023 USA
krone@denison.edu
Phone: +1 740 587 6484
Fax +1 740 587 5749
http://www.denison.edu/~krone

William F. Ogden
CIS Department
The Ohio State University
Columbus, OH 43210 USA
ogden@cis.ohio-state.edu
Phone +1 614 292 6004

Abstract

Assuming that software should be reused only when it has been certified as correct and assuming that a piece of generic software is an excellent candidate for reuse, we address the problem of writing and proving assertions that are expressed over any possible type that might be used in a given generic component. We note that semantics for such programs are relational, requiring us to do fixed point theory for relations on very large collections of items from various domains. Current mathematical set theory is not adequate for doing the kind of fixed point theory we need. Here we introduce some mathematics in the form of math units to be part of RESOLVE and show how those units support fixed point theory over general relations.

Keywords

Specification, verification, complete partial orders, set theory, fixed points.

Paper Category:
Emphasis: research

Introduction

Software engineers agree that building every new system from scratch is unrealistic, so they seek out existing software that appears to fit into the design of their new project, thereby reducing the number of new components to write. However, it often turns out that the existing components may not quite fit or may have previously undetected errors that show up only in this new application. These two problems, that of knowing exactly what a piece of software does and being sure that it is correct, can be solved only when software components are formally specified and verified [3, 4, 5].

Formal specification and verification require the use of some formal language for specifying what the software does and a proof system that can establish the correctness of it. We take an approach that uses mathematical assertions as part of the language for writing software, i.e., an assertive language which includes both executable constructs and specification assertions.

To complete this system for creating software, it is necessary to specify not only executable code in the software, but also the mathematical units used in proving correctness. Here we introduce syntax for specifying these units as components to be included in a given software project and give an example illustrating the language as it applies to a particular mathematical system as proof of concept. Ultimately, it will be necessary to build a supporting compiler that not only translates the executable code, but also includes a verifier that processes these math units and proves correctness of each component.

The Problem

One of the most important tools for creating reusable software is the capability to design and implement generic components. Instead of finding it necessary to write separate container classes depending on what the contents may be, we can write one class indicating that the contents may be of any type the user chooses, including types defined by the user. However, generic components are the most daunting aspects of reusable software from the point of view of proving correctness.

Certifying generic components as correct requires us to quantify over collections of mathematical objects so large that they are not sets. For example, suppose we have a generic stack component allowing us to place entries of any existing type or any type to be defined in the future as contents of the stack. Then it is necessary for us to make assertions of the form "for any entry type," followed by whatever statement we need to prove. Since users may introduce new types at will, our specification language and our proof system must allow us to express assertions involving any type that might be defined. It is also our intention to automate our proof system. Of course, this means that we need syntax that is reasonable for a program to process and we need semantics to support the system. We contend that traditional mathematical set theory does not support these needs.

Another aspect of the problem involved in formally specifying and verifying generic software is that traditional semantics, defined in terms of functions, are inadequate for RESOLVE, indeed for any assertive language that permits generic components. As shown in [2], we need relational semantics that must explain generic components written over general types.

Steps Toward Solution

We have described some of the challenges of formally dealing with generic software written for reuse. Here we address two aspects of those challenges. First, we present an example of a math unit as proof of concept that mathematics can be formalized in such a way as to make automated processing a possibility, i.e., we exhibit a syntax for expressing mathematical theories. We choose to show a math unit for ordering theory since it is necessary to use in describing the math unit in addressing semantic issues.

Secondly, we address part of the relational semantics challenge by introducing a mathematical theory to do fixed points over relations on general types (sets). This is done with CPO theory (complete partial ordering), a necessary basis for proving functional correctness of while loops.

Basic Ordering Theory

We present ordering theory as fundamental to a variety of areas in mathematics. In particular it is used for the development of well orderings, necessary for explaining complete partial orders we use in writing semantics.

In this short math unit, the only keywords used are Math_Unit, uses (to designate the list of other math theories needed in the one being defined. In the case of Basic_Ordering_Theory, Basic_Binary_Relation_Properties is listed as a math unit needed. That unit is also included here so that it is easy to see where some of the language used in the ordering theory came from. The keyword Def., as the abbreviation suggests, indicates a definition. As we will see in the next unit, other keywords are available including Theorem, corollary, and Inductive_Def., each taking the obvious meaning. When proofs for theorems are included, additional keywords such as Assumption and Goal appear. A math unit that includes theorems without proofs is called a Precis. It serve as an interface for those who want to use it for writing a new math unit, since it shows only what is needed by the user. The proofs are included in the math unit by the same name as the Precis.

This math unit uses another unit in which basic binary relation properties are introduced. Those properties include reflexivity, transitivity, symmetry, anti-symmetry, and others. The ordering unit can be written concisely with those properties available:

Complete Partial Orderings

We turn now to the most challenging problem mentioned earlier, that of introducing the mathematical foundations for doing denotational semantics for RESOLVE or any other assertive language that supports generics. Traditional approaches to denotational semantics assume that each construct within a given program segment computes a function and that the meaning of the entire program is the function found by performing functional composition on the functions associated with each segment. However, as was pointed out in [2], our programs require relational semantics in many cases.

Hence, in order to give meaning to our programs we must associate a relational meaning with each construct, and explain how relational composition works. For example, when writing denotational semantics for loops or recursive procedures, it will be necessary to consider chains of relations, rather than chains of functions. We then need to find least upper bounds on those chains, which means that we need to show their existence.

Making things even more complicated is the fact that our programs are written over general types, not just integers. This requires that our supporting set theory give us the expressiveness to quantify over such things as all stacks of any entry type, something that traditional set theory does not do.

In fact, establishing a theory of sets presents something of a paradox. On the one hand, the description of the nature of sets must be simple enough that we can believe that it really identifies what we want. On the other hand, that same description should imply the existencd of any arbitrarily strange set that we might ever conceive of. This isn’t quite attainable, so we’ll have to settle for getting at least the sets that people have needed so far, with the possibility of augmenting the description of the nature of sets if we discover that something we need is missing.

Another conundrum arises from the problem of what domain to use to describe set theory. If the foundational thesis is correct, then set theory, being a mathematical theory, should be described using set theory! Having the collection of all the sets you’d ever want to talk about be a member of itself, turns out to be just as bad an idea as you’d expect it to be. The way out of this problem is to make a distinction between the formal set theory that we’re trying to describe and an informal (bigger) set theory in our metalanguage.

We begin formulating our foundational theory of sets in terms of just two concepts. The first is the collection set , which we intend to denote all possible sets, from which all the objects to be discussed will come. The second is the two-argument predicate x epsilon y, which we intend to denote that the set x is a member of the set y. For our purposes we just need to know that this set theory exists and is consistent. For details, see [1].

In our basic set theory development in [1], the only object type under discussion is sets, so we don’t feel the need for a type system. As we develop the various specialized domains we need for ordinary mathematics within this set theory framework, having a type system proves to be a great convenience. This is because, first, types are familiar and natural from informal usage, second, they shorten our mathematical expressions, and third, they provide an easy mechanical check that frequently highlights areas in our mathematical developments where our thinking has gotten jumbled.

In our approach we add a type specification scheme to our basic predicate calculus system and we insist that all variables in our mathematical language be explicitly introduced by a construct such as a quantifier or a definition, which will be used to assign them a type. The types will just be sets from our underlying set collection and in fact, the types will be the general mechanism by which the underlying sets will be introduced into our mathematical language. Note that we adhere to this approach to typing in the ordering theory unit previously shown, as we do in all math units.

Assuming the set theory we need is available, we define complete partial order theory over set , first providing definitions and theorems that describe CPO’s in general. We then consider the kinds of relations that we need for our program semantics and introduce definitions and theorems that deal with those relations.

It is not feasible to explain here what each line of the CPO theory is about. For our purposes it suffices to point out that the first part of the unit introduces the usual concepts of upper bounds, least upper bounds, and chains found in any discussion of complete partial orders. However, here those concepts are presented in terms of our general set theory. Similarly, the ideas of monotonicity and convexity in our general setting are made rigorous.

Our goal is to define composition of relations in such a way that we can express the idea of chains of relations based on the intuitive notion of less defined, i.e., one relation is less defined than another. This gives us the expressive power to talk about monotonic chains of relations and then to pose questions concerning least upper bounds for those chains. Once we have this capability, we can define semantics for loops and recursive procedures in terms of these least upper bounds.

For example, to explain the meaning of a while loop, one can associate with each iteration a relation describing what has the loop has done up to that point. With each successive iteration, the loop so far is associated with a new relation "more defined" than the previous one, eventually leading to a least fixed point of the functional that gives meaning to the loop as a whole.

As the dots in the last section of the CPO theory indicate, additional definitions and theorems will be added to complete the unit that will serve as a basis for doing denotational semantics for our RESOLVE programs.

A Look to the Future

Developing denotational semantics for assertive programs proves to be a challenge in many ways. Not only is it necessary to give meaning to traditional programming constructs, but it is also necessary to provide meaning for such constructs as concepts, realizations, and math units. Sorting out how to do set theory and then how to use it for doing complete partial orders is a good beginning, but there is certainly more work to do to address all the semantic issues that exist in our approach to doing software.

References

Ogden, W.F., CIS 680 Course Packet, Spring, 2002.

Sitaraman, M, Weide, B. and Ogden, F., "Using Abstraction Relations to Verify Abstract Data Type Representations," IEEE Transactions on Software Engineering, March 1997, 157-170.

Sitaraman, M., and Weide, B.W., eds. Component-based software using RESOLVE. ACM Software Eng. Notes 19,4 (1994), 21-67.

Sitaraman, M., Atkinson, S., Kulczycki, G., Weide, B. W., Long, T. J., Bucci, P., Heym, W., Pike, S., and Hollingsworth, J. E., "Reasoning About Software-Component Behavior," Procs. Sixth Int. Conf. on Software Reuse, LNCS 1844, Springer-Verlag, 2000, 266-283.

Special issue on Real-Time Specification and Verification, IEEE Trans. on Software Engineeering, September 1992.