Joseph E. Hollingsworth
Computer Science Department
Indiana University Southeast
4201 Grant Line Road
New Albany, IN 47150 USA
jholly@ius.edu
Phone: +1 812 941 2425
Fax: +1 812 941 2637
URL: http://homepages.ius.edu/jholly
The "original" checked pointers component for C++ [Pike00] did a very good job of detecting many precondition violations for raw C++ pointers, including dereferencing a NULL or dangling pointer and deleting a dangling pointer. However, the syntax for invoking
new
anddelete
was slightly different than the standard syntax. These differences added extra confusion to student understanding of C++ pointers particularly when the student referenced commercial C++ books showing the standard syntax. This paper describes how we overloadednew
anddelete
in order to make client usage of the checked pointers component conform to standard C++ syntax. Also described is how we added reference counts to detect storage leaks.Keywords
checking components, checked pointers, esoteric C++ hacking
Paper Category: technical paper
Emphasis: research, education
Our first year using the checked C++ pointers component with undergraduate
students went very well except
when the slightly different syntax for invoking new
and delete
confused the students. This was bothersome for two reasons: first we do
not want to add any more confusion to the process of learning about pointers;
and second, we attempt to use a version of RESOLVE/C++ that actually looks and
feels like "real" C++. (We do this in hopes that students will
be tempted to continue to use RESOLVE/C++ in other courses and maybe even at
their place of work.) So we sat about trying to change the checking
wrapper component (named "Pointer") so that client usage of new
and delete
would be exactly identical to what it would be if the
Pointer component were not being used at all. Of course since Pointer is a
template, the actual declaration of the pointer types will never be the same as
if Pointer were not being used. But at least these differences are
isolated to the declarative part of the program.
Checked pointers are implemented by two separate components: Pointer, a template which is directly used by the client programmer, and Pointer_Map, a class which underlies Pointer, and is the workhorse for detecting pointer violations (e.g., deletion of dangling pointers). Here are some of the changes made to the original Pointer template and the Pointer_Map class:
operator new
operator delete
operator =
so that it can take a Pointer* and
Pointer&operator <<
operator =
so that it detects storage
leaksUltimately, client programs that use Pointer are
computing with Pointer objects everywhere and through esoteric (tricky) use of overloading Pointer's operations (e.g., new
, delete
,
operator ->
, etc.) it looks like the client program is computing
with real raw C++ pointers, but it is not. The really slick thing about it
is, after a quick change of the pointer declarations, i.e., eliminating the use
of Pointer in only the declarative part of the program, the client program really
does compute with
raw C++ pointers. Furthermore, none of Pointer relies on any other
RESOLVE/C++ components. Therefore, this checked pointers
component can be used by anybody writing C++ code who also wants to adhere to
standard C++ syntax.
During development of new client software, to reduce development effort and defects in the production version, the RESOLVE discipline recommends that the client programmer utilize checked versions of components used by the new client [Hollingsworth00]. If you take the view that raw pointers in C++ are a service providing component, then it follows that there should be a checked version of raw C++ pointers, and that this checked version should be used during the development phase of any client that relies on raw C++ pointers.
A requirement of a checking component is that it must have the exact same
syntactic interface as its corresponding unchecked component [Edwards98], so
that the unchecked version can be substituted for the checked version without
any change to the system's executable part (change to the system's declarative
part is permitted). The original checked Pointer template introduced in
[Pike00] met many of the requirements for being a checked version of raw C++
pointers with a couple of exceptions: calls to the checked new
and delete
were not identical to those used with raw C++ pointers and checking for the
creation of "garbage" (i.e., storage leaks) was not implemented.
These two exceptions proved to be bothersome when using the checked Pointer
template with students in an undergraduate data structures class.
Undergraduate students learning to use pointer types in C++ were confused by
the differences they saw in calls made to Pointer's new
and delete
as compared to standard C++ syntax (and no amount of explaining the difference
seemed to help). Furthermore, their programs often created garbage which
the Pointer template did not detect.
The problem became, how to create a version of the checked Pointer template that would detect the creation of garbage at run time and whose use by a client program would syntactically conform to that of standard raw C++ pointers.
The running example used throughout this
section is based on allocating and deallocating a "node" of a
linked list. This particular node holds an integer as its data value
and an address to the next node in the linked list.
The variable "head" holds the address to the first node in
the linked list. |
|
3.1 Storage Allocation
Part of the declaration of the Pointer template
and declarations used by a client are shown to the right. The first
step taken in order to make Pointer's new and delete
to have normal C++ pointer look and feel is to overload Pointer's new and delete
operators.
A close examination of As far as we can tell, it is not possible to change
|
|
What's Really Going On
Example:
head = new Pointer <Node_Struct>;
// in other words: head.operator = (new Pointer <Node_Struct>);
The implementation of new
ignores the size information it has
been passed in parameter size_t. new
is not allocating
storage for a Pointer <Node_Struct>
, it is allocating
storage for type T, i.e., the template parameter (in the running example type
T is Node_Struct). Even though new
is a member function of
Pointer, it acts like a static member function in that it has no access to data
members of objects declared from Pointer. Therefore the only thing new
can really do is return the address of the allocated storage. Before
returning the address, new
calls Pointer_Map's Add operation to add
the address to Pointer_Map's memory of allocated storage. So new
is really returning a pointer of type T*, which then gets handed off to
Pointer's operator =
(again assuming new
appears on
the rhs of assignment). The implementation of operator =
takes the address in rhs and stores it in head.rep; head.rep is declared at T*
(see private part of Pointer's declaration above). Before the assignment
of rhs to head.rep is made, testing is done to detect the creation of
garbage. operator =
has Pointer_Map decrement the reference
count of the address in head.rep, then checks to see if its count has gone to
zero. If its count has gone to zero, then the storage addressed by
head.rep is about to become garbage and operator =
signals a
run-time error. If its count is greater than zero, then that means the
storage is aliased and is not about to become garbage. In that case operator
=
has Pointer_Map increment the reference count of the storage addressed
by rhs and then does the assignment: rep = (T*)rhs;
3.2 Storage Deallocation
What's Really Going On
Example:
delete head;
Refer back to the declaration of delete
in Pointer
(above). You will notice that delete
expects to receive a
void*. But look at the example above, delete
is getting head,
and head is not a pointer variable, it is an object of type Pointer <Node_Struct>
.
This appeared to be a show stopper, and it was until we added the following type
conversion operation to the Pointer template:
operator Pointer* ();
When "delete head;
" gets compiled, the compiler
realizes that head is not the right type (its an object, not a pointer
variable), so it examins the Pointer template and discovers the type conversion
operation and uses it. So what really happens is (1) operator Pointer*
()
gets invoked on head (which just returns this
for head),
and then (2) delete
gets the this
pointer for head and
uses it to get to head.rep. At that point delete
checks to
see if Pointer_Map has head.rep in its memory, if not, head.rep is a dangling
reference and delete
signals a run-time error. If it is, it
has Pointer_Map remove it from its memory, and then deallocates the storage.
We considered having delete
decrement the reference count for
the storage addressed by head.rep, and if it is still greater than zero, signal
that dangling references are about to be created. But then we remembered
that dangling references are not a problem until they are dereferenced somewhere
downstream in the computation. It they are never dereferenced, then there
is no problem. If they are, then other parts of Pointer will detect the
illegal dereferening of dangling pointers.
3.3 The Rest of the Story for Detecting Storage Leaks
Implementing the Destructor
If the object head goes out of scope and the storage addressed by head.rep
has a reference count of one, then garbage is about to be created. To
detect this situation, ~Pointer () is implemented so that it asks Pointer_Map if
head.rep is still in its memory (i.e., it was not removed by delete
),
then it decrements its reference count and then checks to see if it is
zero. If it is zero, then that means the storage is not aliased and the
destructor signals a run-time error that there is a storage leak.
Implementing the Copy Constructor
But a destructor also gets invoked when value parameters go out of scope, therefore a copy constructor has to be implemented for template Pointer. When a client program passes a pointer variable it might contain an address to an allocates piece of storage, a dangling address or NULL. The copy constructor must first check to see if the address is in Pointer_Map's memory, if so, must have Pointer_Map increment its reference count. When the actual value parameter goes out of scope then the destructor will behave properly and not signal any errors.
Implementing Pointer& operator = (Pointer& rhs);
This operator gets called whenever one pointer is assigned to another. It must use the reference count of the object on the lhs to detect when garbage is created and it also must increment the reference count on the rhs.
3.4 The Client's Declarative Part
In order to painlessly switch from using the Pointer template to using real
raw C++ pointers (and back), some care needs to be taken when declaring the data
types.
To the right is an example using conditional compilation based on the symbol _DEBUG being defined. If _DEBUG is defined, then checked pointers are used, if not then regular raw C++ pointers are used. Below is a snippet of code using the typedef'ed symbols and what it looks like after being transformed by the preprocessor. |
|
Code as it appears in the program. |
Code as it appears after preprocessing. |
||
|
|
||
|
3.5 How new
Can Record the Line Number and Filename
To facilitate debugging, the original Pointer wrapper described in
[Pike00] also recorded the line number and filename where storage was
allocated. When the software developer asked for a report of memory usage,
this additional information was displayed with each allocated piece of
storage. When moving to the new Pointer template with new
and
delete
overloaded, it appeared that we were going to have to
forfeit this capability. But as is often the case with C++, there is
usually some workaround. As it turns out when overload operator new
,
additional parameters can be added to the syntactic interface, the C++ reference
manual refers to this as placement syntax. Below is the real header
for new used in Pointer. It uses default values for the line number and
filename for the cases where the user does not care to keep track of this
additional information.
void* operator new (size_t s, int line_num = __LINE__, char* file = __FILE__);
Of course this is not standard syntax for invoking new
, so once
more we have to rely on the preprocessor to help us out as is illustrated
below. When in debug mode, new
is redefined using the
preprocessor to include the additional parameters.
#ifdef _DEBUG |
3.6 Signaling Run-Time Errors by Invoking the Debugger
The Pointer checking component can be implemented a number of different ways for signaling the software developer when it detects a run-time error. Currently our department uses Microsoft's Visual C++ environment which comes with an interactive debugger. We have chosen to implement Pointer so that it invokes the debugger when a run-time error is encountered. Control is passed to the debugger and it can be used by the student (with minimal training) to find the exact line in the code where the run-time error was detected by Pointer. All errors are reported through an operation provided by Pointer_Map, it is shown below. The operations OutputDebugString and DebugBreak come from the Microsoft environment. OutputDebugString permits sending output to the debugger's output window, and DebugBreak transfers control to the debugger.
void __Pointer_Map::Report_Error (char* msg) |
Pointer_Map also provides Report_Allocation, an operation which displays the current memory allocation. Below is a sample displaying the memory allocation for a One_Way_List implemented using a Bookkeeping node at the head of the list and nodes similar to the Node used above. In this example, there are 5 lists with one list containing 9 items. The 5 bookkeeping nodes each have size 16 and were allocated by a call to new on line 203, and the 9 regular nodes have size 12 allocated by a call to new on line 20. The reference count for each node is 1 as is shown by: (rc:1).
Current Memory Allocation |
3.7 What's Not Handled
Currently Pointer is not interchangeable for all possible uses of pointer types in C++. Here is a list of things that Pointer cannot currently handle.n = x098233;
This is the assignment of a fixed memory address to a pointer
type. Problems with Pointer start to arise because it does not know
that this address references storage that was never allocated by new
in the first place.delete new Datatype;
delete
for deallocation. The timestamp for the allocated piece of storage
does not get set correctly before delete
gets a hold of the
address.
As noted elsewhere in this paper the "original" checked pointers component for C++ developed by [Pike00] did a very good job of detecting many precondition violations for raw C++ pointers, including dereferencing a NULL or dangling pointer and deleting a dangling pointer. We were driven to make changes to their original Pointer component because we wanted client usage to conform to regular C++ syntax. Along the way Bruce Weide suggested that we make a few other changes, e.g., adding detection of garbage creation through reference counting; thanks Bruce!
The introduction claims that this new Pointer template can be used by anybody writing C++ code who also wants to adhere to standard C++ syntax. Section 3.7 identifies at least two areas where that is not currently true, programs using pointer arithmetic and pointers as arrays. We urge others in the RESOLVE community who are still using C++ to adopt this newer version of the Pointer template and find ways to enhance its functionality. Finally, we believe that the new Pointer template should be publicized in forums that practicing C++ hackers read in an attempt to get widespread adoption.