A More Seamless Checking Wrapper
for Raw C++ Pointers

Joseph E. Hollingsworth
Computer Science Department
Indiana University Southeast
4201 Grant Line Road
New Albany, IN  47150  USA
 
jholly@ius.edu
Phone: +1 812 941 2425
Fax: +1 812 941 2637
URL: http://homepages.ius.edu/jholly

 

Abstract

The "original" checked pointers component for C++ [Pike00] did a very good job of detecting many precondition violations for raw C++ pointers, including dereferencing a NULL or dangling pointer and deleting a dangling pointer.  However, the syntax for invoking new and delete was slightly different than the standard syntax.  These differences added extra confusion to student understanding of C++ pointers particularly when the student referenced commercial C++ books showing the standard syntax.  This paper describes how we overloaded new and delete in order to make client usage of the checked pointers component conform to standard C++ syntax.  Also described is how we added reference counts to detect storage leaks.

Keywords

checking components, checked pointers, esoteric C++ hacking

Paper Category: technical paper
Emphasis: research, education

1.  Introduction

Our first year using the checked C++ pointers component with undergraduate students went very well except when the slightly different syntax for invoking new and delete confused the students.  This was bothersome for two reasons: first we do not want to add any more confusion to the process of learning about pointers; and second, we attempt to use a version of RESOLVE/C++ that actually looks and feels like "real" C++.  (We do this in hopes that students will be tempted to continue to use RESOLVE/C++ in other courses and maybe even at their place of work.)  So we sat about trying to change the checking wrapper component (named "Pointer") so that client usage of new  and delete would be exactly identical to what it would be if the Pointer component were not being used at all.  Of course since Pointer is a template, the actual declaration of the pointer types will never be the same as if Pointer were not being used.  But at least these differences are isolated to the declarative part of the program.

Checked pointers are implemented by two separate components: Pointer, a template which is directly used by the client programmer, and Pointer_Map, a class which underlies Pointer, and is the workhorse for detecting pointer violations (e.g., deletion of dangling pointers).  Here are some of the changes made to the original Pointer template and the Pointer_Map class:

Ultimately, client programs that use Pointer are computing with Pointer objects everywhere and through esoteric (tricky) use of overloading Pointer's operations (e.g., new, delete, operator ->, etc.) it looks like the client program is computing with real raw C++ pointers, but it is not.  The really slick thing about it is, after a quick change of the pointer declarations, i.e., eliminating the use of Pointer in only the declarative part of the program, the client program really does compute with raw C++ pointers.  Furthermore, none of Pointer relies on any other RESOLVE/C++ components.  Therefore, this checked pointers component can be used by anybody writing C++ code who also wants to adhere to standard C++ syntax.

2.  The Problem

During development of new client software, to reduce development effort and defects in the production version, the RESOLVE discipline recommends that the client programmer utilize checked versions of components used by the new client [Hollingsworth00].  If you take the view that raw pointers in C++ are a service providing component, then it follows that there should be a checked version of raw C++ pointers, and that this checked version should be used during the development phase of any client that relies on raw C++ pointers.

A requirement of a checking component is that it must have the exact same syntactic interface as its corresponding unchecked component [Edwards98], so that the unchecked version can be substituted for the checked version without any change to the system's executable part (change to the system's declarative part is permitted).  The original checked Pointer template introduced in [Pike00] met many of the requirements for being a checked version of raw C++ pointers with a couple of exceptions: calls to the checked new and delete were not identical to those used with raw C++ pointers and checking for the creation of "garbage" (i.e., storage leaks) was not implemented.  These two exceptions proved to be bothersome when using the checked Pointer template with students in an undergraduate data structures class.

Undergraduate students learning to use pointer types in C++ were confused by the differences they saw in calls made to Pointer's new and delete as compared to standard C++ syntax (and no amount of explaining the difference seemed to help).  Furthermore, their programs often created garbage which the Pointer template did not detect.

The problem became, how to create a version of the checked Pointer template that would detect the creation of garbage at run time and whose use by a client program would syntactically conform to that of standard raw C++ pointers.

3.  Working the Problem

The running example used throughout this section is based on allocating and deallocating a "node" of a linked list.  This particular node holds an integer as its data value and an address to the next node in the linked list.

The variable "head" holds the address to the first node in the linked list.

The code to the right illustrates the standard syntax for allocating and deallocating a node using new and delete.  This is the look and feel that we are after when using the Pointer checking wrapper.

typedef Node_Struct* Node;

struct Node_Struct {
   int value;
   Node next;
};

Node head;

head = new Node_Struct;
...
delete head;

3.1 Storage Allocation

Part of the declaration of the Pointer template and declarations used by a client are shown to the right.  The first step taken in order to make Pointer's new and delete to have normal C++ pointer look and feel is to overload Pointer's new and delete operators.  

A close examination of new's return type shows that it is void*.  But the variable head does not hold an address to a Node.  In fact when using the Pointer wrapper, head is not a plain variable anymore, head is an object that holds a Pointer <Node_Struct>.  This poses a problem right away.  The client of Pointer will be computing with object's of type Pointer <Node_Struct>, not Pointer <Node_Struct>* which is what new returns.

As far as we can tell, it is not possible to change new's return type.  Which leads to the next step of overloading operator =.  Working under the assumption that when new is invoked it will be on the rhs of an assignment statement, we can overload  operator = so that it takes Pointer <Node_Struct>*.  Below are the headers for the overloaded operator =:

   Pointer& operator = (Pointer* rhs);
   Pointer& operator = (Pointer& rhs);

template <class T>
class Pointer
{
   ...
   void* operator new (size_t s);
   void operator delete (void *p, size_t s);
   ...
private:
   T* rep;
};

//-----------------------------
// Client declaration using Pointer

struct
Node_Struct;
typedef
Pointer <Node_Struct> Node;
struct Node_Struct {
   int value;
   Node next;
};

Node head;

 What's Really Going On

Example:
   head = new Pointer <Node_Struct>;
   // in other words: head.operator = (new
Pointer <Node_Struct>);

The implementation of new ignores the size information it has been passed in parameter size_t.  new is not allocating storage for a Pointer <Node_Struct>, it is allocating storage for type T, i.e., the template parameter (in the running example type T is Node_Struct).  Even though new is a member function of Pointer, it acts like a static member function in that it has no access to data members of objects declared from Pointer. Therefore the only thing new can really do is return the address of the allocated storage.  Before returning the address, new calls Pointer_Map's Add operation to add the address to Pointer_Map's memory of allocated storage.  So new is really returning a pointer of type T*, which then gets handed off to Pointer's operator = (again assuming new appears on the rhs of assignment).  The implementation of operator = takes the address in rhs and stores it in head.rep; head.rep is declared at T* (see private part of Pointer's declaration above).  Before the assignment of rhs to head.rep is made, testing is done to detect the creation of garbage.  operator = has Pointer_Map decrement the reference count of the address in head.rep, then checks to see if its count has gone to zero.  If its count has gone to zero, then the storage addressed by head.rep is about to become garbage and operator = signals a run-time error.  If its count is greater than zero, then that means the storage is aliased and is not about to become garbage.  In that case operator = has Pointer_Map increment the reference count of the storage addressed by rhs and then does the assignment: rep = (T*)rhs;

 

3.2 Storage Deallocation

 What's Really Going On

Example:
   delete head;

Refer back to the declaration of delete in Pointer (above).  You will notice that delete expects to receive a void*.  But look at the example above, delete is getting head, and head is not a pointer variable, it is an object of type Pointer <Node_Struct>.  This appeared to be a show stopper, and it was until we added the following type conversion operation to the Pointer template:
   operator Pointer* ();

When "delete head;" gets compiled, the compiler realizes that head is not the right type (its an object, not a pointer variable), so it examins the Pointer template and discovers the type conversion operation and uses it.  So what really happens is (1) operator Pointer* () gets invoked on head (which just returns this for head), and then (2) delete gets the this pointer for head and uses it to get to head.rep.  At that point delete checks to see if Pointer_Map has head.rep in its memory, if not, head.rep is a dangling reference and delete signals a run-time error.  If it is, it has Pointer_Map remove it from its memory, and then deallocates the storage.

We considered having delete decrement the reference count for the storage addressed by head.rep, and if it is still greater than zero, signal that dangling references are about to be created.  But then we remembered that dangling references are not a problem until they are dereferenced somewhere downstream in the computation.  It they are never dereferenced, then there is no problem.  If they are, then other parts of Pointer will detect the illegal dereferening of dangling pointers.

3.3 The Rest of the Story for Detecting Storage Leaks

Implementing the Destructor

If the object head goes out of scope and the storage addressed by head.rep has a reference count of one, then garbage is about to be created.  To detect this situation, ~Pointer () is implemented so that it asks Pointer_Map if head.rep is still in its memory (i.e., it was not removed by delete), then it decrements its reference count and then checks to see if it is zero.  If it is zero, then that means the storage is not aliased and the destructor signals a run-time error that there is a storage leak.

Implementing the Copy Constructor

But a destructor also gets invoked when value parameters go out of scope, therefore a copy constructor has to be implemented for template Pointer.  When a client program passes a pointer variable it might contain an address to an allocates piece of storage, a dangling address or NULL.  The copy constructor must first check to see if the address is in Pointer_Map's memory, if so, must have Pointer_Map increment its reference count.  When the actual value parameter goes out of scope then the destructor will behave properly and not signal any errors.

Implementing Pointer& operator = (Pointer& rhs);

This operator gets called whenever one pointer is assigned to another.  It must use the reference count of the object on the lhs to detect when garbage is created and it also must increment the reference count on the rhs.

3.4 The Client's Declarative Part

In order to painlessly switch from using the Pointer template to using real raw C++ pointers (and back), some care needs to be taken when declaring the data types.

To the right is an example using conditional compilation based on the symbol _DEBUG being defined.  If _DEBUG is defined, then checked pointers are used, if not then regular raw C++ pointers are used.

Below is a snippet of code using the typedef'ed symbols and what it looks like after being transformed by the preprocessor.

struct Node_Struct;

#ifdef
_DEBUG
typedef Pointer <Node_Struct> Node;
typedef Node Node_Rep;
#else
typedef
Node_Struct* Node;
typedef Node_Struct Node_Rep;
#endif

struct Node_Struct {
   int value;
   Node next;
};

Code as it appears in the program.

Code as it appears after preprocessing.

   Node head;

   head = new Node_Rep;
   ...
   head->value = 17;
   ...
   delete head;
// Debug mode, _DEBUG is defined
   Pointer <Node_Struct> head;

   head = new Pointer <Node_Struct>;
   ...
   head->value = 17;
   ...
   delete head;
// Release mode, _DEBUG is not defined
   Node_Struct* head;

   head = new Node_Struct;
   ...
   head->value = 17;
   ...
   delete head;

 

3.5 How new Can Record the Line Number and Filename 

To facilitate debugging, the original Pointer wrapper described in [Pike00] also recorded the line number and filename where storage was allocated.  When the software developer asked for a report of memory usage, this additional information was displayed with each allocated piece of storage.  When moving to the new Pointer template with new and delete overloaded, it appeared that we were going to have to forfeit this capability.  But as is often the case with C++, there is usually some workaround.  As it turns out when overload operator new, additional parameters can be added to the syntactic interface, the C++ reference manual refers to this as placement syntax.  Below is the real header for new used in Pointer.  It uses default values for the line number and filename for the cases where the user does not care to keep track of this additional information.
   void* operator new (size_t s, int line_num = __LINE__, char* file = __FILE__);

Of course this is not standard syntax for invoking new, so once more we have to rely on the preprocessor to help us out as is illustrated below.  When in debug mode, new is redefined using the preprocessor to include the additional parameters.  

#ifdef _DEBUG
#define new new (__LINE__, __FILE__)
#endif

   Node head;

   head = new Node_Rep;
   ...
   head->value = 17;
   ...
   delete head;

#undef new

3.6 Signaling Run-Time Errors by Invoking the Debugger

The Pointer checking component can be implemented a number of different ways for signaling the software developer when it detects a run-time error.  Currently our department uses Microsoft's Visual C++ environment which comes with an interactive debugger.  We have chosen to implement Pointer so that it invokes the debugger when a run-time error is encountered.  Control is passed to the debugger and it can be used by the student (with minimal training) to find the exact line in the code where the run-time error was detected by Pointer.  All errors are reported through an operation provided by Pointer_Map, it is shown below.  The operations OutputDebugString and DebugBreak come from the Microsoft environment.  OutputDebugString permits sending output to the debugger's output window, and DebugBreak transfers control to the debugger.

void __Pointer_Map::Report_Error (char* msg)
{
   OutputDebugString (msg);
   OutputDebugString ("\n");
   DebugBreak ();
} // Report_Error

Pointer_Map also provides Report_Allocation, an operation which displays the current memory allocation.  Below is a sample displaying the memory allocation for a One_Way_List implemented using a Bookkeeping node at the head of the list and nodes similar to the Node used above.  In this example, there are 5 lists with one list containing 9 items.  The 5 bookkeeping nodes each have size 16 and were allocated by a call to new on line 203, and the 9 regular nodes have size 12 allocated by a call to new on line 20.  The reference count for each node is 1 as is shown by: (rc:1).

Current Memory Allocation
=====================================================
Pointer report: Currently allocated memory locations:
-----------------------------------------------------
188 bytes currently allocated
Addr: 0x002F0988 (z:16) (rc:1) (Allocated in: owlist2.hc Line: 20)
Addr: 0x002F4038 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F3908 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F0A50 (z:16) (rc:1) (Allocated in: owlist2.hc Line: 20)
Addr: 0x002F41A8 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F0B18 (z:16) (rc:1) (Allocated in: owlist2.hc Line: 20)
Addr: 0x002F3A78 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F4318 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F0CA8 (z:16) (rc:1) (Allocated in: owlist2.hc Line: 20)
Addr: 0x002F3BE8 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F4488 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F3D58 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F3EC8 (z:12) (rc:1) (Allocated in: owlist2.hc Line: 203)
Addr: 0x002F2808 (z:16) (rc:1) (Allocated in: owlist2.hc Line: 20)
=====================================================

 

3.7 What's Not Handled

Currently Pointer is not interchangeable for all possible uses of pointer types in C++.  Here is a list of things that Pointer cannot currently handle.

 

4.  Related Work

As noted elsewhere in this paper the "original" checked pointers component for C++ developed by [Pike00] did a very good job of detecting many precondition violations for raw C++ pointers, including dereferencing a NULL or dangling pointer and deleting a dangling pointer.  We were driven to make changes to their original Pointer component because we wanted client usage to conform to regular C++ syntax.  Along the way Bruce Weide suggested that we make a few other changes, e.g., adding detection of garbage creation through reference counting; thanks Bruce!

5.  Conclusion

The introduction claims that this new Pointer template can be used by anybody writing C++ code who also wants to adhere to standard C++ syntax.  Section 3.7 identifies at least two areas where that is not currently true, programs using pointer arithmetic and pointers as arrays.  We urge others in the RESOLVE community who are still using C++ to adopt this newer version of the Pointer template and find ways to enhance its functionality.  Finally, we believe that the new Pointer template should be publicized in forums that practicing C++ hackers read in an attempt to get widespread adoption.

References

[Edwards98]
Edwards, S.H., Hollingsworth, J.H., Shakir, G., Sitaraman, M., Weide, B.W., “A Framework for Detecting Interface Violations in Component-Based Software” in Proceedings of the Fifth International Conference on Software Reuse, Victoria, B.C., Canada, June, 1998
[Pike00]
Pike, S., Weide, B., Hollingsworth, J., “Checkmate: Cornering C++ Dynamic Memory Errors With Checked Pointers”, in Proceedings of Special Interest Group on Computer Science Education (SIGCSE) 2000, Austin, TX, March, 2000.
[Hollingsworth00]
J. Hollingsworth, L. Blankenship and B. Weide, "Experience Report: Using RESOLVE/C++ for Commercial Software", in ACM’s Software Engineering Notes, Vol. 25, No. 6, pps. 11 – 19, November 2000.