Reference counting for C++ objects

Last update: 13.01.2009

Passing objects to methods by pointer or reference is very handy if you want to boost performance by avoiding invoking the copy constructor. Sometimes it's the only way to pass an object because it doesn't have a copy constructor or assignment operator defined. An example would be an object that has an open operating system resource that can't be safely shared.

Let's say that you are creating a class that does some low-level operation on a file. It contains a file descriptor as a field:

    class LowLevelFile
    {
        int fd;

        // [...]
    };

The Problem

Writing a copy constructor and an assignment operator is easy using dup(). The problem is that every copy of the object allocates a new file descriptor, and the number of file descriptors is limited. Duplicated file descriptors also share the status of the open file (such as the current offset), so changing this state through one copy of the object affects all other copies. This can lead to problems that are hard to debug: the code that uses the LowLevelFile class looks correct because it appears to modify only a copy of an object, and nothing shows that all copies in fact share the same resource.
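
The shared offset can be observed directly. The following is a minimal sketch for a POSIX system; the function name dupSharesOffset and the scratch file path are made up for this illustration and are not part of LowLevelFile.

```cpp
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // dup, lseek, write, close, unlink

// Returns true if moving the offset through a dup()'ed descriptor
// also moves it for the original descriptor.
bool dupSharesOffset ()
{
    char path[] = "/tmp/dup_demo_XXXXXX";   // hypothetical scratch file
    int fd = mkstemp (path);                // the "original" descriptor
    if (fd < 0)
        return false;
    unlink (path);                          // file disappears once all descriptors are closed

    const char data[] = "0123456789";
    const ssize_t len = sizeof (data) - 1;
    if (write (fd, data, len) != len) {
        close (fd);
        return false;
    }
    lseek (fd, 0, SEEK_SET);                // original is now at offset 0

    int copy = dup (fd);                    // what a dup()-based copy constructor would do
    if (copy < 0) {
        close (fd);
        return false;
    }
    lseek (copy, 5, SEEK_SET);              // "modify" only the copy...

    off_t seenByOriginal = lseek (fd, 0, SEEK_CUR);   // ...and ask the original where it is

    close (copy);
    close (fd);
    return seenByOriginal == 5;
}
```

Both descriptors refer to the same open file, so the seek done through the copy is visible through the original as well.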

In some situations the best approach is to assume that the object is passed everywhere as a pointer. But then you don't know when to destroy it. You passed the pointer to a method of some class, but without looking at the documentation or source of that class you can't be sure that the method didn't save the pointer in a field of that class or in some STL collection. So is the object still used somewhere in the program? This problem arises even in small projects (below 10000 lines), even when you are the only author. Since in real life no single convention (such as passing everything as a pointer) can be applied everywhere, there is always room to introduce leaks or double deletes.

Another problem is using STL containers: adding something to a container invokes the copy constructor, even where you might expect the compiler to optimize the copy away, as in this situation:

    class Test
    {
        // [...]
    };

    vector< Test > v;
    v.push_back (Test());

An object of class Test is created using its constructor, and then a copy of it is made using the copy constructor. You could avoid that by inserting a pointer to the object into the collection, but then the problems mentioned in the previous paragraph arise.
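
The extra copy can be made visible by counting copy constructor calls. This is only a sketch: the static counter and the function copiesMadeByPushBack are hypothetical additions for the illustration.

```cpp
#include <vector>

// Hypothetical class that counts how often its copy constructor runs.
class Test
{
public:
    static int copies;

    Test () {}
    Test (const Test &) { ++copies; }
};

int Test::copies = 0;

// Returns the number of copy constructor calls caused by one push_back().
int copiesMadeByPushBack ()
{
    Test::copies = 0;
    std::vector< Test > v;
    v.reserve (1);            // rule out copies caused by reallocation
    v.push_back (Test ());    // a temporary is built, then copied into the vector
    return Test::copies;
}
```

Because Test declares its own copy constructor, it has no implicit move constructor, so the copy is made even with a C++11 or newer compiler. A class with a move constructor would be moved instead, but a move still does not help when the resource cannot be duplicated or handed over.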

Solution in Java


Languages like Java overcome these kinds of problems by using references to objects when passing them to a method or assigning them. Copying an object must be done explicitly. In Java, if you write:

    File a = new File ("/etc/passwd");
    findUser (a);

The variable a is a reference to an object, like a reference in C++, and this is the only way you can refer to an object. The second line passes this reference to a method. Again, this is the only way to pass objects: by reference. This is a simplification compared to C++, where you can pass an object by value, by reference and by pointer. The advantage is that as long as you are not explicitly copying an object, you always know that there is one File object and that the whole code operates on that object.

Moreover, in Java you don't delete the object. When the Java Virtual Machine finds out that the object is no longer used (not referenced by anything), it is scheduled to be deleted by the (in)famous garbage collector. I will not discuss whether the GC is good or bad; you can find plenty of information about this on the Web. I just want to point out that this resolves all of our problems: you don't need to track where you have a copy of an object and where just a pointer/reference to the original one.

Solution in C++


The first solution people will think of is the auto_ptr<> template, but it's not intended to be passed to a method or put into a container because it does no reference counting; it just deletes the object when its destructor is called.

We must write something by hand.

Class with a reference counter

Here is a class that has a field refCount that acts as a counter of references to the object. The class is intended to be used as a base class for your classes that need reference counting. When you create an object, the counter is initialized to 1, meaning that one reference to the object exists. Whenever you pass a pointer to that object somewhere, the counter must be incremented, and when a method or class no longer uses the pointer, the counter must be decremented.

    class Referenced
    {
        int refCount;

    public:
        Referenced () throw () :
            refCount (1)
        {
        }

        void incRefCount () throw ()
        {
            refCount++;
        }

        int decRefCount () throw ()
        {
            return --refCount;
        }
    };

When the counter goes to 0, the object can be removed (nobody is using it). Two methods are used to maintain the counter:

  • incRefCount() - Increments the reference counter by one, indicating that someone starts using the object.
  • decRefCount() - Decrements the reference counter by one indicating that someone stops using the object. The result is returned. If the method returns 0, the object is not used and should be deleted.
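
Used by hand, the two methods could be applied as in the sketch below. The base class is repeated (without exception specifications) so that the example compiles on its own; Connection, Cache and release() are hypothetical names introduced only for this illustration.

```cpp
// The base class from above, repeated so the sketch is self-contained.
class Referenced
{
    int refCount;

public:
    Referenced () : refCount (1) {}
    void incRefCount () { refCount++; }
    int decRefCount () { return --refCount; }
};

// Hypothetical payload class that tracks how many instances are alive.
class Connection : public Referenced
{
public:
    static int alive;

    Connection () { ++alive; }
    ~Connection () { --alive; }
};

int Connection::alive = 0;

// Hypothetical helper: every user calls this when it is done with the object.
void release (Connection *c)
{
    if (c->decRefCount () == 0)
        delete c;
}

// Hypothetical second user that keeps the pointer for later.
class Cache
{
    Connection *kept;

    Cache (const Cache &);              // copying would need its own incRefCount()
    Cache &operator= (const Cache &);

public:
    explicit Cache (Connection *c) : kept (c) { kept->incRefCount (); }
    ~Cache () { release (kept); }
};
```

A caller would create the object with new, hand it to a Cache, and call release() once it no longer needs the pointer itself. The object is then deleted only when the Cache is destroyed as well. Every incRefCount() and release() call has to be written, and remembered, by hand.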

The reference


Maintaining the reference counter by hand helps only a little: you must remember to increase and decrease the counter properly, and STL containers don't do it for you. So we need a class that automates this task. We will write it to behave similarly to a pointer. We will call it ReferenceToObj, but it's more like a pointer (it can have a NULL value and you must use the -> operator to access the object it points to).

    #include <cstddef>   // NULL

    template< class T >
    class ReferenceToObj
    {
        T *obj;

        // Drop this reference; delete the object if it was the last one.
        void decRef () throw ()
        {
            if (obj && obj->decRefCount() == 0) {
                delete obj;
                obj = NULL;
            }
        }

    public:
        // Thrown when a NULL reference is dereferenced.
        class NullDereference
        {
        };

        // Takes over the initial reference (refCount == 1) of a new object.
        ReferenceToObj (T *obj = NULL) throw () :
            obj (obj)
        {
        }

        // Takes a const reference so that temporaries and STL containers
        // can copy it; the counter can still be incremented because obj
        // itself is a pointer to a non-const T.
        ReferenceToObj (const ReferenceToObj< T > &orig) :
            obj (orig.obj)
        {
            if (obj)
                obj->incRefCount ();
        }

        // Throws NullDereference (typed exception specifications were
        // removed in C++17, so it is only documented here).
        T &operator* ()
        {
            if (!obj)
                throw NullDereference ();
            return *obj;
        }

        // Throws NullDereference.
        T *operator-> ()
        {
            if (!obj)
                throw NullDereference ();
            return obj;
        }

        bool operator== (const ReferenceToObj< T > &right) const
        {
            return obj == right.obj;
        }

        ReferenceToObj< T > &operator= (const ReferenceToObj< T > &right)
        {
            if (this == &right)
                return *this;

            // Increment first, so the object survives if both sides
            // already refer to it.
            if (right.obj)
                right.obj->incRefCount ();

            decRef ();

            obj = right.obj;

            return *this;
        }

        operator bool () const
        {
            return obj != NULL;
        }

        ~ReferenceToObj ()
        {
            decRef ();
        }
    };

And how do we use that? We create an example class Buffer that can be referred to by ReferenceToObj:

    #include <cstddef>    // size_t
    #include <iostream>

    class Buffer : public Referenced
    {
        // Copying is disabled: declared private and never defined.
        Buffer (const Buffer &);
        Buffer &operator= (const Buffer &);

    public:
        Buffer () throw ()
        {
            std::cout << "--] Buffer created" << std::endl;
        }

        void append (const char *src, const size_t srcSize);

        ~Buffer ()
        {
            std::cout << "--] Buffer destroyed" << std::endl;
        }
    };

And create the reference:

    ReferenceToObj< Buffer > ref (new Buffer());

Now to access the buffer we use ref as if it was a pointer to the buffer:

    ref->append (s, strlen(s));

The magic is done in the copy constructor, the assignment operator and the destructor. They increase and decrease the reference counter appropriately: if you create a copy of the reference, the counter is incremented, and if a reference is destroyed (goes out of the scope where it was declared), the counter is decremented. Finally, when the reference counter reaches 0, the object is deleted automatically.

Let's see this in action:

    void fill (ReferenceToObj< Buffer > buf)
    {
        const char *s = "Example content";
        buf->append (s, strlen(s));
    }

    fill (ref);

After compiling and running the code we can see that the constructor of Buffer is invoked only once and no object copying or assignment is performed. The object is also destroyed properly. This also holds if you insert the buffer (the ref object) into an STL container.
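
The behaviour described above can be checked with counters instead of console output. The sketch below uses condensed stand-ins for Referenced and ReferenceToObj (no exception specifications, no NULL check in operator->) and assumes that the copy constructor and assignment operator take a const reference, which std::vector needs in order to copy its elements. CountedBuffer and containerDemo are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

class Referenced
{
    int refCount;

public:
    Referenced () : refCount (1) {}
    void incRefCount () { refCount++; }
    int decRefCount () { return --refCount; }
};

template< class T >
class ReferenceToObj
{
    T *obj;

    void decRef ()
    {
        if (obj && obj->decRefCount () == 0) {
            delete obj;
            obj = NULL;
        }
    }

public:
    ReferenceToObj (T *obj = NULL) : obj (obj) {}

    ReferenceToObj (const ReferenceToObj< T > &orig) : obj (orig.obj)
    {
        if (obj)
            obj->incRefCount ();
    }

    ReferenceToObj< T > &operator= (const ReferenceToObj< T > &right)
    {
        if (this == &right)
            return *this;
        if (right.obj)
            right.obj->incRefCount ();
        decRef ();
        obj = right.obj;
        return *this;
    }

    T *operator-> () const { return obj; }   // NULL check omitted in this sketch

    ~ReferenceToObj () { decRef (); }
};

// Hypothetical stand-in for Buffer that counts constructions and destructions.
class CountedBuffer : public Referenced
{
    CountedBuffer (const CountedBuffer &);              // copying disabled
    CountedBuffer &operator= (const CountedBuffer &);

public:
    static int created;
    static int destroyed;
    std::size_t size;

    CountedBuffer () : size (0) { ++created; }
    ~CountedBuffer () { ++destroyed; }
    void append (const char *, std::size_t n) { size += n; }
};

int CountedBuffer::created = 0;
int CountedBuffer::destroyed = 0;

void fill (ReferenceToObj< CountedBuffer > buf)
{
    buf->append ("abc", 3);
}

// Returns true if exactly one buffer was created and it was destroyed
// only after the last reference to it went away.
bool containerDemo ()
{
    CountedBuffer::created = CountedBuffer::destroyed = 0;
    {
        std::vector< ReferenceToObj< CountedBuffer > > v;
        {
            ReferenceToObj< CountedBuffer > ref (new CountedBuffer ());
            fill (ref);            // a copy is made for the call and dropped on return
            v.push_back (ref);     // the vector now holds its own references
            v.push_back (ref);
        }                          // ref is gone, the vector keeps the buffer alive
        if (CountedBuffer::destroyed != 0 || v[0]->size != 3u)
            return false;
    }                              // vector destroyed: the last references are released
    return CountedBuffer::created == 1 && CountedBuffer::destroyed == 1;
}
```

Only the small reference objects are copied by the vector; the buffer itself is created once and deleted once.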

Remember to always pass ReferenceToObj by value, not by reference or pointer. Otherwise the whole idea collapses, because the reference counter is not maintained!

Performance


Of course such automagic comes with some cost. How much? I've created a simple class that holds an integer and two functions: both incrementing the integer by one, but taking different arguments:

    void increment (ReferenceToObj< Bench > ref) throw ();
    void increment (Bench *ref) throw ();

The body of both functions is the same:

    ref->set (ref->get() + 1);

Then I ran each of them a few million times to measure the execution time. Compiled without optimization, the pointer version executes about 4 times faster than the version using the reference. With optimization (but without inlining the increment() functions, since they are in a different compilation unit than the invoking code) it's about 2 times faster.
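
A benchmark of this kind could look roughly like the sketch below. It is not the original benchmark code: the classes are condensed stand-ins, the shape of Bench is assumed from the description, and Timings and runBenchmark are hypothetical names. Everything sits in one compilation unit here, so an optimizing compiler may inline the calls, unlike in the measurement described above.

```cpp
#include <chrono>
#include <cstddef>

class Referenced
{
    int refCount;

public:
    Referenced () : refCount (1) {}
    void incRefCount () { refCount++; }
    int decRefCount () { return --refCount; }
};

template< class T >
class ReferenceToObj
{
    T *obj;

    ReferenceToObj &operator= (const ReferenceToObj &);   // not needed in this sketch

public:
    ReferenceToObj (T *obj = NULL) : obj (obj) {}
    ReferenceToObj (const ReferenceToObj &orig) : obj (orig.obj)
    {
        if (obj)
            obj->incRefCount ();
    }
    ~ReferenceToObj ()
    {
        if (obj && obj->decRefCount () == 0)
            delete obj;
    }
    T *operator-> () const { return obj; }
};

// Assumed shape of the benchmark class: it only holds an integer.
class Bench : public Referenced
{
    int value;

public:
    Bench () : value (0) {}
    int get () const { return value; }
    void set (int v) { value = v; }
};

void increment (ReferenceToObj< Bench > ref) { ref->set (ref->get () + 1); }
void increment (Bench *ref) { ref->set (ref->get () + 1); }

struct Timings
{
    double viaReference;   // seconds
    double viaPointer;     // seconds
    int finalValue;        // value of the integer after both loops
};

Timings runBenchmark (int iterations)
{
    typedef std::chrono::steady_clock Clock;

    Bench *raw = new Bench ();
    ReferenceToObj< Bench > ref (raw);   // ref owns the object from here on

    Clock::time_point t0 = Clock::now ();
    for (int i = 0; i < iterations; ++i)
        increment (ref);                 // copies the reference on every call
    Clock::time_point t1 = Clock::now ();
    for (int i = 0; i < iterations; ++i)
        increment (raw);                 // plain pointer, no counting
    Clock::time_point t2 = Clock::now ();

    Timings t;
    t.viaReference = std::chrono::duration< double > (t1 - t0).count ();
    t.viaPointer = std::chrono::duration< double > (t2 - t1).count ();
    t.finalValue = raw->get ();
    return t;
}
```

The measured ratio depends on the compiler, the optimization level and the machine, so the numbers from such a run need not match the ones quoted above.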

In practice you don't use the techniques described in this article on such simple objects, so the performance drop will not be as big; this benchmark is the worst case. You can actually boost performance by avoiding copying objects.

Further improvements


The code above may be improved, for example by making it thread-safe so that references can be passed between threads.
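
One possible direction, assuming a C++11 compiler, is to make the counter itself atomic. AtomicReferenced is a hypothetical name, and this is only a sketch of the counter, not a complete thread-safe design.

```cpp
#include <atomic>

// Hypothetical thread-safe variant of Referenced.
class AtomicReferenced
{
    std::atomic< int > refCount;

public:
    AtomicReferenced () : refCount (1) {}

    void incRefCount ()
    {
        refCount.fetch_add (1, std::memory_order_relaxed);
    }

    // Returns the new counter value, like Referenced::decRefCount().
    int decRefCount ()
    {
        // fetch_sub() returns the previous value, hence the "- 1".
        return refCount.fetch_sub (1, std::memory_order_acq_rel) - 1;
    }
};
```

This lets several threads hold their own references to one object. Modifying the same ReferenceToObj variable from two threads at once would still need additional synchronization.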

Classes and example code


You can download example code and full source of Referenced and ReferenceToObj classes below.

Attachment: cpp_objects_reference_counting.tgz (1.79 KB)

Comments

ReferenceToObj (ReferenceToObj< T > &orig)

Why does the copy constructor take a non-const reference as a parameter?

The incRefCount() method

The incRefCount() method used in this constructor modifies the reference counter and is not const.

Just the ticket for Microsoft COM objects

Microsoft's Component Object Model makes extensive use of reference counting, and their sample code is simply painful to read, with goto statements and macros galore as they struggle to manage the counters on different paths. Sadly they don't use a common base class for the reference counting, but with some minor adaptations the technique you describe here is helping me avoid software spaghetti junction hell! Thanks :o)

This was the most crisp and precise explanation of reference counting. Thanks a ton

Spelling mistake

You made a spelling mistake in the following code:
bool operator= (const ReferenceToObj< T >&right)	
{
    return obj == right.obj;
}
The operator= should be operator==, like this:
bool operator== (const ReferenceToObj< T >&right)	
{
    return obj == right.obj;
}

You're right, thanks.