Using Valgrind to debug memory leaks

Valgrind is a wonderful tool, useful mainly for debugging memory-related problems in C/C++ programs. I don't know a better tool for finding memory leaks. Although its output is often clear and intuitive, it's worth spending some time to understand how Valgrind works, what exactly its messages mean, and which cases make tracing a memory leak harder even with Valgrind.

Short introduction to Valgrind


Valgrind is a tool suite that automatically detects many memory- and thread-related problems in an application. It's composed of a few tools, each designed to track a different kind of problem. Valgrind can:
  • detect bad memory usage (reading uninitialized memory, writing past the end of a buffer etc.)
  • detect memory leaks (this is what I'll cover here)
  • profile CPU cache usage
  • profile a program like gprof
  • profile heap usage
  • detect thread-related problems with shared memory

All those features are used by running a single command with your program's executable file as an argument. In this article I will cover the memcheck tool, so in most cases we will run valgrind this way:

valgrind --tool=memcheck --leak-check=yes program_name

It's often useful to send valgrind's output to a file instead of stderr. You can use the

--log-file=valgrind.log
option to do that.

Memory leaks


What is a memory leak? Basically it's a case when a program no longer uses some chunk of dynamically allocated memory (and will not need it for the rest of its lifetime) but the memory was not deallocated. In practice the real problem is when the program "grows", i.e. constantly allocates chunks of memory (e.g. in a loop) and does not free them. Valgrind can automatically detect most cases where memory leaks and point to them in the code.

Example run


Let's consider an example program that copies its standard input to the standard output:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static int copy_data (const int len, const int buf_size)
    {
        char *buf = (char *) malloc (buf_size);
        int left = len;

        while (left) {
            ssize_t res = read (0, buf, left < buf_size ? left : buf_size);

            if (res < 0) {
                perror ("read");
                return -1;
            }

            if (res == 0) {
                fprintf (stderr, "Have less data than needed!\n");
                return left - len;
            }

            if (write(1, buf, res) < 0) {
                perror ("write");
                exit (1);
            }

            left -= res;
        }

        free (buf);
        return len;
    }

    int main (int argc, char *argv[])
    {
        copy_data (128, 16);

        return 0;
    }

To run it under valgrind use the following command:

valgrind --tool=memcheck --leak-check=yes ./test < /dev/urandom >/dev/null

Valgrind will show something like this (omitting the introduction text):

==15846== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 1)
==15846== malloc/free: in use at exit: 0 bytes in 0 blocks.
==15846== malloc/free: 1 allocs, 1 frees, 16 bytes allocated.
==15846== For counts of detected errors, rerun with: -v
==15846== All heap blocks were freed -- no leaks are possible.

The first line tells us that there were no errors, which means that Valgrind didn't detect any bad memory accesses etc. More interesting is the summary of memory allocation/deallocation that follows the first line:

==15846== malloc/free: in use at exit: 0 bytes in 0 blocks.

When the process exited there was no dynamically allocated memory left unfreed.

==15846== malloc/free: 1 allocs, 1 frees, 16 bytes allocated.

The program allocated one chunk of memory and also freed one chunk of memory. 16 bytes in total were allocated.

==15846== All heap blocks were freed -- no leaks are possible.

This program in the sample run is 100% memory leak free. In practice you will probably not see this message very often. I'll show you later why.

Simple case: memory was lost.


The example program is not perfect. Let's see what happens when we try to copy the content of /dev/null to stdout:

$ valgrind --tool=memcheck --leak-check=yes ./test < /dev/null
Have less data than needed!
==16245== 
==16245== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 1)
==16245== malloc/free: in use at exit: 16 bytes in 1 blocks.
==16245== malloc/free: 1 allocs, 0 frees, 16 bytes allocated.
==16245== For counts of detected errors, rerun with: -v
==16245== searching for pointers to 1 not-freed blocks.
==16245== checked 52,588 bytes.
==16245== 
==16245== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==16245==    at 0x4025D2E: malloc (vg_replace_malloc.c:207)
==16245==    by 0x8048524: copy_data (test.c:7)
==16245==    by 0x8048642: main (test.c:37)
==16245== 
==16245== LEAK SUMMARY:
==16245==    definitely lost: 16 bytes in 1 blocks.
==16245==      possibly lost: 0 bytes in 0 blocks.
==16245==    still reachable: 0 bytes in 0 blocks.
==16245==         suppressed: 0 bytes in 0 blocks.

Wow! When the error path was taken, something went wrong. Valgrind detected that 16 bytes were definitely lost (leak summary at the bottom). We can see that there was 1 block, 16 bytes large, that was not deallocated. Valgrind even shows the backtrace of the place where it was allocated and says it's confident it's a memory leak (16 bytes in 1 blocks are definitely lost). This is a case when Valgrind is 100% sure that it detected a memory leak. Such memory chunks are described as definitely lost. Why is it so sure? How does it work?

Valgrind tracks each memory allocation (made with standard facilities: malloc(), the new operator etc.) and deallocation. When it finds at exit that no pointer to an allocated memory chunk (or into its content) exists anywhere in the program, the program has no way to deallocate it, so there is definitely a memory leak. In our example, when copy_data() hits an error or runs out of input, the buffer pointed to by the local buf variable is not freed. When we return from the function the pointer to the buffer is lost, so Valgrind says there is a memory leak. This is the simplest, most obvious situation. Let's look at other examples.

When it's not as simple


The situation above is clear for Valgrind, but there are more difficult cases. Let's see another example. The code is silly, but in practice difficult cases appear in complex programs and it's hard to create a short example that makes sense and shows the problem:

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    char *lower;

    char *to_lower (const char *str)
    {
        char *l = strdup (str);
        char *c;

        for (c = l; *c; c++) {
            if (isupper(*c))
                *c = tolower(*c);
        }

        return l;
    }

    int main (int argc, char *argv[])
    {
        lower = to_lower (argv[1]);

        while (*lower)
            putchar (*(lower++));
        puts ("");

        return 0;
    }

The program prints its first argument in lower case. Valgrind shows:

==28578== 5 bytes in 1 blocks are possibly lost in loss record 1 of 1
==28578==    at 0x4025D2E: malloc (vg_replace_malloc.c:207)
==28578==    by 0x40D805F: strdup (in /lib/tls/i686/cmov/libc-2.8.90.so)
==28578==    by 0x8048504: to_lower (test2.c:9)
==28578==    by 0x804857F: main (test2.c:22)

This time the memory is possibly lost. Why isn't Valgrind sure? Because at program exit we haven't completely lost the pointer to the allocated memory. We've only advanced it to print the lowercased string in a funny way, so it now points inside the block rather than at its start. It's theoretically possible that we have a variable, a counter that tells us how far we've advanced, so we could compute the original pointer and free the memory.

A different case is when we modify the main function this way:

    int main (int argc, char *argv[])
    {
        lower = to_lower (argv[1]);
        puts (lower);
        return 0;
    }

This way Valgrind doesn't even tell us that there is a memory leak! It's worth pointing out again that the "definition" of a memory leak used by Valgrind is the case when the program loses the pointer to dynamically allocated memory. In the above example the pointer was not lost. Even so, Valgrind isn't completely silent in this case. It tells us:

==29438== LEAK SUMMARY:
==29438==    definitely lost: 0 bytes in 0 blocks.
==29438==      possibly lost: 0 bytes in 0 blocks.
==29438==    still reachable: 5 bytes in 1 blocks.
==29438==         suppressed: 0 bytes in 0 blocks.
==29438== Reachable blocks (those to which a pointer was found) are not shown.
==29438== To see them, rerun with: --leak-check=full --show-reachable=yes

The problematic memory chunk is called a reachable block. It's memory that was not freed, but a pointer to it still exists when the program exits. If we add the suggested options to Valgrind's invocation we will see:

==29879== 5 bytes in 1 blocks are still reachable in loss record 1 of 1
==29879==    at 0x4025D2E: malloc (vg_replace_malloc.c:207)
==29879==    by 0x40D805F: strdup (in /lib/tls/i686/cmov/libc-2.8.90.so)
==29879==    by 0x80484C4: to_lower (test2.c:9)
==29879==    by 0x804853F: main (test2.c:22)

Just like when the memory was definitely lost, we see the place in the program where the problematic block was allocated. Judging whether it's really a memory leak is left to the programmer.

Common pitfalls

Custom memory allocators


There are situations when debugging memory leaks is harder than usual. First of all, Valgrind understands only standard allocation/deallocation routines like malloc()/new, so if you (or a library you are using) use a custom memory allocator, Valgrind can't track that memory usage. One example is the glib library. If something allocates memory using its routines, Valgrind doesn't work properly by default. In this case the authors of the library created an easy mechanism to overcome the problem: you can set some environment variables (G_DEBUG set to include gc-friendly, G_SLICE=always-malloc) to switch glib's custom memory allocation to use the system's functions directly. Another solution is to teach Valgrind how your allocator works, using its mechanism for describing custom memory allocators. This is useful if you are writing your own memory allocator.

Not exactly a memory leak


Sometimes you see that the program grows, because standard system monitoring utilities show that the process uses more memory than you expect, but Valgrind shows nothing. The reason could be that memory is deallocated, but not as soon as possible. Imagine that you are creating new objects inside some other "master" object, and all pointers are stored in a vector so they can be freed in the destructor. Unfortunately the master object has a long lifetime (it is probably destroyed at program exit), so the allocated objects are freed not when they are no longer needed but only when the program exits. It's not a memory leak for Valgrind, but it's definitely a bug.

I remember using the std::vector container in an improper way. I put a large number of objects in it and at some point used the clear() method to delete them. There was no memory leak, but the implementation of the standard vector class kept its allocated storage internally for reuse, so shrinking the vector didn't always cause memory deallocation. In some situations the process was large at a time when it should not have been using such a big amount of memory. It was a case where there was no memory leak (Valgrind didn't show anything) but bad usage of a library caused excessive memory consumption. The solution was to use std::list instead of std::vector.

In such situations Valgrind's massif tool may be helpful.

Memory leaks that are not harmful


Sometimes a programmer simply knows about the memory leak, and also knows it's not harmful. One situation is loading the program configuration into dynamic structures at program startup and not freeing it anywhere. It's a one-time allocation and the data is used during the program's whole run time, so it's practically not a leak. Sometimes it's easier just not to free it than to search for a proper place in the program to do that. One drawback is that Valgrind will show spurious warnings that are irritating. Letting the operating system automatically free all program resources at exit can also be faster than doing it in the program itself.

Valgrind shows how many memory blocks leaked, which is useful for judging how bad the memory leak is. If only a single block leaks despite running the code in a loop or running the program for a long time, I consider it a low-priority bug.

Memory leaks in libraries


Sometimes Valgrind shows a leak in a library you are using. It may be a true leak caused by a buggy library, but the first thing you should do in such a situation is check whether you are using the library's API properly. You might have missed the information that returned objects must be freed manually, or something like that.

It's common for a library to allocate some global resources in its initialization routine or even automatically in the first call to one of its functions. There may not even be a way to deallocate them. As I said before, it's useful to run the code in a loop and see whether the leak is a single memory block or there are many of them.

Drawbacks of run-time analysis


If Valgrind doesn't show any leaks, this doesn't mean that none exist! It's a drawback of run-time analysis that the only code checked is the code that was executed during the program run. For example, if you don't properly free memory when handling an error and the error didn't occur during the Valgrind run, you will not find the leak. Like in the first example: the leak is present only when read() fails or the input ends early. It's probably the biggest drawback of this tool in my opinion, but it's hard to imagine a better one.

In practice...


When I started using Valgrind to debug my programs I used to think this way: "It's just an automatic, dumb tool that tracks memory allocations and can be wrong. I looked at the code and there can be no memory leak at this point, so it's one of the cases when Valgrind is wrong." But I was wrong! After years of using it I can see that 99.9% of its messages are right, but it's often hard to see it in the code.

One real-world case was when I was writing a multi-threaded program that used the libmysqlclient library and valgrind showed memory leaks in mysql_real_connect()/mysql_init(). It's clear from the documentation that the memory allocated by the library in those functions should be freed by mysql_close(). From the code it was clear that I was doing it properly: every created connection was closed. I even added a counter at the places where I create and destroy a connection and saw that all connections were destroyed. I started to think that there was a memory leak in the libmysqlclient_r library (the thread-safe version), but when I isolated the code (wrote a simple program that allocates connections and frees them) Valgrind showed no errors. So there were no leaks in the library. If I had had less faith in Valgrind I would have given up at this point, but I knew it was right. As I found out, libmysqlclient_r has a special requirement and I just hadn't read the documentation. If you are creating MYSQL objects in different threads, the library automatically allocates per-thread global data, but to free it you must call mysql_thread_end(). Unlike the allocation, this is not done automatically.

More valgrind features


Valgrind is not just a memory leak detector. There are other things it can do:
  • The memcheck tool that we used also reports bad memory accesses: reading uninitialized memory, reading past the end of a buffer, accessing freed memory etc., all with nice stack traces. This is really the primary use of Valgrind for me.
  • Massif is a heap profiler. If the program consumes too much memory but the cause is not a memory leak, it's useful to see where most of the memory is allocated. This is what this tool does.
  • Helgrind detects possible race conditions in multi-threaded programs. It tracks shared memory usage and automatically detects accesses made without holding the necessary locks. It also shows other thread-related problems.

There are also other tools worth knowing. Just look at Valgrind's documentation.

Comments

hash with pointers

What if I have a callback in which I allocate memory and put it into a hash? The callback returns, and after many callbacks the program ends by calling the hash destroy function, which frees all the items in the hash with the proper function. Theoretically the program is OK, it is freeing the memory, but for some reason Valgrind shows "definitely lost" for those allocations. Is this possible? Or did I get something wrong?

From my experience

From my experience "definitely lost" means there is a memory leak. The rule is too simple to get it wrong: for every malloc/new there must be matching free/delete when you "forget" the last pointer to that piece of memory. I remember when I began with valgrind I had many cases when I thought it's wrong but eventually always found that I'm wrong :)

re-explanation on valgrind memory leaks

The explanation has been done with programming examples. That practice is really good. It helps a lot. Thank you very much.

Good article

Good tool but I prefer Deleaker - it's similar to Valgrind but for Windows. :)