memory leak in rsyslog-3.20.0 with PostgreSQL backend

This is the place for developers to discuss bugs, new features and everything else about code changes.

memory leak in rsyslog-3.20.0 with PostgreSQL backend

Postby ktm » Thu Nov 13, 2008 7:59 pm

We are looking into adding an rsyslog based log server to our monitoring process. We have a test instance running against the newly released 3.20.0 and it is apparantly leaking memory quite regularly. Here is what is looks like after a fresh start:

Starting system logger (rsyslog): [ OK ]
root@logs:~> ps auwwx | grep rsyslog
root 13794 0.4 0.0 58148 2440 ? Sl 12:18 0:00 /usr/site/rsyslog/sbin/rsyslogd -f/etc/opt/rsyslog/rsyslog.conf -m 0

and here it is 40 minutes later:

root@logs:/var/log> ps auwwx | grep rsyslog
root 13794 0.8 0.3 84396 27948 ? Sl 12:18 0:22 /usr/site/rsyslog/sbin/rsyslogd -f/etc/opt/rsyslog/rsyslog.conf -m 0

It has already leaked 25MB of memory. This never stops and the server must be restarted to prevent an out-of-memory/swap condition. Do you have any ideas about what is causing the leak?

Cheers,
Ken
ktm
New
 
Posts: 3
Joined: Thu Nov 13, 2008 7:32 pm

Professional Services Information

  • Custom written rsyslog.conf?
  • Maintenance Contract?
  • Installation support?

Re: memory leak in rsyslog-3.20.0 with PostgreSQL backend

Postby ktm » Thu Nov 13, 2008 9:43 pm

Here is a valgrind run against rsyslogd-3.20.0. I hope that this helps you locate the memory leak.

Ken

--------------------
Shutting down system logger (rsyslog): ==14500==
==14500== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 118 from 2)
==14500== malloc/free: in use at exit: 630,464 bytes in 4,376 blocks.
==14500== malloc/free: 108,825 allocs, 104,449 frees, 12,717,199 bytes
allocated.
==14500== For counts of detected errors, rerun with: -v
==14500== searching for pointers to 4,376 not-freed blocks.
==14500== checked 31,596,412 bytes.
==14500==
==14500== 8 bytes in 2 blocks are definitely lost in loss record 1 of 14
==14500== at 0x43B8405: malloc (vg_replace_malloc.c:149)
==14500== by 0x5713DD: _dl_map_object_from_fd (in /lib/ld-2.3.4.so)
==14500== by 0x5720BB: _dl_map_object (in /lib/ld-2.3.4.so)
==14500== by 0x575D97: openaux (in /lib/ld-2.3.4.so)
==14500== by 0x5770FD: _dl_catch_error (in /lib/ld-2.3.4.so)
==14500== by 0x576039: _dl_map_object_deps (in /lib/ld-2.3.4.so)
==14500== by 0x680158: dl_open_worker (in /lib/tls/libc-2.3.4.so)
==14500== by 0x5770FD: _dl_catch_error (in /lib/ld-2.3.4.so)
==14500== by 0x680CB7: _dl_open (in /lib/tls/libc-2.3.4.so)
==14500== by 0x6B1CB7: dlopen_doit (in /lib/libdl-2.3.4.so)
==14500== by 0x5770FD: _dl_catch_error (in /lib/ld-2.3.4.so)
==14500== by 0x6B22BA: _dlerror_run (in /lib/libdl-2.3.4.so)
==14500==
==14500==
==14500== 144 bytes in 1 blocks are possibly lost in loss record 7 of 14
==14500== at 0x43B8405: malloc (vg_replace_malloc.c:149)
==14500== by 0x62B4C6A: ???
==14500== by 0x62BE56A: ???
==14500== by 0x62B631F: ???
==14500== by 0x62B64B7: ???
==14500== by 0x62542AF: ???
==14500== by 0x6254353: ???
==14500== by 0x8072912: actionCallDoAction (action.c:429)
==14500== by 0x806CE08: qAddDirect (queue.c:938)
==14500== by 0x806EEF4: queueEnqObj (queue.c:1015)
==14500== by 0x8072465: actionWriteToAction (action.c:570)
==14500== by 0x80725B3: actionCallAction (action.c:672)
==14500==
==14500==
==14500== 408 bytes in 3 blocks are possibly lost in loss record 10 of 14
==14500== at 0x43B96BF: calloc (vg_replace_malloc.c:279)
==14500== by 0x57971A: _dl_allocate_tls (in /lib/ld-2.3.4.so)
==14500== by 0x6E191E: pthread_create@@GLIBC_2.1 (in
/lib/tls/libpthread-2.3.4.so)
==14500== by 0x80697E4: wtpAdviseMaxWorkers (wtp.c:476)
==14500== by 0x806F6F4: queueStart (queue.c:388)
==14500== by 0x80513A0: init (syslogd.c:2344)
==14500== by 0x8053135: realMain (syslogd.c:2773)
==14500== by 0x80538DE: main (syslogd.c:3417)
==14500==
==14500==
==14500== 623,664 bytes in 4,331 blocks are definitely lost in loss record
14 of 14
==14500== at 0x43B8405: malloc (vg_replace_malloc.c:149)
==14500== by 0x62B4C6A: ???
==14500== by 0x62BE56A: ???
==14500== by 0x62B631F: ???
==14500== by 0x62B64B7: ???
==14500== by 0x62542AF: ???
==14500== by 0x6254353: ???
==14500== by 0x8072912: actionCallDoAction (action.c:429)
==14500== by 0x806CE08: qAddDirect (queue.c:938)
==14500== by 0x806EEF4: queueEnqObj (queue.c:1015)
==14500== by 0x8072465: actionWriteToAction (action.c:570)
==14500== by 0x80725B3: actionCallAction (action.c:672)
==14500==
==14500== LEAK SUMMARY:
==14500== definitely lost: 623,672 bytes in 4,333 blocks.
==14500== possibly lost: 552 bytes in 4 blocks.
==14500== still reachable: 6,240 bytes in 39 blocks.
==14500== suppressed: 0 bytes in 0 blocks.
==14500== Reachable blocks (those to which a pointer was found) are not
shown.
==14500== To see them, rerun with: --show-reachable=yes
ktm
New
 
Posts: 3
Joined: Thu Nov 13, 2008 7:32 pm

Re: memory leak in rsyslog-3.20.0 with PostgreSQL backend

Postby rgerhards » Fri Nov 14, 2008 8:26 am

Hi,

a valgrind log is extremely useful for me, but rsyslog needs to be prepared in order to obtain a useful log. You need to add the --enable-valgrind option to the set of ./configure options.

This option prevents rsyslog from actually unloading its modules. Because once a module is unloaded, valgrind can no longer de-reference the symbols. As almost everything in rsyslog is a loadable module, the valgrind log will not contain useful information without this switch (what unfortunately is the case with the one you provided).

Please note that I would be grateful if you could run the valgrind-enabled version. The postgres code is contributed and while I try to keep it in good shape, I usually have not test environment for that. Sure, setting up postgres is not that hard, but another thing that takes time...

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2642
Joined: Thu Feb 13, 2003 11:57 am

Re: memory leak in rsyslog-3.20.0 with PostgreSQL backend

Postby ktm » Fri Nov 14, 2008 4:18 pm

Okay, here are the lines from valgrind after rebuilding with --enable-valgrind:

...
==17731== 244,800 bytes in 1,700 blocks are definitely lost in loss record 19 of 19
==17731== at 0x43B8405: malloc (vg_replace_malloc.c:149)
==17731== by 0x62B4C6A: PQmakeEmptyPGresult (in /opt/opt.CORE/postgresql-8.3.3/rhel4/lib/libpq.so.5.1)
==17731== by 0x62BE56A: pqParseInput3 (in /opt/opt.CORE/postgresql-8.3.3/rhel4/lib/libpq.so.5.1)
==17731== by 0x62B631F: PQgetResult (in /opt/opt.CORE/postgresql-8.3.3/rhel4/lib/libpq.so.5.1)
==17731== by 0x62B64B7: PQexecFinish (in /opt/opt.CORE/postgresql-8.3.3/rhel4/lib/libpq.so.5.1)
==17731== by 0x62542AF: writePgSQL (ompgsql.c:170)
==17731== by 0x6254353: doAction (ompgsql.c:203)
==17731== by 0x8072902: actionCallDoAction (action.c:429)
==17731== by 0x806CDF8: qAddDirect (queue.c:938)
==17731== by 0x806EEE4: queueEnqObj (queue.c:1015)
==17731== by 0x8072455: actionWriteToAction (action.c:570)
==17731== by 0x80725A3: actionCallAction (action.c:672)
==17731==
==17731== LEAK SUMMARY:
==17731== definitely lost: 244,800 bytes in 1,700 blocks.
==17731== possibly lost: 552 bytes in 4 blocks.
==17731== still reachable: 27,716 bytes in 172 blocks.
==17731== suppressed: 0 bytes in 0 blocks.

I looked at the code in ompgsql.c where it calls PQexec(). Apparently, the original author was not handling the returned pointer to the PGresult from the call. It needs a specific call to PQclear for each of the calls. The change needed is to wrap the PQexec calls with the PQclear() since the individual results are never examined anyway:

orig:
Code: Select all
        /* try insert */
        PQexec(pData->f_hpgsql, (char*)psz);
        if(PQstatus(pData->f_hpgsql) != CONNECTION_OK) {
                /* error occured, try to re-init connection and retry */
                closePgSQL(pData); /* close the current handle */
                CHKiRet(initPgSQL(pData, 0)); /* try to re-open */
                PQexec(pData->f_hpgsql, (char*)psz);
                if(PQstatus(pData->f_hpgsql) != CONNECTION_OK) { /* re-try insert */


to:
Code: Select all
        /* try insert */
        PQclear(PQexec(pData->f_hpgsql, (char*)psz));
        if(PQstatus(pData->f_hpgsql) != CONNECTION_OK) {
                /* error occured, try to re-init connection and retry */
                closePgSQL(pData); /* close the current handle */
                CHKiRet(initPgSQL(pData, 0)); /* try to re-open */
                PQclear(PQexec(pData->f_hpgsql, (char*)psz));
                if(PQstatus(pData->f_hpgsql) != CONNECTION_OK) { /* re-try insert */


The valgrind check of the resulting executable shows no memory leak. This change will need to be made to all rsyslog ompgsql.c files for their next point release.

Cheers,
Ken
ktm
New
 
Posts: 3
Joined: Thu Nov 13, 2008 7:32 pm

Re: memory leak in rsyslog-3.20.0 with PostgreSQL backend

Postby rgerhards » Fri Nov 14, 2008 4:20 pm

Hi Ken,

excellent, many thanks for the patch. I'll integrate it in all relevant releases, but it will appear when each of them is released. Frequency depends, but should not take too long all in all.

Thanks a lot for your help! :)

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2642
Joined: Thu Feb 13, 2003 11:57 am

Re: memory leak in rsyslog-3.20.0 with PostgreSQL backend

Postby rgerhards » Tue Nov 18, 2008 12:31 pm

Just FYI: I have integrated the patch (with credits of course :)) into all branches starting from v2-stable. It will be rolled out as new releases occur (given how long it is in the codebase and nobody complained, it seems not to be pretty urgent to push the new releases).

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2642
Joined: Thu Feb 13, 2003 11:57 am

Google Ads



Return to Developer's Corner

Who is online

Users browsing this forum: No registered users and 0 guests

cron