rsyslog performance

This is the place for developers to discuss bugs, new features and everything else about code changes.

Re: rsyslog performance

Postby rgerhards on Mon Oct 06, 2008 7:54 pm

Does the CPU utilization really equal absolute throughput - I am most astonished. This means that the select() calls and all this (these that we saved now) do not have any impact at all? So we are now back to 30,000 msgs/second or so?
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Professional Services Information

  • Custom written rsyslog.conf?
  • Maintenance Contract?
  • Installation support?

Re: rsyslog performance

Postby dlang on Tue Oct 07, 2008 2:47 am

at one point we were getting 80,000 messages/sec with 100% cpu on the listening thread

now with ~70% cpu on the listening thread we are getting ~50,000 messages/sec

if we get up to ~85% cpu on the listening thread I see ~60,000 messages/sec

my guess is that the futex calls and the occasional memory locking calls that I see are eating more cpu than the select saved (the time for the calls as shown by strace is higher than the select), and the fact that it's blocked waiting for the locks to be available is driving down the throughput.

I think that the time the listening thread is actually running and not stalled waiting for locks or trying to get locks is spent more efficiantly, but it's spending less time doing that (and I've tried this with and without sched_yield(), it looks like with the yield call the writer eats up more CPU, but doesn't seem to get anything more done)
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards on Tue Oct 07, 2008 6:24 am

Thanks, this is what I was looking for. That gives me an impression where we actually are.

dlang wrote:my guess is that the futex calls and the occasional memory locking calls that I see are eating more cpu than the select saved (the time for the calls as shown by strace is higher than the select), and the fact that it's blocked waiting for the locks to be available is driving down the throughput.


I agree that there must be a locking issue, but I don't think that what you say here is the reason. The point is that I just removed some (select()) calls, but did not add any futex calls. So the number of futex calls is the same and the absolute number is linear to the number of messages received.

As I said, there were some changes which could affect locking and I'll dig out those that I think may cause the issue. This probably happened while we were not 100% aligned in our testing.

I'll also review code again with the new knowledge. I yesterday identified an area where I can save malloc/free calls, but I was hesitant to do this without getting some results from you. I expect the effect to be minor, but that doesn't hurt. I'll also run a check on the actual mutex operations done (from the pthreads level, not the strace level).

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards on Tue Oct 07, 2008 6:28 am

On to the first test. Could you please apply the following change to queue.c, line 1438. Currently, it reads:

Code: Select all
   /* rgerhards, 2008-09-30: I reversed the order of cond_signal und mutex_unlock
    * as of the pthreads recommendation on predictable scheduling behaviour. I don't see
    * any problems caused by this, but I add this comment in case some will be seen
    * in the next time.
    */
   pthread_cond_signal(&pThis->notFull);
   d_pthread_mutex_unlock(pThis->mut);
   pthread_setcancelstate(iCancelStateSave, NULL);


Please change the order of signal/unlock as follows:


Code: Select all
   d_pthread_mutex_unlock(pThis->mut);
   pthread_cond_signal(&pThis->notFull);


This undoes my change. I would be very interested to see if there is any difference in performance.

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang on Tue Oct 07, 2008 7:35 am

rgerhards wrote:Thanks, this is what I was looking for. That gives me an impression where we actually are.

dlang wrote:my guess is that the futex calls and the occasional memory locking calls that I see are eating more cpu than the select saved (the time for the calls as shown by strace is higher than the select), and the fact that it's blocked waiting for the locks to be available is driving down the throughput.


I agree that there must be a locking issue, but I don't think that what you say here is the reason. The point is that I just removed some (select()) calls, but did not add any futex calls. So the number of futex calls is the same and the absolute number is linear to the number of messages received.
but one of the features of a futex is that if the lock is not contended all of the locking can be done in userspace. it's only when the lock has contention that system calls into the kernel are needed. so even without you adding any new calls something that increases contention will increase the number of futex calls that I see.

As I said, there were some changes which could affect locking and I'll dig out those that I think may cause the issue. This probably happened while we were not 100% aligned in our testing.

I'll also review code again with the new knowledge. I yesterday identified an area where I can save malloc/free calls, but I was hesitant to do this without getting some results from you. I expect the effect to be minor, but that doesn't hurt. I'll also run a check on the actual mutex operations done (from the pthreads level, not the strace level).
I am taking the day off tomorrow, so I won't be able to do a lot of testing (go ahead and e-mail me with what you want done, just in case I get a chance to do some though)
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards on Tue Oct 07, 2008 8:37 am

dlang wrote:but one of the features of a futex is that if the lock is not contended all of the locking can be done in userspace. it's only when the lock has contention that system calls into the kernel are needed. so even without you adding any new calls something that increases contention will increase the number of futex calls that I see.


This is good information. Actually, the futex calls are done by the pthreads library in the rsyslog case, so I am not fully aware when and why they are called. Also, I concentrated on the semantics of the mutex call, which of course do not describe this.

Could you provide me with a sample strace (just a couple 100 of messages would be fine) so that I can compare what your system does compared to mine?

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards on Wed Oct 08, 2008 9:54 am

just a quick update. I did some testing and experimenting yesterday and also reviewed the core algorithms. In the use case you have, there are very quick inputs and outputs. Once they are optimized, this puts the performance limit mostly on sync primitives. In such situations, the (single) main message queue becomes a bottleneck.

My design goal was to de-couple possibly long running outputs and inputs. The current design simply does not cover situations where both inputs and outputs are very quick. I can do some limited re-ordering of functionality, but in general a modified design is necessary to get increased throughput for those cases. The goal there is to enable parallel message submission. Multiple concurrent queues may be one way to achieve this, but others for sure exist. Anyway, this is not something that can (or at least should) be done on the fly.

So I will do some of the functional re-ordering, but will not touch the underlying design for now. I intend to do that in the next milestone release, which I think I will be able to work on soon. It will be based on what is currently in the perf branch (and not everything in perf will be merged back to the current master).
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards on Wed Oct 08, 2008 5:52 pm

For your information: I have done some reordering of processing today and on my low-end system the results look very promising.

http://git.adiscon.com/?p=rsyslog.git;a ... deca7428d7

The change is not 100% tested and completed yet, but if you have a chance to have a quick look at it, I would be most interested in your results. This is in the "perf" branch.

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang on Thu Oct 09, 2008 1:27 am

I just tried ace4f2f75202aec39449dac11b9eb1deca7428d7 (current head of perf) and had horrible results. I couldn't get it to startup properly I received no messages from UDP and when I tried to write a log via 'logger test' on the local box rsyslog crashed I grabbed a strace of the failure and attached it
Attachments
rsyslog.strace.tar.bz2
strace of failed run
(199.77 KiB) Downloaded 2 times
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards on Thu Oct 09, 2008 6:20 am

Looks like those things not yet fully tested doesn't seem to work. I thought UDP had a good chance, but anyhow. I'll post an update when I am done with the testing. Sorry for the noise.
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang on Thu Oct 09, 2008 6:26 am

moving back to cdecd7e524a5114ccff4f2909b32738bdb33c6ea I can get rsyslog to start without being under strace, but it segfaults when I do 'logger test'

backing up one more step to 7b3c05da9f126063384c80e9dd6fd5b2ae610074 I get the following when it segfaults

2842.538913687:{6923}main queue:Reg/w0: (/var/log/messages)
2842.538934357:{6923}main queue:Reg/w0: main queue: entering rate limiter
2842.538955586:{6923}main queue:Reg/w0: main queue:Reg/w0: worker IDLE, waiting for work.
2848.923093575:{6925}imuxsock.c: Message from UNIX socket: #3
2848.923126257:{6925}imuxsock.c: logmsg: flags 4, pri 13, from 'secdev', msg Oct 8 23:14:08 dlang: test
2848.923156703:{6925}imuxsock.c: Message has legacy syslog format.
2848.923188826:{6925}imuxsock.c: main queue: entry added, size now 1 entries
2848.923226815:{6925}imuxsock.c: wtpAdviseMaxWorkers signals busy
2848.923298044:{6925}imuxsock.c: main queue: EnqueueMsg advised worker start
2848.923492737:{6925}imuxsock.c: --------imuxsock calling select, active file descriptors (max 3): 3
2848.923519832:{6923}main queue:Reg/w0: main queue: entering rate limiter
2848.923544413:{6923}main queue:Reg/w0: main queue: entry deleted, state 0, size now 0 entries
2848.923576815:{6923}main queue:Reg/w0: Called action, logging to builtin-file
2848.924060614:{6923}main queue:Reg/w0: XXXX create our own timestamp: 19471909269648759, system time is 1223532848
Segmentation fault
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby dlang on Thu Oct 09, 2008 6:27 am

I just found one problem, apparently with my configuration writing with logger doesn't work properly

I'll re-do the tests with remote logging and see what happens.
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards on Thu Oct 09, 2008 6:28 am

dlang wrote:I just found one problem, apparently with my configuration writing with logger doesn't work properly

I'll re-do the tests with remote logging and see what happens.

Do you mean depending on the config things go wrong? If so, I would be most interested in the "wrong" config (as this should not lead to a segfault).
User avatar
rgerhards
Site Admin
 
Posts: 1662
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang on Thu Oct 09, 2008 6:38 am

commit cdecd7e524a5114ccff4f2909b32738bdb33c6ea works

all later ones fail

I tried reverting 8528344ef58b5d2907bba8809f63d0bca2ce8d38 and 0fa23994669417fff4c4c057ce0c9d1e96f6d56c, but the first one doesn't revert cleanly.
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby dlang on Thu Oct 09, 2008 6:41 am

rgerhards wrote:
dlang wrote:I just found one problem, apparently with my configuration writing with logger doesn't work properly

I'll re-do the tests with remote logging and see what happens.

Do you mean depending on the config things go wrong? If so, I would be most interested in the "wrong" config (as this should not lead to a segfault).

the config I listed on the first page of this thread causes a segfault when something writes to /dev/log (the example being "logger test")

all my other tests have been remote reception of udp messages, I don't know how long the /dev/log has been broken (I went back to the commit where $UDPServerTimeRequery was added and found the problem there, since I _know_ I did testing of that commit I realized that I had something different with my testing of the new builds)
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Google Ads


PreviousNext

Return to Developer's Corner

Who is online

Users browsing this forum: No registered users and 0 guests

cron