rsyslog performance

This is the place for developers to discuss bugs, new features and everything else about code changes.

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 4:02 pm

rgerhards wrote:Thanks for the great post, for now just one answer

dlang wrote:it seems as if you need the timestamp at the time that you recieve the message, but can't you just store that time in the Msg structure and everything after that uses the one of the two timestamps in the message (either the time provided by the sending system or the time that rsyslog recieved the message), I'm missing why you would need any other timestamps

In regard to the message, this is done. But there is more than the message. The output, for example, needs to know when it has last *processed* a message, so that it can apply the various "last processed" time related comparisons. Note that due to the queue, the time a message is processed may greatly differ from the time the message was received. This drives logic like "process only when there are n messages within m seconds" and things like this.

what I'm missing is why anyone would care when the message was stored. I can see all sorts of reasons to do logic based on time+quantity for messages using the time that the messages were generated (or recieved if you don't trust the clocks on the sending servers), but I can't think of what logic someone would want to apply based on when the messages were processed/stored.

the only thing that I can think of that would have even a scrap of validity would be to try and surpress alerts if the message was too old, but even that is very questionable.

David Lang
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Professional Services Information

  • Custom written rsyslog.conf?
  • Maintenance Contract?
  • Installation support?

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 4:06 pm

rgerhards wrote:If you can, it would be good if you pulled data from my git repository at git.adiscon.com/git/rsyslog.git. devel is the master branch. I have done some more enhancements, but nothing of the same magnitude yet. But I expect more patches ;)

unfortunantly I can't use git from behind my company firewall, I'll have to stage this through my home system, but I was glad to see that you are using git, it will make it easier to work with (once I get over the initial setup hassles ;) )
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 4:10 pm

dlang wrote:what I'm missing is why anyone would care when the message was stored. I can see all sorts of reasons to do logic based on time+quantity for messages using the time that the messages were generated (or recieved if you don't trust the clocks on the sending servers), but I can't think of what logic someone would want to apply based on when the messages were processed/stored.

the only thing that I can think of that would have even a scrap of validity would be to try and surpress alerts if the message was too old, but even that is very questionable.


You think too much in terms of storing something on disk ;) This is not needed for the user, the property is not even exposed. We need it e.g. for mark message processing. Mark messages must only be written if no other message has recently been written to the output. So we need to know when the last message was written.

See (line 705) of
http://git.adiscon.com/?p=rsyslog.git;a ... c6;hb=HEAD

f_time was populated when the last message was processed. There are other reasons, but that is probably the one we can not yet get away from.

$ActionExecOnlyOnceEveryInterval is another sample case where we need the time of dequeue. There are others, but as with them, time() is called only when these features are enabled.

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 4:12 pm

dlang wrote:
rgerhards wrote:If you can, it would be good if you pulled data from my git repository at git.adiscon.com/git/rsyslog.git. devel is the master branch. I have done some more enhancements, but nothing of the same magnitude yet. But I expect more patches ;)

unfortunantly I can't use git from behind my company firewall, I'll have to stage this through my home system, but I was glad to see that you are using git, it will make it easier to work with (once I get over the initial setup hassles ;) )

http access for git is enabled, if that helps ;)
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 4:15 pm

One more update: what is currently driving me crazy is the FUTEX calls. There is a large number with any message and I have to admit I did not yet find the code that causes them. I've traced a lot of things, and so far it seems to boild down to a pthread_cond_signal() [the futex calls are done by pthreads, so there is no one-to-one mapping to what my code does and, as it looks, this involves a lot of guesswork...]. From what I see, these calls take up a considerable amount of time. If you have any suggestion on how to track down what is causing them, I'd appreciate that ;)

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 4:19 pm

rgerhards wrote:If you can, it would be good if you pulled data from my git repository at git.adiscon.com/git/rsyslog.git. devel is the master branch. I have done some more enhancements, but nothing of the same magnitude yet. But I expect more patches ;)


grabbing the snapshot of bcb97650683bbcb1ecdacd0c2a6052ef836d3eda doesn't work as it doesn't include the configure program. I'll unpack it on top of my 3.21.4 image to get it working
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 4:20 pm

dlang wrote:grabbing the snapshot of bcb97650683bbcb1ecdacd0c2a6052ef836d3eda doesn't work as it doesn't include the configure program. I'll unpack it on top of my 3.21.4 image to get it working


You don't need configure.

autoreconf -fvi

builds it for you.
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 4:26 pm

rgerhards wrote:
dlang wrote:what I'm missing is why anyone would care when the message was stored. I can see all sorts of reasons to do logic based on time+quantity for messages using the time that the messages were generated (or recieved if you don't trust the clocks on the sending servers), but I can't think of what logic someone would want to apply based on when the messages were processed/stored.

the only thing that I can think of that would have even a scrap of validity would be to try and surpress alerts if the message was too old, but even that is very questionable.


You think too much in terms of storing something on disk ;) This is not needed for the user, the property is not even exposed. We need it e.g. for mark message processing. Mark messages must only be written if no other message has recently been written to the output. So we need to know when the last message was written.
hmm, I always interpreted this to in such a way that I could do it on time_recieved (including from a local source) rather then time_output. I guess you can argue it either way.

$ActionExecOnlyOnceEveryInterval is another sample case where we need the time of dequeue. There are others, but as with them, time() is called only when these features are enabled.

this one I think could also be argued that you take action based on the recieved time. if it takes you three times as long to process the messages you really don't want three alerts going out when only one would have gone out if you were keeping up with the messages.
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 4:30 pm

I think you can not argue that received time is the correct here. Again, keep in mind that time of reception and time of processing may be hours different (if things have been sitting in a queue so long). Using the reception timestamp creates a lot of confusion. For example, why write a mark when a message was written just a minute ago, but that message was 2 hours old?
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 4:38 pm

mmhhh... I have thought a bit more about it. In the later case, I now tend to follow one can argue either way. But with the mark messages? I am not sure... To be honest, I'd like to be able to see it in the way you do, because that would remove the one instance that blocks me from removing the time call...

Any more arguments?
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 4:55 pm

a quick test of the new code is recieving ~88,000 messages/sec the reader thread is at 100% of the cpu and the writer thread is at 99% of the cpu

a quick strace of the threads (non-debug version) is showing that the futex calls are gone from the reader thread, but there are a couple of them in the writer thread.

more detail to follow when I have time to test the debug version.
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Tue Sep 16, 2008 5:09 pm

Oh, that's good news. I am still more than puzzled with the futex calls. I think I'll sleep over them, maybe I have a new idea tomorrow. There are also some other, non-os optimizations, but the OS calls are obviously the most valueable ones.

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2647
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 5:37 pm

rgerhards wrote:mmhhh... I have thought a bit more about it. In the later case, I now tend to follow one can argue either way. But with the mark messages? I am not sure... To be honest, I'd like to be able to see it in the way you do, because that would remove the one instance that blocks me from removing the time call...

Any more arguments?


the purpose of the mark message is to indicate that everything is still working, there is just no activity right now.

I agree that if you are lagging in processing the data inserting mark messages can be seen as being redundant, but on the other hand they do indicate that the input side of things was idle for that long, and they are small so it doesn't cost much if you end up generating more then you strictly need to prove that the output side of things is alive.

you are viewing mark as 'do this if there has been no output (including delayed output) in X min'

I am viewing mark as 'do this if there have been no messages logged from any source in X min'

I think your view leads to madness while mine may surprise people initially, but has concrete logic to back it up.

if you have multiple output modules that take different amounts of time to process messages
one that processes them at 50,000/sec (write to disk or relay)
one that processes them at 10,000/sec (write to a database)
you have mark set to the default 20 min
every 30 min you get a 3 min burst of traffic at 50,000/sec

the first output will write for 3 min and be idle for 27 min
the second output will write for 15 min and be idle for 15 min
what should the immark module do? should it generate a mark message at the 23 min mark (the first output has not written anything for 20 min), or should it not write anything (the second output is never idle for more then 15 min)

or if mark is set for 10 min
should it generate a mark message at the 13 and 23 min mark, or at the 25 min mark?, or all of three times?

if a mark message is generated, should it go to through all the normal rules and filters? or does it only go to the filters that would send messages to the particular outputs?

with my viewpoint the first case would generate a mark at the 23 min mark, the second case would generate a mark at 13 min and 23 min, because nothing else generated any log messages in the meantime. You don't have to worry about cases where different output modules take different amounts of time to process the logs, and any mark messages that you generate go through all the standard filters (including ones that may filter out mark messages for some outputs, in which case generating them based on that output is always a waste of time)
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 8:56 pm

dlang wrote:a quick test of the new code is recieving ~88,000 messages/sec the reader thread is at 100% of the cpu and the writer thread is at 99% of the cpu

a quick strace of the threads (non-debug version) is showing that the futex calls are gone from the reader thread, but there are a couple of them in the writer thread.

more detail to follow when I have time to test the debug version.


here is the more detailed strace
Attachments
rsyslog.debug.tar.bz2
strace data
(52.5 KiB) Downloaded 6 times
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby dlang » Tue Sep 16, 2008 8:58 pm

rgerhards wrote:
dlang wrote:grabbing the snapshot of bcb97650683bbcb1ecdacd0c2a6052ef836d3eda doesn't work as it doesn't include the configure program. I'll unpack it on top of my 3.21.4 image to get it working


You don't need configure.

autoreconf -fvi

builds it for you.

I looked at the README, but didn't see this. I ended up unpacking it on top of 3.21.4 and doing a make clean then a configure.
dlang
Frequent Poster
 
Posts: 126
Joined: Mon Sep 15, 2008 7:44 am

Google Ads


PreviousNext

Return to Developer's Corner

Who is online

Users browsing this forum: No registered users and 0 guests

cron