rsyslog performance

This is the place for developers to discuss bugs, new features and everything else about code changes.

Re: rsyslog performance

Postby dlang on Thu Oct 02, 2008 6:38 pm

rgerhards wrote:I have now implemented another idea that should improve performance of imudp by considerably reducing the number of select() calls. However, it potentially has some drawbacks if the system is being saturated by more than one sender at the same time. From the change comment:

"we now try to read from the file descriptor until there is no more data. This is done in the hope of getting better performance out of the system. However, this also means that a descriptor monopolizes processing while it contains data. This can lead to data loss in other descriptors. However, if the system is incapable of handling the workload, we will lose data in any case. So it doesn't really matter where the actual loss occurs - it is always random, because we depend on scheduling order."

I am not sure if it really doesn't matter (feedback appreciated). In any case, please try it in your scenario. I think we should see increased throughput (as the writer is capable of handling the load). Even if the solution does not stay, the results will tell us whether we are heading in the right direction. Note: this is on the perf branch again.

this doesn't seem to be doing anything. when I do a strace on the process it still looks like it's doing a select for every message.
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby dlang on Thu Oct 02, 2008 8:01 pm

one longer-term thing to look at is alternate queueing approaches.

there is work being done for the linux kernel on a high-performance tracing infrastructure (info at http://lwn.net/Articles/300994/ )
among the things that they are doing as part of this is a very high performance circular buffer that doesn't require locking for inserting or removing messages (it also does multiple buffers that get combined after the fact, but you shouldn't need that extra layer of things)

given that multiple destinations need to see the request I don't know how to tell how well it would work with multiple reader processes. If it's the case that the no-lock approach only works well if there is only one reader process, I'm wondering if the filtering/output side of things shouldn't be slightly re-factored so that each queue has a reader thread assigned to it.

this thread would
  read from its input queue
  loop through each output that this thread serves
    if it goes to another queue, add it to that queue (which would have another thread like this one reading from it)
    else filter and format the message then send it to the output routine
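
For illustration, here is a minimal sketch of the per-queue reader loop described above. It is written in Python for brevity (rsyslog itself is C), and every name in it (`Output`, `reader_loop`, the `STOP` sentinel, `run_demo`) is hypothetical. It only shows the "one reader thread per queue" idea and is not rsyslog code.

```python
import queue
import threading

STOP = None  # hypothetical sentinel that tells a reader thread to exit


class Output:
    """Hypothetical output: either feeds its own queue or is written directly."""

    def __init__(self, name, match, own_queue=None):
        self.name = name
        self.match = match          # filter: callable(msg) -> bool
        self.own_queue = own_queue  # if set, another reader thread serves it
        self.written = []           # stands in for the real output routine

    def send(self, msg):
        # "format" the message, then "output" it
        self.written.append(f"{self.name}: {msg}")


def reader_loop(in_queue, outputs):
    """The single reader of in_queue; no other thread ever dequeues from it."""
    while True:
        msg = in_queue.get()
        if msg is STOP:
            for out in outputs:
                if out.own_queue is not None:
                    out.own_queue.put(STOP)  # propagate shutdown downstream
            return
        for out in outputs:
            if out.own_queue is not None:
                out.own_queue.put(msg)       # hand off; that queue's reader filters
            elif out.match(msg):
                out.send(msg)                # filter, format and output here


def run_demo(messages):
    main_q, db_q = queue.Queue(), queue.Queue()
    file_out = Output("file", lambda m: True)
    db_out = Output("db", lambda m: "error" in m)
    db_feed = Output("db-feed", lambda m: True, own_queue=db_q)
    threads = [
        threading.Thread(target=reader_loop, args=(main_q, [file_out, db_feed])),
        threading.Thread(target=reader_loop, args=(db_q, [db_out])),
    ]
    for t in threads:
        t.start()
    for m in messages:
        main_q.put(m)
    main_q.put(STOP)
    for t in threads:
        t.join()
    return file_out.written, db_out.written
```

Because each queue has exactly one reader, a message can be dropped from the queue as soon as it is dequeued, and message order is preserved per queue.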

depending on the filters involved it may make sense for this thread to do some filtering even if it's sending it to another queue (cheap filters like priority would make sense to do here, expensive ones like regex matches would not)

this is very similar to what you are doing today as I understand it, with a couple (possibly important) differences.

1. today multiple output modules read from the same message queue. under this approach each queue will only ever have one reader.
this may simplify locking/contention issues on reads if you don't have to allow for multiple readers, and it simplifies the queue entry as you don't have to record which outputs have accessed it and which haven't yet.

2. if you have separate queues for different outputs, all the formatting and filtering for that output is going to be done in separate threads from any other output, allowing for the output load to be spread across more CPUs and keeping the overhead of the main message queue loop very light

Re: rsyslog performance

Postby dlang on Thu Oct 02, 2008 8:51 pm

dlang wrote:
rgerhards wrote:I have now implemented another idea that should improve performance of imudp by considerably reducing the number of select() calls. However, it potentially has some drawbacks if the system is being saturated by more than one sender at the same time. From the change comment:

"we now try to read from the file descriptor until there is no more data. This is done in the hope of getting better performance out of the system. However, this also means that a descriptor monopolizes processing while it contains data. This can lead to data loss in other descriptors. However, if the system is incapable of handling the workload, we will lose data in any case. So it doesn't really matter where the actual loss occurs - it is always random, because we depend on scheduling order."

I am not sure if it really doesn't matter (feedback appreciated). In any case, please try it in your scenario. I think we should see increased throughput (as the writer is capable of handling the load). Even if the solution does not stay, the results will tell us whether we are heading in the right direction. Note: this is on the perf branch again.

this doesn't seem to be doing anything. when I do a strace on the process it still looks like it's doing a select for every message.

my mistake, I had botched my local git setup so I wasn't switching branches properly

now that I've done a real test things are looking pretty ugly, throughput is _way_ down, cpu utilization of the threads is also way down (to the 70% range for the listener)

also, the writer thread has picked up a function call to stat /etc/localtime

attached is a small strace of the two active threads.

re: timestamps. I wonder if it would be worth adding an option 'imprecise timestamps' or similar that would limit timestamp resolution to the nearest second and have one thread update a timestamp variable that the other threads used instead of doing a gettimeofday call.

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 10:00 am

dlang wrote:that's one way to do it, another way would be to spawn a separate thread for each listening UDP socket. at that point the only point of contention would be on the main queue.


Sure, but this also has drawbacks. Most importantly, you create a large number of threads. For the UDP input, however, this should not be much of a problem: I think that, even in the IPv6&v4 case, there will hardly be more than 10 threads running at once. So it may make sense to switch the UDP input's threading model to something different from that of the others. However, this probably also implies that we need to make some modifications to the interface.

For the time being, I think it is acceptable as it is now. Do you concur (except for the performance issue you raised and which I will reply to later)?

dlang wrote:I was going to ask how tcp input modules handle multiple connections, but from this comment it sounds like you probably do something along the following lines

check to see if there are new connections, if so accept a new connection (creating a new FD) and add it to an array
check to see if there is new data on any connection, if so
    switch local context to that FD
    read the data from that FD and process it
loop back to the top

am I close to correct?


Yes - that's the model for almost all inputs. The point here is that I do not want to create thousands of threads, because this does not work out. The current code probably does not handle thousands of connections either, but it could easily be improved to do so. The idea is that we have a couple of input threads, each one serving a number of selectors via select (or epoll if performance becomes an issue).
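
To make the model concrete, here is a rough sketch of one input thread multiplexing a listening socket and all accepted connections. It uses Python's standard `selectors` module (which picks select, epoll or kqueue as available) rather than rsyslog's actual C code. The names `serve`, `handle`, `max_chunks`, `deadline_s` and `demo` are assumptions made for this example; the limits exist only so that the sketch terminates.

```python
import selectors
import socket
import threading
import time


def serve(listener, handle, max_chunks, deadline_s=5.0):
    """One thread serving the listener plus every accepted connection."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    handled = 0
    deadline = time.monotonic() + deadline_s
    while handled < max_chunks and time.monotonic() < deadline:
        for key, _events in sel.select(timeout=0.1):
            if key.data == "listener":
                conn, _addr = key.fileobj.accept()   # new connection, new FD
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="conn")
                continue
            chunk = key.fileobj.recv(4096)           # data on an existing FD
            if chunk:
                handle(chunk)
                handled += 1
            else:                                    # peer closed the connection
                sel.unregister(key.fileobj)
                key.fileobj.close()
    # Clean up whatever is still registered.
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        if key.data == "conn":
            key.fileobj.close()
    sel.close()


def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    received = []
    t = threading.Thread(target=serve, args=(listener, received.append, 2))
    t.start()
    clients = []
    for payload in (b"one", b"two"):
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(payload)
        clients.append(c)
    t.join()
    for c in clients:
        c.close()
    listener.close()
    return sorted(received)
```

The thread count stays constant no matter how many connections are open, which is the point of the model.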
rgerhards
Site Admin
 
Posts: 1780
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 10:07 am

dlang wrote:my mistake, I had botched my local git setup so I wasn't switching branches properly

now that I've done a real test things are looking pretty ugly, throughput is _way_ down, cpu utilization of the threads is also way down (to the 70% range for the listener)

also, the writer thread has picked up a function call to stat /etc/localtime

attached is a small strace of the two active threads.


Without yet looking at the trace, I think you actually didn't use the new code for a couple of days. So we need to re-evaluate all the changes we did in the last week or so. A good indication is the stat of /etc/localtime. As far as I have seen, this comes from mktime(), which I needed to re-create the timestamp on the output side. But that was done 5 days or so ago. Thus I conclude that we must have tested with different code.

I can hardly believe the new recvfrom() loop slows things down, because it reduces the number of system calls and code paths. Anyhow, this is simple to verify. Probably we should start there and then see where this leads us. What do you think?
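
As an illustration of why the loop should save system calls, here is a sketch of the "read until there is no more data" pattern on a non-blocking UDP socket. It is Python rather than the actual imudp C code, and `drain`, `receive_loop` and `demo` are hypothetical names. The idea is one select() per burst instead of one per datagram.

```python
import select
import socket


def drain(sock, on_message, bufsize=65535):
    """Read datagrams until the socket is empty; return how many were read."""
    count = 0
    while True:
        try:
            data, _sender = sock.recvfrom(bufsize)
        except BlockingIOError:   # EAGAIN/EWOULDBLOCK: nothing left to read
            return count
        on_message(data)
        count += 1


def receive_loop(sock, on_message, expected, timeout=2.0):
    """Wait with select(), then drain the socket; return the number of select() calls."""
    sock.setblocking(False)
    selects = 0
    received = 0
    while received < expected:
        readable, _, _ = select.select([sock], [], [], timeout)
        selects += 1
        if not readable:
            break                 # timed out, give up
        received += drain(sock, on_message)
    return selects


def demo(n=5):
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(n):
        tx.sendto(b"msg %d" % i, rx.getsockname())
    got = []
    selects = receive_loop(rx, got.append, expected=n)
    tx.close()
    rx.close()
    return got, selects
```

With several listening sockets, the trade-off from the change comment applies: while `drain` is busy on one descriptor, the others are not being read.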

dlang wrote:re: timestamps. I wonder if it would be worth adding an option 'imprecise timestamps' or similar that would limit timestamp resolution to the nearest second and have one thread update a timestamp variable that the other threads used instead of doing a gettimeofday call.


I have to admit I do not like this idea, as it requires us to continuously run a thread that we do not know we always need. I thought about another optimization. The new recvfrom() do loop executes very quickly. I am thinking about adding a switch so that, as long as the code is inside that loop, a timestamp is obtained only for every n-th packet received and reused for the others. If n is set to 10 and the system is very busy, the timestamp should not be much less accurate, but we save 90% of the time calls. Even if set to 2, we already save 50% at the expense of almost no loss in precision.
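
A small sketch of the "timestamp only every n-th packet" idea, again in Python and with hypothetical names (`SampledClock`, `stamp`, `reset`, `clock_reads`). It is not the rsyslog implementation, just the arithmetic: with n = 10, one clock read serves ten messages.

```python
import time


class SampledClock:
    """Reads the real clock on every n-th call and reuses the value in between."""

    def __init__(self, n, now=time.time):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self._now = now
        self._calls = 0
        self._cached = None
        self.clock_reads = 0      # how often the real clock was consulted

    def stamp(self):
        if self._calls % self.n == 0:
            self._cached = self._now()
            self.clock_reads += 1
        self._calls += 1
        return self._cached

    def reset(self):
        """Call when the receive loop goes idle so the next burst starts fresh."""
        self._calls = 0
```

Calling `reset()` whenever the receive loop falls back to waiting would keep a stale timestamp from leaking into the next burst. Setting n = 1 restores one clock read per message.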

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 10:24 am

dlang wrote:no. I think I understand the general flow of things (what different software routines exist), but what I am missing is where the thread boundaries are.

hmm, looking at imtemplate it seems to be saying that the input module spins in a tight loop getting data and adding it to the main queue.

that would imply that it's the thread that runs the output module that does all the filtering, formatting, and output.

am I right so far?


We are getting closer. Look at the picture again. On the middle left, there is a queue. This is the thread boundary. Inputs submit to the queue, then their part of processing is finished. As part of this submission process, worker threads are started on an as-needed basis. These workers reside on the output side of the queue. The main message queue workers do everything needed to filter and format the message except for the actual output processing. Once an output is to be called, the main msgq workers submit the work item to the action queue in question. In the picture, these are the queue symbols to the middle right. Here the same happens: during submission, workers are activated on an as-needed basis. Again, they reside on the other side of the queue, consume queue content, and call the output.

I am omitting the information about DIRECT queue mode from this picture, because I think this is what confuses you. Let's get the simplified view right, then we can look at that (important) detail.

In that simplified view, your very basic setup (one input, one output) has:

a) 1 input thread
b) 1 main message queue worker thread
c) 1 thread driving the output

a) accepts the udp messages, parses them and submits them to the main msgq
b) pulls msgs from the main msgq, filters them, and submits those that apply to the output queue
c) pulls msgs from the output q and submits them to the output
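
The three threads a), b) and c) can be sketched as a small pipeline with two queues as the thread boundaries. This is a simplified Python model with hypothetical names (`input_thread`, `main_worker`, `output_thread`, `run_pipeline`) and a made-up "priority:text" message format; it ignores DIRECT mode and worker pools.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream marker


def input_thread(raw_messages, main_q):
    # a) accept messages, parse them, submit them to the main queue
    for raw in raw_messages:
        pri, _, text = raw.partition(":")
        main_q.put({"pri": int(pri), "text": text})
    main_q.put(STOP)


def main_worker(main_q, action_q, keep):
    # b) pull from the main queue, filter, submit matches to the action queue
    while (msg := main_q.get()) is not STOP:
        if keep(msg):
            action_q.put(msg)
    action_q.put(STOP)


def output_thread(action_q, sink):
    # c) pull from the action queue and hand the message to the output
    while (msg := action_q.get()) is not STOP:
        sink.append(msg["text"])


def run_pipeline(raw_messages, keep):
    main_q, action_q, sink = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=input_thread, args=(raw_messages, main_q)),
        threading.Thread(target=main_worker, args=(main_q, action_q, keep)),
        threading.Thread(target=output_thread, args=(action_q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

Each stage only ever touches the queue on its own side of the boundary, so a slow output delays the action queue but not the input thread (until the queues fill up).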

Again, let's forget for a moment about DIRECT queue mode. Does the description so far make sense to you?

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 10:25 am

dlang wrote:I left the process running, how do I get a stack trace?


Can you send it a SIGABRT and look at the core dump via gdb?

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 10:41 am

dlang wrote:one longer-term thing to look at is alternate queueing approaches.

there is work being done for the linux kernel on a high-performance tracing infrastructure (info at http://lwn.net/Articles/300994/ )
among the things that they are doing as part of this is a very high performance circular buffer that doesn't require locking for inserting or removing messages (it also does multiple buffers that get combined after the fact, but you shouldn't need that extra layer of things)


I am very skeptical about moving in this direction. First of all, it is obviously very platform-specific, so at least two different code paths would need to be maintained. But, even more importantly, I don't think that would bring so much benefit. The queue is much more than just a circular memory buffer. The queue object does all multithreading, schedules the flow of messages through the various parts of the system and also maintains persistence across sessions. Not to mention extended disk-buffering or the ultra-reliable disk queue mode. The actual in-memory circular buffer part is minimalistic. For example, the linkedlist queue driver is around 100 lines of code. It may be possible, though, to utilize this kernel work in the form of a new queue driver, which could provide lock-free enqueuing and dequeuing. But then we would need to limit the number of workers with this approach to a maximum of 1, because, as you say, it may not be possible for multiple workers to dequeue. I am not sure if that is desirable.
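
To show why such a buffer is tied to a single dequeuing worker, here is a sketch of a single-producer/single-consumer ring. It is not the kernel's ring buffer and not an rsyslog queue driver; `SpscRing`, `try_put` and `try_get` are hypothetical names. In Python the interpreter lock hides the hard part; a C version would need atomic loads and stores with suitable memory barriers on the two indices.

```python
class SpscRing:
    """Fixed-size ring for exactly one producer thread and one consumer thread."""

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)  # one slot always stays empty
        self._head = 0   # next slot to read; written only by the consumer
        self._tail = 0   # next slot to write; written only by the producer

    def try_put(self, item):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False              # ring is full
        self._slots[self._tail] = item
        self._tail = nxt              # publish only after the slot is filled
        return True

    def try_get(self):
        if self._head == self._tail:
            return False, None        # ring is empty
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return True, item
```

The scheme works without a lock only because each index has a single writer. A second consumer would race on `_head`, which is why the number of workers would have to be capped at 1.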

dlang wrote:this thread would
  read from its input queue
  loop through each output that this thread serves
    if it goes to another queue, add it to that queue (which would have another thread like this one reading from it)
    else filter and format the message then send it to the output routine


hehe - we are getting closer ;) This is very close to what is actually done today! It's done in a more abstracted way by utilizing DIRECT queue mode where appropriate, but from a point of "what" happens it is correct. It is even mostly correct from the "how it happens" point, at least if you tear apart object encapsulation.

dlang wrote:depending on the filters involved it may make sense for this thread to do some filtering even if it's sending it to another queue (cheap filters like priority would make sense to do here, expensive ones like regex matches would not)

this is very similar to what you are doing today as I understand it, with a couple (possibly important) differences.

1. today multiple output modules read from the same message queue. under this approach each queue will only ever have one reader.

Nope! It is not different outputs that read the same queue, but different workers. The number of workers is a configuration limit, so one could limit them to 1.

dlang wrote:this may simplify locking/contention issues on reads if you don't have to allow for multiple readers, and it simplifies the queue entry as you don't have to record which outputs have accessed it and which haven't yet.

All of this does NOT happen today, because we do not need it ;)

dlang wrote:2. if you have separate queues for different outputs, all the formatting and filtering for that output is going to be done in separate threads from any other output, allowing for the output load to be spread across more CPUs and keeping the overhead of the main message queue loop very light

I mostly agree, and this is how it works today ;) I do not agree on the filtering, as filtering decides which output is needed. This is done in the main message queue. If not, you would submit messages to the output queue that the output does not need - something that in a typical configuration frequently happens. So the overall time to process would increase if you unnecessarily submit messages to the output queue. The main message queue uses multiple workers to do the decision-making in parallel, so doing this decision-making in the main message queue does not limit parallel processing.

Re: rsyslog performance

Postby dlang on Fri Oct 03, 2008 2:30 pm

rgerhards wrote:
dlang wrote:that's one way to do it, another way would be to spawn a separate thread for each listening UDP socket. at that point the only point of contention would be on the main queue.


Sure, but this also has drawbacks. Most importantly, you create a large number of threads. For the UDP input, however, this should not be much of a problem: I think that, even in the IPv6&v4 case, there will hardly be more than 10 threads running at once. So it may make sense to switch the UDP input's threading model to something different from that of the others. However, this probably also implies that we need to make some modifications to the interface.

For the time being, I think it is acceptable as it is now. Do you concur (except for the performance issue you raised and which I will reply to later)?

since I don't expect to use multiple udp sockets it's not going to affect me (when I move to IPv6 on these systems I won't need to support ipv4), and I suspect that for syslog (which is an internal tool) most environments are going to be the same way. so in practice I don't think it's going to matter much.

Re: rsyslog performance

Postby dlang on Fri Oct 03, 2008 2:49 pm

rgerhards wrote:
dlang wrote:my mistake, I had botched my local git setup so I wasn't switching branches properly

now that I've done a real test things are looking pretty ugly, throughput is _way_ down, cpu utilization of the threads is also way down (to the 70% range for the listener)

also, the writer thread has picked up a function call to stat /etc/localtime

attached is a small strace of the two active threads.


Without yet looking at the trace, I think you actually didn't use the new code for a couple of days. So we need to re-evaluate all the changes we did in the last week or so. A good indication is the stat of /etc/localtime. As far as I have seen, this comes from mktime(), which I needed to re-create the timestamp on the output side. But that was done 5 days or so ago. Thus I conclude that we must have tested with different code.

I've only had git running on my side for a couple days, so the problems can't go back before that.
I figured out what I did wrong. when I did my first checkout of the branches from upstream I did so with a -b, which created a local branch with the same name as upstream, after that I would not have seen any changes to that branch upstream.

so looking back through my e-mail, tuesday night is when I did this, so my tests tuesday and wednesday on helgrind were from that codebase. I made the same mistake on perf, but it happened a little later (immediately after you merged helgrind into it, I had checked the git log to see if it looked like I was on the right branch and remember seeing that merge)

rgerhards wrote:I can hardly believe the new recvfrom() loop slows things down, because it reduces the number of system calls and code paths. Anyhow, this is simple to verify. Probably we should start there and then see where this leads us. What do you think?

I don't think the new receive loop is slower, I think that there is lock contention that is preventing it from running, which is why it's only using 70% of the CPU. I'll have to do the math, but I don't think the performance/CPU has dropped much (if at all), and just doing the additional futex calls will eat some of it, so I think it is more efficient, just not able to run as much.

dlang wrote:re: timestamps. I wonder if it would be worth adding an option 'imprecise timestamps' or similar that would limit timestamp resolution to the nearest second and have one thread update a timestamp variable that the other threads used instead of doing a gettimeofday call.


I have to admit I do not like this idea, as it requires us to continuously run a thread that we do not know we always need. I thought about another optimization. The new recvfrom() do loop executes very quickly. I am thinking about adding a switch so that, as long as the code is inside that loop, a timestamp is obtained only for every n-th packet received and reused for the others. If n is set to 10 and the system is very busy, the timestamp should not be much less accurate, but we save 90% of the time calls. Even if set to 2, we already save 50% at the expense of almost no loss in precision.

That can work. since that would only kick in while you have a lot of messages queued, and even with two gettimeofday calls per syscall (in a systrace) handling one loop takes around 0.00015 seconds, the loss of precision should not matter to anyone. you may want to make this a tunable so that people who care can set it to 1, but default it to 10 or 100.

the advantage of the separate thread is that it would be able to eliminate gettimeofday calls everywhere, including in output modules. I agree that adding a separate thread is not a pretty thing to do, but it's doing almost nothing (a select sleep for just under a second, followed by a gettimeofday call and updating one variable). If another thread is still a concern, could you do this in the otherwise idle main thread instead of a separate thread?
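
A rough sketch of the separate-thread variant, for comparison: one background thread sleeps, reads the clock and updates a single variable that every other thread reads instead of calling the clock itself. The names `CoarseClock`, `current`, `start` and `stop` are hypothetical, and the injectable `now` function exists only to make the sketch testable.

```python
import threading
import time


class CoarseClock:
    """Background thread that refreshes a whole-second timestamp periodically."""

    def __init__(self, interval=1.0, now=time.time):
        self._now = now
        self._interval = interval
        self.current = int(now())     # readers use this instead of the clock
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # wait() returns False on timeout, True once stop() has been called
        while not self._stop.wait(self._interval):
            self.current = int(self._now())

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Readers pay one variable read per message. The cost is a thread that wakes up once per interval even when the system is idle, which is the objection raised in the quoted reply.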

David Lang

Re: rsyslog performance

Postby dlang on Fri Oct 03, 2008 3:01 pm

rgerhards wrote:I am omitting the information about DIRECT queue mode from this picture, because I think this is what confuses you. Let's get the simplified view right, then we can look at that (important) detail.

In that simplified view, your very basic setup (one input, one output) has:

a) 1 input thread
b) 1 main message queue worker thread
c) 1 thread driving the output

a) accepts the udp messages, parses them and submits them to the main msgq
b) pulls msgs from the main msgq, filters them, and submits those that apply to the output queue
c) pulls msgs from the output q and submits them to the output

Again, let's forget for a moment about DIRECT queue mode. Does the description so far make sense to you?


Yes, this makes sense.

if you have multiple inputs they all add the messages to the input queue independently.
before we talk about direct mode, if you have multiple outputs, which queues multiply?
say you were writing to both MySQL and Postgres, would you just have two 'c' threads? or would you have multiple 'b' threads as well?

I see some potential issues either way.

if you have one 'b' thread it could be a limitation because the filters could be expensive (as I noted in another e-mail, some filters would make sense here, some should possibly be pushed to the output threads)

if you have multiple 'b' threads the main msgq gets more complicated (you have more locking to do as you have multiple readers, and you need to record when each reader has processed the message (and check if the thread running is the last reader) so that you can tell when all outputs have read the message and it can be removed from the queue).

I'm actually going to start testing this area after we finish finding the lockup. I have one point where rsyslog will be used to receive messages and then split them up between many different instances of SEC for some alerting analysis.

Re: rsyslog performance

Postby dlang on Fri Oct 03, 2008 3:02 pm

rgerhards wrote:
dlang wrote:I left the process running, how do I get a stack trace?


Can you send it a SIGABRT and look at the core dump via gdb?

I will do so when I get into the office.

Re: rsyslog performance

Postby dlang on Fri Oct 03, 2008 3:29 pm

rgerhards wrote:
dlang wrote:one longer-term thing to look at is alternate queueing approaches.

there is work being done for the linux kernel on a high-performance tracing infrastructure (info at http://lwn.net/Articles/300994/ )
among the things that they are doing as part of this is a very high performance circular buffer that doesn't require locking for inserting or removing messages (it also does multiple buffers that get combined after the fact, but you shouldn't need that extra layer of things)


I am very skeptical about moving in this direction. First of all, it is obviously very platform-specific, so at least two different code paths would need to be maintained. But, even more importantly, I don't think that would bring so much benefit. The queue is much more than just a circular memory buffer. The queue object does all multithreading, schedules the flow of messages through the various parts of the system and also maintains persistence across sessions. Not to mention extended disk-buffering or the ultra-reliable disk queue mode. The actual in-memory circular buffer part is minimalistic. For example, the linkedlist queue driver is around 100 lines of code. It may be possible, though, to utilize this kernel work in the form of a new queue driver, which could provide lock-free enqueuing and dequeuing. But then we would need to limit the number of workers with this approach to a maximum of 1, because, as you say, it may not be possible for multiple workers to dequeue. I am not sure if that is desirable.

the intent was not that you would use this feature on linux (it's an in-kernel tool, I don't think userspace would have access to it anyway), but rather as code that would be used for a new queue driver (possibly replacing some existing queue drivers if it's sufficiently better).

dlang wrote:this thread would
  read from its input queue
  loop through each output that this thread serves
    if it goes to another queue, add it to that queue (which would have another thread like this one reading from it)
    else filter and format the message then send it to the output routine


hehe - we are getting closer ;) This is very close to what is actually done today! It's done in a more abstracted way by utilizing DIRECT queue mode where appropriate, but from a point of "what" happens it is correct. It is even mostly correct from the "how it happens" point, at least if you tear apart object encapsulation.

dlang wrote:depending on the filters involved it may make sense for this thread to do some filtering even if it's sending it to another queue (cheap filters like priority would make sense to do here, expensive ones like regex matches would not)

this is very similar to what you are doing today as I understand it, with a couple (possibly important) differences.

1. today multiple output modules read from the same message queue. under this approach each queue will only ever have one reader.

Nope! It is not different outputs that read the same queue, but different workers. The number of workers is a configuration limit, so one could limit them to 1.

dlang wrote:this may simplify locking/contention issues on reads if you don't have to allow for multiple readers, and it simplifies the queue entry as you don't have to record which outputs have accessed it and which haven't yet.

All of this does NOT happen today, because we do not need it ;)

you say this isn't happening today, if you have multiple workers, how does this work?

A. worker 1 pulls the message from the queue and delivers it to all possible outputs, worker 2 pulls the next message from the queue and delivers it to all possible outputs

B. worker 1 pulls the message from the queue and delivers it to one/some outputs, worker 2 pulls the same message from the queue and delivers it to other outputs.

C. other?

if you do A this can re-order messages (either through OS scheduling, or by the filters for worker 1 taking significantly longer than the filters for worker 2), this can get ugly when something outputs multi-line messages.

if you do B then you have to keep track of whether or not all the outputs have gotten a copy of this message yet, if they haven't then the message needs to stay in the queue, if they have then the message needs to be removed from the queue.

dlang wrote:2. if you have separate queues for different outputs, all the formatting and filtering for that output is going to be done in separate threads from any other output, allowing for the output load to be spread across more CPUs and keeping the overhead of the main message queue loop very light

I mostly agree, and this is how it works today ;) I do not agree on the filtering, as filtering decides which output is needed. This is done in the main message queue. If not, you would submit messages to the output queue that the output does not need - something that in a typical configuration frequently happens. So the overall time to process would increase if you unnecessarily submit messages to the output queue. The main message queue uses multiple workers to do the decision-making in parallel, so doing this decision-making in the main message queue does not limit parallel processing.


I thought the formatting was done by the queue worker thread as well as the filtering (based on the discussion about how multiple messages could possibly be dispatched together)

I agree that doing the filtering in the output queue would eat more CPU overall, but the thought is that not all filters would need to be applied for each output, so the total amount of filtering work that would need to be done by any one thread would be lower.

for 'cheap' filtering (priority, facility, system or program name matches) this isn't a significant gain, and the filtering can be done in the queue worker thread without running any significant risk of that being the bottleneck, but for more complex filters (for example regex filters), it's very possible for there to be enough filters that the queue worker thread becomes cpu bound and limits the system throughput.

David Lang

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 3:54 pm

David,

first of all, I think I have found the source of misunderstanding. I think you assume that a worker thread is the same as an output thread. It is not. A worker thread is one that performs some action (which is defined by the context in which the worker is used). An output "thread" is just one potential context in which a worker thread can execute. But there is no one-to-one mapping between them. What comes next may again be confusing, so please take my word for it for the time being. We'll see how this works when we dig down into direct queue mode. So: an output does not even necessarily need a worker thread.

It is vitally important to think of worker threads and outputs as totally different entities.

dlang wrote:if you have multiple inputs they all add the messages to the input queue independently.
before we talk about direct mode, if you have multiple outputs, which queues multiply?
say you were writing to both MySQL and Postgres, would you just have two 'c' threads? or would you have multiple 'b' threads as well?


You would just have two 'c' threads.

dlang wrote:if you have one 'b' thread it could be a limitation because the filters could be expensive (as I noted in another e-mail, some filters would make sense here, some should possibly be pushed to the output threads)


I simplified my description. There is not necessarily a single worker. This is a worker pool. It runs many workers; how many there are and when they are spawned can be configured. This has NOTHING to do with how many outputs you have. Running multiple b-workers can also be useful if you have complex filters and only a single output.

dlang wrote:if you have multiple 'b' threads the main msgq gets more complicated (you have more locking to do as you have multiple readers, and you need to record when each reader has processed the message (and check if the thread running is the last reader) so that you can tell when all outputs have read the message and it can be removed from the queue).


No, because the worker ensures that the message is processed. As soon as a worker pulls a message from the queue, it can be deleted (don't let us get into details at this level, we need the broad picture first).
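
A minimal sketch of that point, in Python with hypothetical names (`run_pool`, `process`): several workers pull from one queue, each message is dequeued exactly once, and the worker that got it is responsible for everything that has to happen to it. The queue itself therefore needs no per-reader bookkeeping.

```python
import queue
import threading
from collections import Counter


def run_pool(messages, n_workers, process):
    """Run n_workers over one shared queue; return how often each message was handled."""
    q = queue.Queue()
    handled = Counter()
    lock = threading.Lock()

    def worker():
        while True:
            msg = q.get()         # dequeuing removes the message for good
            if msg is None:       # one None per worker acts as shutdown signal
                return
            process(msg)          # this worker handles ALL processing for msg
            with lock:
                handled[msg] += 1

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for m in messages:
        q.put(m)
    for _ in workers:
        q.put(None)
    for w in workers:
        w.join()
    return handled
```

Here `process` stands in for "filter the message and submit it to every action queue that applies".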

Re: rsyslog performance

Postby rgerhards on Fri Oct 03, 2008 4:08 pm

dlang wrote:A. worker 1 pulls the message from the queue and delivers it to all possible outputs, worker 2 pulls the next message from the queue and delivers it to all possible outputs

B. worker 1 pulls the message from the queue and delivers it to one/some outputs, worker 2 pulls the same message from the queue and delivers it to other outputs.

C. other?

if you do A this can re-order messages (either through OS scheduling, or by the filters for worker 1 taking significantly longer than the filters for worker 2), this can get ugly when something outputs multi-line messages.

if you do B then you have to keep track of whether or not all the outputs have gotten a copy of this message yet, if they haven't then the message needs to stay in the queue, if they have then the message needs to be removed from the queue.


On re-ordering: as soon as you run things asynchronously, messages can be re-ordered. There is no way around this. If order of events is important, you need to limit the main message queue to a single worker thread and do NOT run the outputs in any asynchronous mode.
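
The effect is easy to reproduce in a toy model. The sketch below (Python, hypothetical names `deliver` and `cost`) pushes messages through a pool of workers, where `cost(msg)` simulates how long filtering takes for that message.

```python
import queue
import threading
import time


def deliver(messages, n_workers, cost):
    """Process messages with n_workers; return them in completion order."""
    q = queue.Queue()
    delivered = []
    lock = threading.Lock()

    def worker():
        while (msg := q.get()) is not None:
            time.sleep(cost(msg))         # simulated filter/format time
            with lock:
                delivered.append(msg)

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for m in messages:
        q.put(m)
    for _ in workers:
        q.put(None)                       # one shutdown marker per worker
    for w in workers:
        w.join()
    return delivered
```

With a single worker the completion order always equals the submission order. With two workers and one expensive message at the front, the cheap messages behind it will typically overtake it, although the exact order depends on thread scheduling.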

On multi-line messages: there is no such beast in syslog. However, depending on the transport and framing used, rsyslog supports embedded LFs inside a message and treats the whole as a single message. A multi-line message in the sense of multiple messages forming a single one also causes potential problems at the UDP layer: UDP does not guarantee order of delivery.
