rsyslog performance

This is the place for developers to discuss bugs, new features and everything else about code changes.

Re: rsyslog performance

Postby rgerhards » Thu Oct 09, 2008 8:50 am

one more thanks ;) So that error told us the previous problems were rooted in a version difference. Some incompatible version was loaded (and, yes, I should have incremented the version flag when immediately when I did the change ;). I'll also add more diagnostics to make sure such things are easier to spot (if I fail again for some reason).

Once the correct versions of all modules are installed, we should be able to get more consistent status, and I hope that than things will work for you as they do in my lab. I also think that some of the previous trouble were also caused by partial install of the newest versions. This all makes very much sense. The root problem than was that I changed Interface details in the perf branch but did not yet increment the IF versions (as I should have, because this is what guards us against situations like the one we experienced...).
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Professional Services Information

  • Custom written rsyslog.conf?
  • Maintenance Contract?
  • Installation support?

Re: rsyslog performance

Postby dlang » Thu Oct 09, 2008 9:24 am

dlang wrote:
rgerhards wrote:Logically, each instances has three threads
The parsing was in a) and is now being moved to b), which is exactly the right place (because b is multi-threaded with potentially multiple workers).

I suspect that if different input file descripters could be split into different threads, you could get sufficiant threads working on the input to see additional gains. but even if that wasn't the case, if the contention could be settled 80,000 logs/sec is not at all shabby

I think better way of stating my concern is that the orginizational and modular advantages of makeing the main queue only handle pre-parsed messages outweighs the performance advantage of moving the work to the worker threads
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Thu Oct 09, 2008 9:29 am

dlang wrote:I think better way of stating my concern is that the orginizational and modular advantages of makeing the main queue only handle pre-parsed messages outweighs the performance advantage of moving the work to the worker threads


Well, actually there are no such advantages. All that is needed to shuffle the parsing to the "right" spot is a single if statment plus one additional variable. See lines 1196 to 1198 in this code:

http://git.adiscon.com/?p=rsyslog.git;a ... heads/perf

That's all, plus a little bit interface change. Actually, as I said, the design is to move everything out of the input thread that does no necessarily belong there. So far, I was just too lazy to do this to the parsers. If you look at the code, you'll see that parsers have been mostly untouched since 1.x and were still due for re-structuring and modularization. At least as part of that effort they would have been moved.

As a side-note, I already found what made perf not print any more logs. It was a side-effect of this morning's change ;) Now working again on testing and verification.

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Thu Oct 09, 2008 10:16 am

One piece of information I am now also verifying in my lab. I have done lot's of threading programming on Windows before I turned back to Linux. A subtle, yet important, difference is that under Windows, when a mutex is released, the scheduler immediately kicks in and evaluates if it is time for a thread switch. Under Linux, there seems to be no such integration. I don't want to argue what is better, but in situations like we have in rsyslog, the Windows approach works better.

What I see in the ogs under Linux is that the input thread (or queue worker, whichever you prefer) does not get time allocated even when the mutex is freed until the time slice is used up. But then, the other thread may be waiting on the mutex. I now experimented a bit, emulating Windows behaviour. Actually, when I commented out the pthread_yield()s, performance got worse. Experimentally, I have added a phtread_yield() after the mutex release when enqueing. And, voila, I see more work done in parallel and an overall increase in throughput. I think I leave this experimental pthread_yield() in for the time being, waiting for what you can confirm or not.

I will try a bit more to investigate on lock-free alternatives to the current algorithm, but this may need to wait until a later major release. As I wrote, there is much more done in the critical sections than "just" modifying a few memory pointers.

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Thu Oct 09, 2008 10:57 pm

rgerhards wrote:one more thanks ;) So that error told us the previous problems were rooted in a version difference. Some incompatible version was loaded (and, yes, I should have incremented the version flag when immediately when I did the change ;). I'll also add more diagnostics to make sure such things are easier to spot (if I fail again for some reason).

Once the correct versions of all modules are installed, we should be able to get more consistent status, and I hope that than things will work for you as they do in my lab. I also think that some of the previous trouble were also caused by partial install of the newest versions. This all makes very much sense. The root problem than was that I changed Interface details in the perf branch but did not yet increment the IF versions (as I should have, because this is what guards us against situations like the one we experienced...).


just to be clear, as far as I can tell the last tests were done with all the correct modules.
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Fri Oct 10, 2008 8:20 am

I begin to collect information on lock-free algorithms at http://wiki.rsyslog.com/index.php/Lock-Free_Main_Queue
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Fri Oct 10, 2008 11:21 am

FYI: while gathering the information, it appeared to me that there are some situations where it is safe to enqueue messages without locking. I have modified the algorithm so that it enables lock-free enqueue, when it is safe to do. One criteria is that there must be at least 100 messages in the queue, so the initial enqueues are not lock-free. But once the system gets busy they are. The queue must NOT be disk-enabled at this point (I can enhance the algorithm to support this, but have not done so yet). Also, the current implementation does NOT correctly start up all worker threads. Again, this can be done, but I would like to keep it simple.

In my (low-end) test environment, I did NOT see more work done in parallel. However, I have checked on the mutex calls and the enqueue does not do any. So I think/hope the missing concurrent operation is just due to my test environment. I would be really interested so see how it works on good equipment :)

Beware that the code may be racy, I made serious changes to the engine and have NOT done any testing beyond the simple cases. For anything complex, it may run into trouble (but let's first see if there is any benefit in what I did).

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Fri Oct 10, 2008 5:52 pm

OK, it *is* racy. Everthing will work fine as long as it receives lots of traffic, but it will segfault when it flushes buffers after traffic has stopped. I thankfully was able to obtain some better hardware, nothing high-end, but fast enough to allow me to see this issue. Anyhow, I would still be interested what the performance is and if it has improved on your high-end machine. Just ignore the final segfault, I'll look over that next week ;)

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Tue Oct 14, 2008 6:08 pm

significant progress

I figured out what was different between my two boxes (the one that works and the one that goes into a tight loop when it receives a single message)

IPv6

the box that works has no IPv6 support enabled in the kernel
the box that goes into a loop has IPv6 compiled in, but no IPv6 addresses (other then loopback)

if I give it the -4 command line option it runs

the latest version is _very_ cpu efficient on receive, it is now receiving ~60,000 messages/sec with <45% cpu (previously this level of receiving was taking ~70% cpu), when I set the UDPServerTime to a higher value I saw this down around 20% at one point.

there are still a significant number of futex calls (up to three per log message) on both the reading and writing side of things. straces attached. Interestingly, the number of futex messages was significantly different when dumping to the screen vs dumping to a file. I guess this is the overhead of strace kicking in.

I'll have to look into getting OProfile enabled in the kernel and see if that has less overhead.
Attachments
rsyslog-log.zip
strace and top samples
(11.35 KiB) Downloaded 1 time
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Tue Oct 14, 2008 8:21 pm

just a quick question: so we are still down to 60,000 msgs/s while we were at > 80,000? The later version was much less optimized, so it would eventually make sense to go back to that, even though it requires more CPU.

Also: may it be that the system is so fast that no data is queued up at all (I did not yet look at the strace)? I ask because the lock-free enqueing kicks only in if there are at least 100 messages inside the queue (this is a safety margin). Quick test: does your version abort when the messages stop coming in? If not, it probably does not enter lock-free mode (I guess the bug that causes the abort seems to be triggered by lock-free mode).

Rainer
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Tue Oct 14, 2008 8:44 pm

rgerhards wrote:just a quick question: so we are still down to 60,000 msgs/s while we were at > 80,000? The later version was much less optimized, so it would eventually make sense to go back to that, even though it requires more CPU.

Yes, we are topping out at ~60,000 messages/sec
Also: may it be that the system is so fast that no data is queued up at all (I did not yet look at the strace)? I ask because the lock-free enqueing kicks only in if there are at least 100 messages inside the queue (this is a safety margin). Quick test: does your version abort when the messages stop coming in? If not, it probably does not enter lock-free mode (I guess the bug that causes the abort seems to be triggered by lock-free mode).
It does not abort when the traffic stops.

so this is possible, the output thread is sitting at ~60% cpu, so it's not maxing out.

is there a good way to force it to queue things up?

I could let it fill the disk (~120G) at which point it will stall (and should queue) and then delete things
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby dlang » Wed Oct 15, 2008 3:51 am

I did let it fill the disk and then freed some space

cpu utilization went down to ~12% for the reader but the log rate didn't get much over 63,000/sec

more testing tomorrow (especially if you can give me a better way to force the queuing)
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog performance

Postby rgerhards » Wed Oct 15, 2008 6:15 am

Sorry I didn't reply yesterday (I was then on a cell phone...), but I have thought about forcing lock-free mode. I think I can actually initiate it as long as we are a safety margin below queue full AND we use an array-based queue. In that mode, there is no data item that is accessed both when en- and dequeing. The code will probably still be racy, but at least we should get an idea while there is traffic coming in (which in turn provides us with an idea if it is worth to put further effort into such an approach). I'll now do some testing and let you know when I am done.
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby rgerhards » Wed Oct 15, 2008 7:10 am

OK, I did a very very experimental patch. This makes rsyslogd very vulnerable, it may even abort during normal operations. Anyhow, I would be interested to see if it is faster (when it works). Also, I noticed that on my test system the root problem seemed to be that the in-memory queue fills up very quickly, so the reader processes needs to wait for the queue to drain. I had one CPU at 100% (the reader) while the queue was non-full and one at around 40% (the writer). As soon as the queue was full, the reader went down to 10 to 20% CPU. However, such scenario is expected, as there is nothing we can do against a full queue (except, of course, increasing the rate at which the output is able to work). So to get a real idea of how fast it can accept messages, you need to set a very high $MainMsgQueueSize limit. Maybe this was also the issue with the previous test. Note that once the queue gets full, the code can no longer operate in lock-free mode, as some administrative data items are accessed by both reader and writer.

So if you have not done so, you should probably increase the queue size first and only apply the patch after that.

Here is the patch:
Code: Select all
diff --git a/runtime/queue.c b/runtime/queue.c
index edc36fb..c17e72c 100644
--- a/runtime/queue.c
+++ b/runtime/queue.c
@@ -2169,6 +2169,7 @@ static inline rsRetVal
queueEnqObjLockFree(queue_t *pThis, void *pUsr)
{
   DEFiRet;
+static int iDirty = 0;

   ISOBJ_TYPE_assert(pThis, queue);

@@ -2177,6 +2178,18 @@ queueEnqObjLockFree(queue_t *pThis, void *pUsr)
finalize_it:
   // TODO: questionable if we need the advise
   /* make sure at least one worker is running. */
+/* The problem is that now, without holding the mutex, the worker may still be
+ * busy and as such may not be awaken. If it then goes to sleep after we start,
+ * it will probably do so eternally. As a dirty "fix", we awake it for the first
+ * 100 records. Note, however, that this limits concurrency as we now need to
+ * aquire a mutex. CODE FOR TESTING ONLY! -- rgerhards, 2008-10-15
+ */
+RUNLOG_VAR("%d", iDirty);
+dbgprintf("iDirty: %d\n", iDirty);
+   if(iDirty++ % 100 == 0) {
+      queueAdviseMaxWorkers(pThis);
+dbgprintf("done real advise\n");
+   }
//   queueAdviseMaxWorkers(pThis);
// TODO: check on the status of queueAdviseMaxWorkers. As it is currently,
// we do NOT start more worker threads, which is a bug. But for
@@ -2262,7 +2275,8 @@ queueEnqObj(queue_t *pThis, flowControl_t flowCtlType, void *pUsr)
// upper margin should go lower).
dbgprintf("bIsDA: %d, qType: %d, iQueueSize: %d, iMaxQueueSize: %d\n", pThis->bIsDA, pThis->qType, pThis->iQueueSize,pThis->iMaxQueueSize);
      if(!pThis->bIsDA && pThis->qType != QUEUETYPE_DISK &&
-         pThis->iQueueSize > QLOCKFREEMARGIN  && pThis->iQueueSize < (pThis->iMaxQueueSize - QLOCKFREEMARGIN)) {
+         pThis->iQueueSize < (pThis->iMaxQueueSize - QLOCKFREEMARGIN)) {
+         //pThis->iQueueSize > QLOCKFREEMARGIN  && pThis->iQueueSize < (pThis->iMaxQueueSize - QLOCKFREEMARGIN)) {
         CHKiRet(queueEnqObjLockFree(pThis, pUsr));
      } else {
         CHKiRet(queueEnqObjWithLock(pThis, pUsr));
User avatar
rgerhards
Site Admin
 
Posts: 2196
Joined: Thu Feb 13, 2003 11:57 am

Re: rsyslog performance

Postby dlang » Thu Oct 16, 2008 6:10 am

WOW, I set the queue to 1,000,000 and while it was filling up it was accepting messages at a rate of >280,000 messages/sec _without_ disabling name lookups

after it filled up it slowed down to ~65,000/sec

if I throttle back the sending to ~160,000/sec it keeps up for ~15 seconds before dropping behind (each min when the HUP signals a restart, it appears to also throw away the queue)

I increased the queue size to 5,000,000 and it records ~40,000 messages in the first second, and about 160,000 messages/sec for the next 32 seconds, but while writing the 34th second of logs it hits the beginning of the next min and the HUP restarts it.

so I need to tell it to use multiple threads to write the messages
dlang
Frequent Poster
 
Posts: 125
Joined: Mon Sep 15, 2008 7:44 am

Google Ads


PreviousNext

Return to Developer's Corner

Who is online

Users browsing this forum: No registered users and 0 guests

cron