100% CPU utilization

General discussions here

Moderator: rgerhards

Google Ads


100% CPU utilization

Postby basquiat » Wed Feb 17, 2010 4:15 pm

Dear all,

I'm building a syslog server with rsyslog and MySQL (Ver 14.12 Distrib 5.0.51a) on Debian Lenny, 64bit. As I want to use "$InputUDPServerBindRuleset", I tried rsyslog versions >5.3.x. The behaviour below was reproducable with 5.3.7, 5.5.1 and 5.5.2.

After a few minutes of operation, rsyslog consumes all available CPU time and starts spitting out the following repeatedly in it's debug log:

Code: Select all
[...]
9203.680913294:40800950: Action 0x676c40 transitioned to state: rtry
9203.680932291:40800950: action call returned -2007
9203.680949542:40800950: tryDoAction: unexpected error code -2007, finalizing
9203.683587992:40800950: Action 0x676c40 transitioned to state: rdy
9203.683608106:40800950: Action 0x676c40 transitioned to state: itx
9203.683625217:40800950: entering actionCalldoAction(), state: itx
9203.683645541:40800950:
9203.686525920:40800950: mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'
[...]


My rsyslog configuration looks as follows, leaving commented lines out:

Code: Select all
$ModLoad imuxsock
$ModLoad imklog
$ModLoad imudp
$ModLoad ommysql

$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022

auth,authpriv.*                 /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
daemon.*                        -/var/log/daemon.log
kern.*                          -/var/log/kern.log
lpr.*                           -/var/log/lpr.log
user.*                          -/var/log/user.log

mail.*                          -/var/log/mail.log
mail.info                       -/var/log/mail.info
mail.warn                       -/var/log/mail.warn
mail.err                        /var/log/mail.err

news.crit                       /var/log/news/news.crit
news.err                        /var/log/news/news.err
news.notice                     -/var/log/news/news.notice

*.=debug;\
        auth,authpriv.none;\
        news.none;mail.none     -/var/log/debug
*.=info;*.=notice;*.=warn;\
        auth,authpriv.none;\
        cron,daemon.none;\
        mail,news.none          -/var/log/messages

$template dbFormat,"insert into SystemEvents (Message, Facility, FromHost, Priority, DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag, ProcessID) values ('%msg%', %syslogfacility%, '%HOSTNAME%',%syslogpriority%, '%timereported:::date-mysql%', '%timegenerated:::date-mysql%', %iut%, '%programname%', '%procid%')",sql

*.emerg                         *

$RuleSet RemoteLog
*.*;mail.none;local1.none;local2.none   :ommysql:localhost,Syslog,username,supersecretpw;dbFormat
mail.*                                  :ommysql:localhost,Maillog,username,supersecretpw;dbFormat
local1.*                                :ommysql:localhost,Weblog,username,supersecretpw;dbFormat
local2.*                                :ommysql:localhost,WinEvent,username,supersecretpw;dbFormat

$InputUDPServerBindRuleset RemoteLog
$UDPServerRun 514


The last messages logged before going into "eat my CPU" mode (invalid IP adresses etc. due to manual log anonymising here):

Code: Select all
8918.950357811:40800950: Action 0x677370 transitioned to state: rdy
8918.950378275:40800950: action call returned 0
8918.950397900:40800950: ruleset: get iRet 0 from rule.ProcessMsg()
8918.950416548:40800950: ruleset: get iRet 0 from rule.ProcessMsg()
8918.950435055:40800950: ruleset: get iRet 0 from rule.ProcessMsg()
8918.950453284:40800950: ruleset.ProcessMsg() returns 0
8918.950472351:40800950: regular consumer finished, iret=0, szlog 0 sz phys 1
8918.950491906:40800950: we deleted 1 objects and enqueued 0 objects
8918.950510623:40800950: delete batch from store, new sizes: log 0, phys 0
8918.950529480:40800950: regular consumer finished, iret=4, szlog 0 sz phys 0
8918.950548337:40800950: main Q:Reg/w0: worker IDLE, waiting for work.
8918.952900510:42003950: imudp: epoll_wait() returned with 1 fds
8918.952922789:42003950: CmpHost returns -1
8918.953191746:42003950: recv(4,15)/serverX,acl:1,msg:<198>-- MARK --pd[11536]: connect from XXXXX.XXXXX.XXXXX[123.456.789.0]0.154]
8918.953218146:42003950: main Q: entry added, size now log 1, phys 1 entries
8918.953241613:42003950: YYY: wtpAdviseMaxWorkers, sufficient workers, just doing adivse signal cond busy
8918.953264171:42003950: main Q: EnqueueMsg advised worker start
8918.953291339:40800950: wti 0x663860: worker awoke from idle processing
8918.953316691:40800950: we deleted 0 objects and enqueued 0 objects
8918.953335199:40800950: delete batch from store, new sizes: log 1, phys 1
8918.953354196:40800950: msgConsumer processes msg 0/1
8918.953373263:40800950: msg parser: flags 30, from 'serverX', msg '<198>-- MARK --'
8918.953392190:40800950: parse using parser list 0x659a50 (the default list).
8918.953425643:40800950: Parser 'rsyslog.rfc5424' returned -2160
8918.953444570:40800950: Message will now be parsed by the legacy syslog parser (one size fits all... ;)).
8918.953464335:40800950: Parser 'rsyslog.rfc3164' returned 0
8918.953486684:40800950: Called action, logging to ommysql
8918.953507846:40800950: Action 0x676c40 transitioned to state: itx
8918.953526773:40800950: entering actionCalldoAction(), state: itx
8918.953550170:40800950:
8918.956735195:40800950: Called LogError, msg: db error (1054): Unknown column 'invld' in 'field list'

8918.956865727:40800950: main Q: entry added, size now log 1, phys 2 entries
8918.956891987:40800950: YYY: wtpAdviseMaxWorkers, sufficient workers, just doing adivse signal cond busy
8918.956910984:40800950: main Q: EnqueueMsg advised worker start
8918.956984596:40800950: Action 0x676c40 transitioned to state: rtry
8918.957004641:40800950: action call returned -2007
8918.957056882:40800950: tryDoAction: unexpected error code -2007, finalizing
8918.959736259:40800950: Action 0x676c40 transitioned to state: rdy
8918.959757839:40800950: Action 0x676c40 transitioned to state: itx
8918.959775509:40800950: entering actionCalldoAction(), state: itx
8918.959800792:40800950:
8918.962683615:40800950: mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'

8918.962748637:40800950: Action 0x676c40 transitioned to state: rtry
8918.962767913:40800950: action call returned -2007
8918.962785513:40800950: tryDoAction: unexpected error code -2007, finalizing
8918.965446801:40800950: Action 0x676c40 transitioned to state: rdy
8918.965467264:40800950: Action 0x676c40 transitioned to state: itx
8918.965484724:40800950: entering actionCalldoAction(), state: itx
8918.965506096:40800950:
8918.968357700:40800950: mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'

[...]

Any hints what might be wrong? Do you need more information? I'm thankful for any helpful hints!

Best regards,
Jens
basquiat
Avarage
 
Posts: 12
Joined: Tue Oct 27, 2009 5:03 pm

Urgent Question?

  • Pulling out your Hair?
  • Wasting Time and Money?
  • Deadline Approaching?

Re: 100% CPU utilization

Postby rgerhards » Wed Feb 17, 2010 4:18 pm

This looks like pointing to the root cause:

Code: Select all
 mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'


Any issues with the SQL statement?

And, of course, that should not yield to a loop, but the ommysql is not very good at conveying back status information. The new rule engine than retries, but to no avail.
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby basquiat » Thu Feb 18, 2010 3:34 pm

I put some debugging code into ommysql.c and got the query leading to the failure:

Code: Select all
4241.909122250:40800950: Action 0x67af60 transitioned to state: rtry
4241.909142574:40800950: action call returned -2007
4241.909162618:40800950: tryDoAction: unexpected error code -2007, finalizing
4241.911787727:40800950: Action 0x67af60 transitioned to state: rdy
4241.911811473:40800950: Action 0x67af60 transitioned to state: itx
4241.911831727:40800950: entering actionCalldoAction(), state: itx
4241.911854774:40800950:
4241.914686124:40800950: mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'

4241.914749679:40800950: ****** MYSQL: insert into SystemEvents (Message, Facility, FromHost, Priority, DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag, ProcessID) values (' MARK --', invld, 'serverX', 6, '20100218154159', '20100218154159', 1, '--', '-')


As far as I see things, "invld" is not really a good match for a smallint in MySQL? ;-) Do you have any suggestions on this, anything obvious?

Thank you for all your work and support!

BR,
Jens
basquiat
Avarage
 
Posts: 12
Joined: Tue Oct 27, 2009 5:03 pm

Re: 100% CPU utilization

Postby rgerhards » Thu Feb 18, 2010 4:30 pm

I just realize that the message that induces this problem seems to originate in immark. Looking closer, I guess I do not set the facility/priority correctly, what then triggers the problem. Will investigate. Thanks for the detailed info!
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby rgerhards » Thu Feb 18, 2010 5:21 pm

no, that's not the case. Looks like being remote induced. Can you mail me a complete debug log to rgerhards@gmail.com? Please let me know when you have done so, I do not check this account regularly.
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby basquiat » Fri Feb 19, 2010 10:44 am

Can you elaborate what information you'll need, i.e. with which debugging arguments / environment variables I should call rsyslog to gather something valuable for you? As I'm required to obfuscate/anonymize the logfiles, less is better. ;-)

The log entries in question originate from a Debian Lenny 32bit box running syslogd 1.5.0 and are transmitted via UDP to the rsyslog server, looking as follows (ok, this won't be of much help):

Code: Select all
Feb 18 16:01:59 serverX -- MARK --

I temporarily disabled the MARKin feature on "serverX", which "solves" the problem so far, but I'll be happy if I can help you in finding something which may make rsyslog even more solid! ;-)

Best regards,
Jens
basquiat
Avarage
 
Posts: 12
Joined: Tue Oct 27, 2009 5:03 pm

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 10:48 am

ok, that's a good starting point. Maybe we can avoid looking at debug logs. Could you look how precisely this message looks on the wire? (I guess some part at the front is missing, but I may be wrong). At least two ways to do that:

- use Wireshark
- use rsyslog debug log, look for the line that contains the reception (I guess it should be stated)

Wireshark is probably the easiest choice. I only need the complete line, nothing else. I can than feed that into my lab environment and then see what happens. Obtaining that precise info was the main point in asking for the debug log ;)
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 2:38 pm

I added a testcase myself, both with the message as quoted and with the string I guess to be in front of it. Both of these test cases did work. So I actually need more information (testing done with the recent master branch).
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby basquiat » Fri Feb 19, 2010 2:51 pm

I re-enabled MARK support on serverX's syslogd. Please have a look at the local syslogd output, the tcpdump capture and the corresponding debug log (the latter two taken from the rsyslog server):

syslogd 1.5.0 output, locally:

Code: Select all
Feb 19 14:53:37 serverX -- MARK --

"tcpdump -A udp and host serverX.local -w capture.log" followed by "tcpdump -A -r capture.log" (on the rsyslog server):

Code: Select all
reading from file capture.log, link-type EN10MB (Ethernet)
14:53:37.057551 IP serverX.local.syslog > rsyslogserverX.local.syslog: SYSLOG unknown (24).info, length: 15
E..+..@.>.................}U<198>-- MARK --...

rsyslog 5.3.7 debug info:

Code: Select all
[...]
7617.044478883:40800950: Action 0x67b690 transitioned to state: rdy
7617.044498787:40800950: action call returned 0
7617.044518832:40800950: ruleset: get iRet 0 from rule.ProcessMsg()
7617.044538527:40800950: ruleset: get iRet 0 from rule.ProcessMsg()
7617.044558082:40800950: ruleset: get iRet 0 from rule.ProcessMsg()
7617.044577568:40800950: ruleset.ProcessMsg() returns 0
7617.044597542:40800950: regular consumer finished, iret=0, szlog 0 sz phys 1
7617.044617726:40800950: we deleted 1 objects and enqueued 0 objects
7617.044637212:40800950: delete batch from store, new sizes: log 0, phys 0
7617.044657187:40800950: regular consumer finished, iret=4, szlog 0 sz phys 0
7617.044676742:40800950: main Q:Reg/w0: worker IDLE, waiting for work.
7617.057612183:42003950: imudp: epoll_wait() returned with 1 fds
7617.057640120:42003950: CmpHost returns -1
7617.057957686:42003950: recv(4,15)/serverX,acl:1,msg:<198>-- MARK --3:37 serverY postfix/cleanup[4907]: E919223890: message-id=<20100
7617.057986740:42003950: main Q: entry added, size now log 1, phys 1 entries
7617.058012651:42003950: YYY: wtpAdviseMaxWorkers, sufficient workers, just doing adivse signal cond busy
7617.058037934:42003950: main Q: EnqueueMsg advised worker start
7617.058071387:40800950: wti 0x663860: worker awoke from idle processing
7617.058101279:40800950: we deleted 0 objects and enqueued 0 objects
7617.058120974:40800950: delete batch from store, new sizes: log 1, phys 1
7617.058140600:40800950: msgConsumer processes msg 0/1
7617.058160854:40800950: msg parser: flags 30, from 'serverX', msg '<198>-- MARK --'
7617.058180898:40800950: parse using parser list 0x659a50 (the default list).
7617.058201012:40800950: Parser 'rsyslog.rfc5424' returned -2160
7617.058220358:40800950: Message will now be parsed by the legacy syslog parser (one size fits all... ;)).
7617.058240542:40800950: Parser 'rsyslog.rfc3164' returned 0
7617.058263589:40800950: Called action, logging to ommysql
7617.058285520:40800950: Action 0x67af60 transitioned to state: itx
7617.058304935:40800950: entering actionCalldoAction(), state: itx
7617.058328262:40800950:
7617.061619165:40800950: Called LogError, msg: db error (1054): Unknown column 'invld' in 'field list'

7617.061703603:40800950: main Q: entry added, size now log 1, phys 2 entries
7617.061731330:40800950: YYY: wtpAdviseMaxWorkers, sufficient workers, just doing adivse signal cond busy
7617.061750466:40800950: main Q: EnqueueMsg advised worker start
7617.061792720:40800950: ****** MYSQL: insert into SystemEvents (Message, Facility, FromHost, Priority, DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag, ProcessID) values (' MARK --', invld, 'serverX', 6, '20100219145337', '20100219145337', 1, '--', '-')
7617.061813462:40800950: Action 0x67af60 transitioned to state: rtry
7617.061832529:40800950: action call returned -2007
7617.061851735:40800950: tryDoAction: unexpected error code -2007, finalizing
7617.064629168:40800950: Action 0x67af60 transitioned to state: rdy
7617.064651098:40800950: Action 0x67af60 transitioned to state: itx
7617.064670095:40800950: entering actionCalldoAction(), state: itx
7617.064691815:40800950:
7617.067625971:40800950: mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'

7617.067661729:40800950: ****** MYSQL: insert into SystemEvents (Message, Facility, FromHost, Priority, DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag, ProcessID) values (' MARK --', invld, 'serverX', 6, '20100219145337', '20100219145337', 1, '--', '-')
7617.067682751:40800950: Action 0x67af60 transitioned to state: rtry
7617.067702097:40800950: action call returned -2007
7617.067721443:40800950: tryDoAction: unexpected error code -2007, finalizing
7617.070490774:40800950: Action 0x67af60 transitioned to state: rdy
7617.070512774:40800950: Action 0x67af60 transitioned to state: itx
7617.070532050:40800950: entering actionCalldoAction(), state: itx
7617.070553701:40800950:
7617.073509019:40800950: mysql, DBError(silent): db error (1054): Unknown column 'invld' in 'field list'
[...]

I hope this is of any use for you - if you need anything else, I'll still be glad to help out!

Best regards,
Jens
basquiat
Avarage
 
Posts: 12
Joined: Tue Oct 27, 2009 5:03 pm

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 2:52 pm

umpf... that on the wire format is *really* weired :(
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 2:59 pm

ok, I can trigger the "invld" priority in my lab. Now I need to see how this comes ;)
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby basquiat » Fri Feb 19, 2010 3:00 pm

Thanks for all your effort! So you mean I should file a "weirdiness" bug against Lenny's syslogd? ;-)
basquiat
Avarage
 
Posts: 12
Joined: Tue Oct 27, 2009 5:03 pm

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 3:02 pm

While it would be the right place to complain, it's probably a fruitless effort. It's doing this for many years, and I doubt anyone will make the effort to fix it. I need to fix the rsyslg "invld" bug in any case, but you will probably end up with rather unexpected field values.

So: your decision ;)
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 3:26 pm

ummm... it's even worse than I thought: rsyslog is completely right! the facility is invalid, 198 is NOT a valid PRI value (it has facility 24, which does not exist (see the old definition in http://www.ietf.org/rfc/rfc3164.txt, as well as in current RFC5424). I have to admit I do not know how to proceed from here on. Probably filing a bug against sysklogd is the only right thing to do...
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Re: 100% CPU utilization

Postby rgerhards » Fri Feb 19, 2010 3:38 pm

get's even worse: I checked sysklogd code, and it intentionally has a code LOG_MARK, with that non-existing facility of 24. Even more puzzled now...
rgerhards
Site Admin
 
Posts: 3607
Joined: Thu Feb 13, 2003 11:57 am

Google Ads



Return to General

Who is online

Users browsing this forum: No registered users and 0 guests

cron