rsyslog doesn't write but accepts connections

General discussions here

Moderator: rgerhards

Google Ads


rsyslog doesn't write but accepts connections

Postby Grxf8 » Sat Oct 14, 2017 12:37 am

Hi all,

My rsyslog server is running on SLES12SP2 on the following version:

Code: Select all
rsyslog-module-relp-8.4.0-16.2.x86_64
rsyslog-8.4.0-16.2.x86_64
rsyslog-module-gtls-8.4.0-16.2.x86_64

rsyslogd 8.4.0, compiled with:
        FEATURE_REGEXP:                         Yes
        GSSAPI Kerberos 5 support:              Yes
        FEATURE_DEBUG (debug build, slow code): No
        32bit Atomic operations supported:      Yes
        64bit Atomic operations supported:      Yes
        memory allocator:                       system default
        Runtime Instrumentation (slow code):    No
        uuid support:                           Yes
        Number of Bits in RainerScript integers: 64

See http://www.rsyslog.com for more information.

I cannot upgrade it.

The configuration is the following:
Code: Select all
######### Global Configuration ##########

global(
        maxMessageSize="64k"
        janitor.interval="1"
        action.reportSuspensionContinuation="on"
        defaultNetstreamDriverCAFile="/etc/ccname.crt"
        defaultNetstreamDriverKeyFile="/etc/syslog.key"
        defaultNetstreamDriverCertFile="/etc/syslog.crt"
)

main_queue(

        queue.filename="rsyslogd_tcpfwd_q"
        queue.spoolDirectory="/var/spool/rsyslog"
        queue.size="3500000"
        queue.maxdiskspace="500m"
        queue.maxfilesize="250m"
        queue.discardMark="3000000"
)

# Activate impstats for runtime analysis
module(load="impstats" interval="600" )

########## Inputs Configuration ##########

# TLS Syslog Server:
module(load="imtcp" streamDriver.name="gtls" streamDriver.authMode="anon" streamDriver.Mode="1"  MaxSessions="1000" )
input(type="imtcp" port="6515" ruleset="r_dynamicFileNameFromHost")

# PLAIN TCP Syslog Server:
module(load="imptcp" )       # load module
input(type="imptcp" port="6514" ruleset="r_dynamicFileNameFromHost")
input(type="imptcp" port="1514" ruleset="r_dynamicFileNameFromHost")

# UDP Syslog Server:
module(load="imudp" )       # load module
input(type="imudp" port="514" ruleset="r_dynamicFileNameFromHost")

# RELP Syslog Server:
module(load="imrelp")
input(type="imrelp" port="6516" ruleset="r_dynamicFileNameFromHost")

######### Templates ##########

template (name="t_dynamicFileNoForward" type="string" string="/var/log/remote/%FROMHOST%/%FROMHOST%.noforward")
template (name="t_dynamicFileNameFromHost" type="string" string="/var/log/remote/%FROMHOST%/%FROMHOST%.log")
template (name="t_dynamicFileNameTag" type="string" string="/var/log/remote/%FROMHOST%/%FROMHOST%-%SYSLOGTAG%.log")
template (name="t_dynamicFileProgramName" type="string" string="/var/log/remote/%FROMHOST%/%FROMHOST%-%PROGRAMNAME%.log")
template (name="t_ccnameAppFormat" type="string" string="%msg%\n")

######### Rulesets ##########

ruleset(name="r_dynamicFileNameFromHost"){

        #
        # Filter rules for app2name forwarding
        #
        if      ($msg contains "ASA-1" or $msg contains "ASA-2" or ($hostname contains "bla" and $msg contains "TRAFFIC"))
        then    {

                     action(name="act_forwardapp2name" type="omfwd" target="app2name.ccname.net" port="514" protocol="udp")
        }

      if      ($syslogtag contains '-apache')
        then    {
                action(name="act_writeApache" type="omfile" asyncWriting="on" flushOnTXEnd="on" closeTimeout="5" dynaFile="t_dynamicFileProgramName" fileGroup="group_ro" dirGroup="group_ro" dirCreateMode="0550" fileCreateMode="0640")
                stop
        }


        if      ($syslogtag contains 'ccname-app')
        then    {
                action(name="act_writeccnameApp" type="omfile" asyncWriting="on" flushOnTXEnd="on" closeTimeout="5" dynaFile="t_dynamicFileProgramName" template="t_ccnameAppFormat" fileGroup="group_ro" dirGroup="group_ro" dirCreateMode="0550" fileCreateMode="0640")
                stop
        }


        if      ($hostname contains 'myhostname')
        then    {
                action(name="act_writeMYHOSTNAME" type="omfile" asyncWriting="on" flushOnTXEnd="on" closeTimeout="5" dynaFile="t_dynamicFileNameFromHost"  fileGroup="group_ro" dirGroup="group_ro" dirCreateMode="0550" fileCreateMode="0640")
                stop
        }


        if      ($syslogtag contains 'audispd')
        then    {
                action(name="act_writeNoforward" type="omfile" asyncWriting="on" flushOnTXEnd="on" closeTimeout="5" dynaFile="t_dynamicFileNoForward" fileGroup="group_ro" dirGroup="group_ro" dirCreateMode="0550" fileCreateMode="0640")
                stop
        }

        action(name="act_writeDefault" type="omfile" asyncWriting="on" flushOnTXEnd="on" closeTimeout="5" dynaFile="t_dynamicFileNameFromHost" fileGroup="group_ro" dirGroup="group_ro" dirCreateMode="0550" fileCreateMode="0640")

}


Sporadically and without error messages, the rsyslogd daemon stops writing files to disk and still accepts incoming TCP connections. This causes the client to believe the service is still up and running when it's actually not writing anything to disk.
If I HUP the rsyslogd daemon, nothing changes, if I restart it instead it hangs and then eventually systemd kills it with a -9 and restart it later.

error message after restart:

Code: Select all
2017-10-14T09:47:55.432967+11:00 syslog1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.0" x-pid
="42445" x-info="http://www.rsyslog.com"] start
2017-10-14T09:47:55.432639+11:00 syslog1 rsyslogd-2357: queue "action 24 queue": queue.discardMark is
set quite low at 980. You should only set it below 80% (1600000) if you have a good reason for this. [
try http://www.rsyslog.com/e/2357 ]
2017-10-14T09:47:55.013577+11:00 syslog1 systemd[1]: rsyslog.service: State 'stop-sigterm' timed out.
Killing.
2017-10-14T09:47:55.141873+11:00 syslog1 systemd[1]: rsyslog.service: Main process exited, code=killed
, status=9/KILL


After restart, everything works fine.

What I already tried to no avail:
    Strace after broken
    Check n of open files
    Increasing systemd limits and limits for cgroups
    Creating more rsyslog servers to spread the load
    activate impstats

The last impstats output was the following:
Code: Select all
2017-10-14T09:13:11.329189+11:00 syslog1 rsyslogd-pstats: imuxsock: submitted=3243889 ratelimit.discarded=0 ratelimit.numratelimiters=0
2017-10-14T09:13:11.329198+11:00 syslog1 rsyslogd-pstats: act_forwardapp2name: processed=3371276 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329201+11:00 syslog1 rsyslogd-pstats: dynafile cache t_dynamicFileProgramName: requests=1416761 level0=713527 missed=876 evicted=13 maxused=10 closetimeouts=801
2017-10-14T09:13:11.329203+11:00 syslog1 rsyslogd-pstats: act_writeApache: processed=1416761 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329206+11:00 syslog1 rsyslogd-pstats: dynafile cache t_dynamicFileProgramName: requests=1836733 level0=1359179 missed=123945 evicted=123835 maxused=10 closetimeouts=0
2017-10-14T09:13:11.329208+11:00 syslog1 rsyslogd-pstats: act_writeccnameApp: processed=1836733 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329210+11:00 syslog1 rsyslogd-pstats: dynafile cache t_dynamicFileNameFromHost: requests=0 level0=0 missed=0 evicted=0 maxused=0 closetimeouts=0
2017-10-14T09:13:11.329211+11:00 syslog1 rsyslogd-pstats: act_writeMYHOSTNAME: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329213+11:00 syslog1 rsyslogd-pstats: dynafile cache t_dynamicFileNoForward: requests=479087872 level0=444709731 missed=2316678 evicted=2316568 maxused=10 closetimeouts=0
2017-10-14T09:13:11.329215+11:00 syslog1 rsyslogd-pstats: act_writeNoforward: processed=479087872 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329217+11:00 syslog1 rsyslogd-pstats: dynafile cache t_dynamicFileNameFromHost: requests=187246546 level0=158693759 missed=1532104 evicted=1531994 maxused=10 closetimeouts=0
2017-10-14T09:13:11.329219+11:00 syslog1 rsyslogd-pstats: act_writeDefault: processed=187246546 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329221+11:00 syslog1 rsyslogd-pstats: action 7: processed=776 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329223+11:00 syslog1 rsyslogd-pstats: action 8: processed=776 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329224+11:00 syslog1 rsyslogd-pstats: action 9: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329226+11:00 syslog1 rsyslogd-pstats: action 10: processed=12 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329228+11:00 syslog1 rsyslogd-pstats: action 11: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329229+11:00 syslog1 rsyslogd-pstats: action 12: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329231+11:00 syslog1 rsyslogd-pstats: action 13: processed=3011602 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329232+11:00 syslog1 rsyslogd-pstats: action 14: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329234+11:00 syslog1 rsyslogd-pstats: action 15: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329236+11:00 syslog1 rsyslogd-pstats: action 16: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329237+11:00 syslog1 rsyslogd-pstats: action 17: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329239+11:00 syslog1 rsyslogd-pstats: action 18: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329241+11:00 syslog1 rsyslogd-pstats: action 19: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329242+11:00 syslog1 rsyslogd-pstats: action 20: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329244+11:00 syslog1 rsyslogd-pstats: action 21: processed=762 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329246+11:00 syslog1 rsyslogd-pstats: action 22: processed=1 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329247+11:00 syslog1 rsyslogd-pstats: action 23: processed=267713 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329249+11:00 syslog1 rsyslogd-pstats: action 24: processed=267713 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329252+11:00 syslog1 rsyslogd-pstats: action 25: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329253+11:00 syslog1 rsyslogd-pstats: action 26: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329254+11:00 syslog1 rsyslogd-pstats: action 27: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329256+11:00 syslog1 rsyslogd-pstats: action 28: processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
2017-10-14T09:13:11.329257+11:00 syslog1 rsyslogd-pstats: imtcp(6515): submitted=254750
2017-10-14T09:13:11.329259+11:00 syslog1 rsyslogd-pstats: imptcp(*/1514/IPv4): submitted=8000979 bytes.received=1719586373 bytes.decompressed=0
2017-10-14T09:13:11.329261+11:00 syslog1 rsyslogd-pstats: imptcp(*/1514/IPv6): submitted=0 bytes.received=0 bytes.decompressed=0
2017-10-14T09:13:11.329262+11:00 syslog1 rsyslogd-pstats: imptcp(*/6514/IPv4): submitted=650559306 bytes.received=138735126314 bytes.decompressed=0
2017-10-14T09:13:11.329264+11:00 syslog1 rsyslogd-pstats: imptcp(*/6514/IPv6): submitted=0 bytes.received=0 bytes.decompressed=0
2017-10-14T09:13:11.329265+11:00 syslog1 rsyslogd-pstats: imudp(*:514): submitted=10772877
2017-10-14T09:13:11.329267+11:00 syslog1 rsyslogd-pstats: imudp(*:514): submitted=0
2017-10-14T09:13:11.329268+11:00 syslog1 rsyslogd-pstats: imrelp[6516]: submitted=0
2017-10-14T09:13:11.329271+11:00 syslog1 rsyslogd-pstats: resource-usage: utime=6107983334 stime=7456966357 maxrss=1264156 minflt=1582519 majflt=293 inblock=23408 oublock=295146880 nvcsw=673584134 nivcsw=158781579
2017-10-14T09:13:11.329273+11:00 syslog1 rsyslogd-pstats: action 24 queue[DA]: size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
2017-10-14T09:13:11.329275+11:00 syslog1 rsyslogd-pstats: action 24 queue: size=0 enqueued=267713 full=0 discarded.full=0 discarded.nf=0 maxqsize=48
2017-10-14T09:13:11.329281+11:00 syslog1 rsyslogd-pstats: main Q[DA]: size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
2017-10-14T09:13:11.329283+11:00 syslog1 rsyslogd-pstats: main Q: size=46 enqueued=672867285 full=0 discarded.full=0 discarded.nf=0 maxqsize=1534124
2017-10-14T09:13:11.329285+11:00 syslog1 rsyslogd-pstats: imudp(w0): called.recvmmsg=20999734 called.recvmsg=0 msgs.received=10772879


I have to probably tune some parameter to fix this behaviour, but I'm not sure what exactly. This happens on servers with identical configuration and 1/10 of the load, so maybe is not a load related problem.

Any help would be greatly appreciated.
Grxf8
New
 
Posts: 3
Joined: Fri Oct 13, 2017 11:51 pm

Urgent Question?

  • Pulling out your Hair?
  • Wasting Time and Money?
  • Deadline Approaching?

Re: rsyslog doesn't write but accepts connections

Postby dlang » Mon Oct 16, 2017 7:20 am

your error messages don't match the config file you present. There is no unnamed 'action 24' that has a queue on it.

do you have other rulesets, or included configs?
dlang
Frequent Poster
 
Posts: 1001
Joined: Mon Sep 15, 2008 7:44 am

Re: rsyslog doesn't write but accepts connections

Postby Grxf8 » Wed Oct 18, 2017 1:26 am

dlang wrote:your error messages don't match the config file you present. There is no unnamed 'action 24' that has a queue on it.

do you have other rulesets, or included configs?


Sorry, I missed the main configuration file actually. See below

Code: Select all
#
# if you experience problems, check
# http://www.rsyslog.com/troubleshoot for assistance
# and report them at http://bugzilla.novell.com/
#

# since rsyslog v3: load input modules
# If you do not load inputs, nothing happens!

# Where to put state file for imfile
$WorkDirectory /var/spool/rsyslog

# provides --MARK-- message capability (every 1 hour)
$ModLoad immark.so
$MarkMessagePeriod      3600

# provides support for local system logging (e.g. via logger command)
$ModLoad imuxsock.so

# reduce dupplicate log messages (last message repeated n times)
$RepeatedMsgReduction   off

# kernel logging (may be also provided by /sbin/klogd)
# see also http://www.rsyslog.com/doc-imklog.html.
$ModLoad imklog.so
# set log level 1 (same as in /etc/sysconfig/syslog).
$klogConsoleLogLevel    1


#
# Include config generated by /etc/init.d/syslog script
# using the SYSLOGD_ADDITIONAL_SOCKET* variables in the
# /etc/sysconfig/syslog file.
#
$IncludeConfig /run/rsyslog/additional-log-sockets.conf

#
# Include config files, that the admin provided? :
#
$IncludeConfig /etc/rsyslog.d/*.conf


###
# print most important on tty10 and on the xconsole pipe
#
if      ( \
            /* kernel up to warning except of firewall  */ \
            ($syslogfacility-text == 'kern')      and      \
            ($syslogseverity <= 4 /* warning */ ) and not  \
            ($msg contains 'IN=' and $msg contains 'OUT=') \
        ) or ( \
            /* up to errors except of facility authpriv */ \
            ($syslogseverity <= 3 /* errors  */ ) and not  \
            ($syslogfacility-text == 'authpriv')           \
        ) \
then {
        /dev/tty10
        |/dev/xconsole
}

# Emergency messages to everyone logged on (wall)
*.emerg                                  :omusrmsg:*


if      ($syslogfacility-text == 'kern') and \
        ($msg contains 'IN=' and $msg contains 'OUT=') \
then {
        -/var/log/firewall
        stop
}


if      ($programname == 'acpid' or $syslogtag == '[acpid]:') and \
        ($syslogseverity <= 5 /* notice */) \
then {
        -/var/log/acpid
        stop
}


#
# NetworkManager into separate file and stop their further processing
#
if      ($programname == 'NetworkManager') or \
        ($programname startswith 'nm-') \
then {
        -/var/log/NetworkManager
        stop
}

#
# Separate auditsp logs
#

if      ($programname startswith 'audisp') or \
        ($programname contains 'audisipd') or \
        ($syslogtag == '[audispd]:') \
then {
        @@syslog:6514
        stop
}

#
# email-messages
#
mail.*                                  -/var/log/mail
mail.info                               -/var/log/mail.info
mail.warning                            -/var/log/mail.warn
mail.err                                 /var/log/mail.err


#
# news-messages
#
news.crit                               -/var/log/news/news.crit
news.err                                -/var/log/news/news.err
news.notice                             -/var/log/news/news.notice
# enable this, if you want to keep all news messages
# in one file
#news.*                                 -/var/log/news.all


#
# Warnings in one file
#
*.=warning;*.=err                       -/var/log/warn
*.crit                                   /var/log/warn


#
# the rest in one file
#
*.*;mail.none;news.none;local1.none     -/var/log/messages


$WorkDirectory /var/spool/rsyslog
$ActionQueueType LinkedList
$ActionQueueFileName rsyslogd_tcpfwd_q
$ActionQueueMaxDiskSpace 500m
$ActionQueueSaveOnShutdown on
$ActionQueueMaxFileSize 250m
$ActionQueueSize 2000000
$ActionResumeRetryCount -1

#
# Send messages to syslog
#

*.*                                     @@syslog:6514


#
# enable this, if you want to keep all messages
# in one file
#*.*                                    -/var/log/allmessages


#
# Some foreign boot scripts require local7
#
local0.*                                -/var/log/localmessages
local2.*;local3.*                       -/var/log/localmessages
local4.*;local5.*                       -/var/log/localmessages
local6.*;local7.*                       -/var/log/localmessages
###
Grxf8
New
 
Posts: 3
Joined: Fri Oct 13, 2017 11:51 pm

Re: rsyslog doesn't write but accepts connections

Postby Grxf8 » Tue Oct 31, 2017 5:24 am

I you set the following:
Code: Select all
MaxMessageSize 60k
asyncWriting="on" flushOnTXEnd="on"

and send a 60k or bigger message, the output action writes a truncated version of the message and then it completely stops writing.
The input however still works, making the clients believe that the rsyslog server is still alive and well as it is accepting connections.

As soon as
Code: Select all
asyncWriting="on" flushOnTXEnd="on"
is removed, everything works just fine.

Could it be a bug on the asyncwrite implementation?
Grxf8
New
 
Posts: 3
Joined: Fri Oct 13, 2017 11:51 pm

Google Ads



Return to General

Who is online

Users browsing this forum: No registered users and 1 guest

cron