RELP messages stuck for long periods

This is the place for you if you've got rsyslog up and running but wonder how to make it do what you want.

Moderator: rgerhards

RELP messages stuck for long periods

Postby cvc69091 » Wed Dec 07, 2016 12:36 am

Greetings, seeking some pointers for the following scenario; hoping that someone can point me in the right direction to investigate our issue further.

We have several servers that forward to a master node with RELP; each server has about six rsyslog config files for the critical messages we want to forward. All of these RELP configs are the same and simple - target IP, port, etc. Three of these config files work as expected and forward at regular intervals. The other three, which process more sporadic messages throughout the business day, sometimes seem to get messages stuck in memory and will forward them to the master node several days later, which is not ideal. Each server only sends < 1200 messages per day to the master node, so rsyslog is not under heavy load. These messages are extremely time-sensitive and important for reporting, hence the desire to continue using RELP for reliable transport; we don't want to be alerted about a message that actually happened yesterday but wasn't forwarded on time.

Question: what variables should we consider implementing in the config files for a more real-time solution, so that messages don't get caught in memory? Perhaps adjusting the RELP windowSize or conn.timeout? Or should we be looking at a setting related to the in-memory queue so that messages are discarded if they get stuck, rather than being sent days later? Any advice in the right direction is appreciated!
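
For context, here is a sketch of the kind of omrelp action we are using, with the knobs I'm asking about filled in; the target, port and all values are purely illustrative, not what we actually run:

# sketch only - 192.0.2.10:2514 is a placeholder, values are illustrative
action(type="omrelp"
    target="192.0.2.10" port="2514"
    # RELP window: maximum unacknowledged messages in flight
    windowSize="10"
    # seconds to wait when opening the connection / for a RELP response
    conn.timeout="10"
    timeout="90"
    # retry a suspended action every 10 seconds
    action.resumeInterval="10"
    queue.type="LinkedList"
    # once the queue holds 5000 messages, discard entries of severity
    # warning (4) or numerically higher (i.e. less important)
    queue.discardMark="5000"
    queue.discardSeverity="4"
)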
cvc69091
New
 
Posts: 2
Joined: Tue Dec 06, 2016 10:20 pm

Re: RELP messages stuck for long periods

Postby dlang » Wed Dec 07, 2016 12:59 am

This should not be happening, and is probably a bug.

Could you configure impstats on the systems that have problems and have it write the stats to a file? We would be looking to see whether there are any suspended counts on the systems that have trouble.
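
Something along these lines should do it (a minimal sketch; the log.file path is just a placeholder):

# minimal impstats sketch - output path is a placeholder
module(load="impstats"
    # emit counters every 60 seconds
    interval="60"
    severity="7"
    # write to a dedicated file instead of the normal syslog stream
    log.syslog="off"
    log.file="/var/log/rsyslog-stats.log")

Then watch the failed and suspended counters for the omrelp action in that file.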

Also, what version are you running?
dlang
Frequent Poster
 
Posts: 1001
Joined: Mon Sep 15, 2008 7:44 am

Re: RELP messages stuck for long periods

Postby cvc69091 » Mon Dec 12, 2016 6:51 pm

Sorry for the delayed response; I wanted to ensure I had collected enough data. We are running rsyslog 8.4.2-1.el7 on the servers. I've excerpted one of the known actions below that is getting "stuck" before sending and correlated the original and arrival timestamps:
    impstats | Original message | Actual RELP arrival timestamp
    2016-12-09T07:53:13.023479+00:00 server1 rsyslogd-pstats: action-v: processed=1 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 07:52:50 | 2016-12-12T13:32:30.904Z
    2016-12-09T08:22:14.465087+00:00 server1 rsyslogd-pstats: action-v: processed=2 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 08:21:37 | 2016-12-12T13:32:30.904Z
    2016-12-09T09:39:18.188447+00:00 server1 rsyslogd-pstats: action-v: processed=4 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 09:38:09 | 2016-12-12T13:32:30.904Z
    2016-12-09T09:39:18.188447+00:00 server1 rsyslogd-pstats: action-v: processed=4 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 09:38:41 | 2016-12-12T13:32:30.904Z
    2016-12-09T09:56:19.083068+00:00 server1 rsyslogd-pstats: action-v: processed=5 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 14:08:30 | 2016-12-12T13:32:30.904Z
    2016-12-09T14:09:29.304460+00:00 server1 rsyslogd-pstats: action-v: processed=7 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 14:09:23 | 2016-12-12T13:32:30.904Z
    2016-12-09T14:18:29.574144+00:00 server1 rsyslogd-pstats: action-v: processed=8 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 14:17:49 | 2016-12-12T13:32:30.904Z
    2016-12-09T15:02:31.229877+00:00 server1 rsyslogd-pstats: action-v: processed=9 failed=0 suspended=0 suspended.duration=0 resumed=0 | |
    2016-12-09T16:51:35.998942+00:00 server1 rsyslogd-pstats: action-v: processed=10 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 16:50:56 | 2016-12-12T13:32:30.904Z
    2016-12-09T17:30:37.802581+00:00 server1 rsyslogd-pstats: action-v: processed=11 failed=0 suspended=0 suspended.duration=0 resumed=0 | |
    2016-12-09T19:26:43.189183+00:00 server1 rsyslogd-pstats: action-v: processed=12 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 9 19:26:02 | 2016-12-12T13:32:30.904Z
    2016-12-09T21:18:48.269935+00:00 server1 rsyslogd-pstats: action-v: processed=13 failed=0 suspended=0 suspended.duration=0 resumed=0 | |
    2016-12-10T00:42:56.898693+00:00 server1 rsyslogd-pstats: action-v: processed=14 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 10 00:42:48 | 2016-12-12T13:32:30.904Z
    2016-12-11T00:31:58.054968+00:00 server1 rsyslogd-pstats: action-v: processed=15 failed=0 suspended=0 suspended.duration=0 resumed=0 | |
    2016-12-11T00:32:58.097473+00:00 server1 rsyslogd-pstats: action-v: processed=16 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 11 00:32:19 | 2016-12-12T13:32:30.995Z
    2016-12-12T03:47:08.014436+00:00 server1 rsyslogd-pstats: action-v: processed=17 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 12 03:46:44 | 2016-12-12T13:32:30.995Z
    2016-12-12T13:33:31.426504+00:00 server1 rsyslogd-pstats: action-v: processed=19 failed=0 suspended=0 suspended.duration=0 resumed=0 | Dec 12 13:32:34 | 2016-12-12T13:32:34.123Z
    2016-12-12T15:05:34.969433+00:00 server1 rsyslogd-pstats: action-v: processed=20 failed=0 suspended=0 suspended.duration=0 resumed=0 | |
    2016-12-12T15:06:35.029558+00:00 server1 rsyslogd-pstats: action-v: processed=21 failed=0 suspended=0 suspended.duration=0 resumed=0 | |
It appears that the messages are being processed but are not arriving at the destination until several days later. All of the affected servers are running with a massive amount of idle resources available (RAM, disk space, CPU power), so I'd imagine that it's not a resource issue. Any thoughts or advice are appreciated, thank you for your assistance.
cvc69091
New
 
Posts: 2
Joined: Tue Dec 06, 2016 10:20 pm

Re: RELP messages stuck for long periods

Postby uppsalanet » Tue Dec 20, 2016 12:48 pm

I've had some RELP issues but with logstash: https://github.com/logstash-plugins/logstash-input-relp/issues/18
/Fredrik
uppsalanet
Average
 
Posts: 18
Joined: Thu Apr 28, 2016 9:09 am

Re: RELP messages stuck for long periods

Postby lopezjo49 » Wed Aug 09, 2017 6:57 pm

Have you found a solution to this? I too have issues forwarding from remote rsyslog servers to a master federated rsyslog server using omrelp to imrelp. I've noticed that if the VPN hiccups or re-keys between them, the remote end buffers for sometimes minutes, sometimes hours, or more. I run ( watch -n 2 "ss -4tn" ) and can see the stale TCP connection's "Send-Q" buffer until it eventually (hopefully) times out and reconnects, back-filling the data. If you restart rsyslog you lose the data that was buffered, so you must wait for the timeout, which makes federated logging in real time an issue.

I see others claiming the same issue but no resolutions. Is there something I can set? I've even tried TCP keepalive tuning to no avail.

rsyslog-gnutls-8.29.0-1.el7.x86_64
rsyslog-8.29.0-1.el7.x86_64
rsyslog-relp-8.29.0-1.el7.x86_64
gnutls-dane-3.3.24-1.el7.x86_64
gnutls-utils-3.3.24-1.el7.x86_64
gnupg2-2.0.22-4.el7.x86_64
gnutls-3.3.24-1.el7.x86_64
Note: same issue with 8.28 as well.

I've tried omrelp's "timeout" and "conn.timeout", but those don't seem to work.

I've tried the following queue settings without success:
queue.timeoutshutdown="3000"
queue.timeoutactioncompletion="2000"
queue.timeoutworkerthreadshutdown="3000"

Example:
*.notice;cron.none;mail.none action(type="omrelp"
    target="10.20.10.20" port="443"
    timeout="1"
    conn.timeout="1"
    queue.type="LinkedList"
    queue.timeoutshutdown="3000"
    queue.timeoutactioncompletion="2000"
    queue.timeoutworkerthreadshutdown="3000"
    # queue.highwatermark="9000"
    # queue.lowwatermark="200"
    queue.spoolDirectory="/var/log/remote/queue"
    queue.filename="q_syslog01"
    queue.saveonshutdown="on"
    action.resumeRetryCount="-1"
    action.resumeInterval="1"
)

Federated server:
module(load="imrelp")
input(type="imrelp" port="443")
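
One thing I'm wondering about (assuming the imrelp build in this version supports the keepalive parameters; the values below are just illustrative) is enabling keepalives on the RELP listener so that dead connections get torn down sooner:

# sketch only - depends on imrelp/librelp keepalive support in the installed version
module(load="imrelp")
input(type="imrelp" port="443"
    # enable TCP keepalive probes on RELP connections
    keepAlive="on"
    # idle seconds before the first probe
    keepAlive.time="120"
    # seconds between probes
    keepAlive.interval="30"
    # unanswered probes before the connection is declared dead
    keepAlive.probes="4")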
lopezjo49
New
 
Posts: 1
Joined: Wed Aug 09, 2017 6:28 pm
