RegEx POSIX limitation problem - positive lookbehind

Postby t.reisinger » Sun Sep 28, 2008 3:33 pm

Hi,

I want to extract several values from a syslog message with RegEx. I was able to create the correct RegEx using 'positive lookbehind' to get for example:

Syslog message
Code:
CallLegType 2, ConnectionId C50604798C2E11DDA3F9C203B891A4DE, SetupTime 21:53:59.301 ADT Sat Sep 27 2008, PeerAddress 400, PeerSubAddress , DisconnectCause 10, DisconnectText normal call clearing (16), ConnectTime 21:53:59.351 ADT Sat Sep 27 2008, DisconnectTime 21:54:02.311 ADT Sat Sep 27 2008, CallOrigin 1, ChargedUnits 0, InfoType 2, TransmitPackets 146, TransmitBytes 23360, ReceivePackets 134, ReceiveBytes 21281

Code:
(?<=CallLegType )[0-9]*   CallLegType     finds 2
(?<=ConnectionId )\w*     ConnectionId    finds C50604798C2E11DDA3F9C203B891A4DE


I used
Code:
extract: '%msg:R:(?<=CallLegType )[0-9]*--end%','%msg:R:(?<=ConnectionId )\w*--end%')
in rsyslog and it loaded fine (it was recognized in debug mode). In the database/text log, however, I see only NO MATCH. I believe this is a limitation of the POSIX regex engine that rsyslog uses. Can somebody confirm this?

Unfortunately I'm no RegEx guru, so I want to ask: is there a way to rewrite my positive lookbehind as a POSIX-conformant expression?

Fields are not the right tool to extract the values, because each comma-separated field contains both the field description AND the value:

Code:
ChargedUnits 0, InfoType 2, TransmitPackets 146,


I would really appreciate your feedback/support.

Thomas

Re: RegEx POSIX limitation problem - positive lookbehind

Postby rgerhards » Mon Sep 29, 2008 10:43 am

Sorry, I am not a regex guy either, but I think you need ERE expressions, not BRE ones (the default). That would be R,ERE. For the full syntax, please read here:

http://www.rsyslog.com/doc-property_replacer.html

I'd appreciate it if you let me know whether that works and, if so, with which directive (that would be useful for others too).

Rainer

Re: RegEx POSIX limitation problem - positive lookbehind

Postby t.reisinger » Tue Sep 30, 2008 1:32 am

Hi,

I checked the POSIX standard, and neither ERE nor BRE supports positive lookbehind. To be sure, I tested it anyway:
Code:
'%msg:R,ERE,1,DFLT:(?<=CallLegType )[0-9]*--end%','%msg:R,ERE,1,DFLT:(?<=ConnectionId )\w*--end%')


NO MATCH :(
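
For comparison, the usual way around a missing lookbehind is to match the label as ordinary text and capture only the value in a parenthesized submatch. The Python sketch below illustrates that idea on a shortened copy of the message from the first post. Python's re module is a Perl-style engine, not the POSIX one that rsyslog calls, so this only shows the shape of the expression. The helper name extract and the bracket expression used in place of \w are assumptions made for the illustration.

```python
import re

# Shortened copy of the syslog message quoted in the first post.
msg = ("CallLegType 2, ConnectionId C50604798C2E11DDA3F9C203B891A4DE, "
       "SetupTime 21:53:59.301 ADT Sat Sep 27 2008, PeerAddress 400")


def extract(label, value_pattern, text):
    """Return the value that follows 'label ', or None when nothing matches."""
    # Match the label literally and put only the value into capture group 1.
    m = re.search(re.escape(label) + " (" + value_pattern + ")", text)
    return m.group(1) if m else None


call_leg_type = extract("CallLegType", "[0-9]+", msg)
connection_id = extract("ConnectionId", "[A-Za-z0-9]+", msg)
print(call_leg_type, connection_id)
```

In rsyslog terms this would presumably correspond to an entry such as %msg:R,ERE,1,DFLT:CallLegType ([0-9]+)--end%, where the 1 selects the first submatch; that form is untested here.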

Re: RegEx POSIX limitation problem - positive lookbehind

Postby rgerhards » Tue Sep 30, 2008 7:37 am

I am sorry, I just call the API - maybe it does not fully implement everything that POSIX specifies? I do not specifically turn anything off.

Re: RegEx POSIX limitation problem - positive lookbehind

Postby 4drian » Mon Nov 09, 2009 4:45 am

Hi Rainer,

I too am trying to use regex functionality that includes lookbehinds to grab a bunch of name-value pairs in the message.

In the messages I have a collection of:

name1=value1 name2=value2 name3=value3 etc

and I am trying to populate a MySQL table where the columns are labelled as name1, name2, name3 etc and the relevant values are put into their respective columns. Each name-value pair in this case is simply separated by a space. As with the original poster, I don't want to use fields as I need to extract the value part only and not the full name-value pair.

I'm currently debugging using a plain text file to simplify matters and I'm using a regex that looks like:

Code:
(?<=\sname2=).*?(?=\s)

Which is in a simple template as

Code:
%msg:R,ERE,0,BLANK:(?<=\sname2=).*?(?=\s)--end%

This should pick out the value associated with "name2" and ensures that the name-value pair is preceded by a space and followed by another space.
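
As a sanity check on the expression itself, a Perl-style engine does handle it as described. The sketch below uses Python's re module, which is Perl-style rather than POSIX, on a made-up sample message in the layout shown above. It says nothing about what the engine behind rsyslog accepts.

```python
import re

# Made-up sample message in the name=value layout described above.
msg = "name1=value1 name2=value2 name3=value3 etc"

# The Perl-style expression from the post:
# lookbehind for " name2=", lazy match, lookahead for whitespace.
pattern = re.compile(r"(?<=\sname2=).*?(?=\s)")

m = pattern.search(msg)
value = m.group(0) if m else None
print(value)
```

Because of the trailing lookahead, a pair that sits at the very end of the message with no whitespace after it would not match.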

I'm not 100% sure about my regexes, but I've tried much simpler ones and I just can't get them to work with the lookbehind functionality. One thing that stumps me is the extra set of parentheses, which I think should mean capturing a submatch. Everything works great with other regexes, which suggests that the issue is not with any other part of the rsyslog configuration.

I'm really getting out of my comfort zone here but when you say "I just call the API" does this have anything to do with a specific library installed on my computer? I'm currently using Ubuntu 9.10 and the only library installed that is specifically regex related is libpcre3. From what I can gather, this should allow other programs to use the full PCRE regex functionality which includes lookbehind (and lazy quantifiers which also don't seem to work).

Is there a way to specify what regex library to use on compile or in a config file? Is there another way to grab these values out of the name-value pairs? Is the best bet to pay for a custom message parser?

Re: RegEx POSIX limitation problem - positive lookbehind

Postby rgerhards » Mon Nov 09, 2009 9:43 am

PCRE is not POSIX RE! The library I call is just the regular clib. Of course, we could add other regex libraries, but that work still needs to be done :(

Does your excerpt work with the web-based regex checker tool? If yes, it should also work with rsyslog. If not, the expression is probably not a valid POSIX extended regular expression.

Rainer

Re: RegEx POSIX limitation problem - positive lookbehind

Postby rgerhards » Mon Nov 09, 2009 9:46 am

Oh, I forgot to mention: the message itself is well-formed, so a custom parser does not (currently) help (it would if we extend support for structured data, which is planned). But if you could sponsor some development, we could specifically add support for either PCRE or name-value pairs. I could try to trim this down to a minimal solution so that it hopefully fits within the same amount of work as a custom parser.

Re: RegEx POSIX limitation problem - positive lookbehind

Postby 4drian » Mon Nov 09, 2009 1:01 pm

Hi Rainer,

Thanks for your quick reply to both of my queries. I started to look at some of the examples that others had used and realised that my Perl-style regex was clearly not what others were using. After a bit of further digging I discovered some info about POSIX regex syntax, and your response has confirmed my suspicions. Not coming from a programming (or Linux) background, my knowledge of regex comes from working with network equipment (proxies/firewalls), where it has all been Perl-style. I never knew that there were any other types!

I've started to re-learn what I need in POSIX style and so far have been able to get the values by using basic matches like:

Code:
%msg:R,ERE,1,BLANK:name2=([A-Za-z0-9_]*)--end%

I'm only just starting, so I still need to tighten up the matches, but using the submatch captures only the value rather than the entire regex match - perfect :D .
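
To make the difference between the entire match and the submatch concrete, here is a small Python sketch. The expression from the template above uses only syntax that POSIX ERE and Python's re module share, so Python serves as a stand-in here; it is not the engine rsyslog calls. The sample messages and the tightened variant are assumptions for illustration.

```python
import re

# Made-up sample message in the name=value layout.
msg = "name1=value1 name2=value2 name3=value3 etc"

# Same expression as in the template above.
m = re.search(r"name2=([A-Za-z0-9_]*)", msg)

whole_match = m.group(0)  # the entire regex match (submatch 0)
value_only = m.group(1)   # only the captured value (submatch 1)

# Hypothetical tightening: require the start of the message or a space
# before the name, so that "xname2=..." is not picked up by accident.
tight = re.search(r"(^| )name2=([A-Za-z0-9_]*)", "xname2=bad name2=good")
tight_value = tight.group(2)

print(whole_match, value_only, tight_value)
```

Note that the tightened variant adds a second pair of parentheses, so the value moves to submatch 2; presumably the template would then need R,ERE,2 instead of R,ERE,1.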

I think that any sponsorship of work would go towards compression/decompression of RELP using zlib/LZO. Let's see how far I get with the basics first.

Many thanks again,

Adrian