Sphinx Index Integration

Discussions related to the development of PhpLogCon

Sphinx Index Integration

Postby heinemannj » Fri Jan 14, 2011 11:47 am

We have built the following syslog solution for our network and security infrastructure:

rsyslog daemon
rsyslog feeding into MySQL 5.5
LogAnalyzer as the search frontend

This design has worked very well over the last years.
But times are changing - today we see more than 500 messages per second, in other words more than 50,000,000 messages per day.

Query performance has become unbearable over the last months.
So we are looking for solutions that scale better.

Other solutions have integrated full-text indexing and return query results in under 1 second for installations with 10K MPS (messages per second).
So I ran my own tests with Sphinx as a MySQL full-text indexer - the result was unbelievable: full-text query results over 50,000,000 messages in 2-3 ms (milliseconds).

Is there a possibility to integrate Sphinx queries as a logging source into LogAnalyzer?

Integrating the Sphinx indexer and searchd into existing rsyslog deployments is quite easy and should work on most Linux systems with no problems.
The only thing that needs modification/enhancement is the PHP frontend for full-text search queries.
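
The pattern I have in mind: searchd resolves the full-text query to matching document IDs only, and the message rows are still read from MySQL. An untested sketch, assuming the stock Sphinx PHP API (sphinxapi.php) and the standard rsyslog SystemEvents schema; host, port, and credentials are placeholders, not LogAnalyzer internals:

Code:
<?php
// Sketch only: ask searchd for matching message IDs, then fetch
// the full rows from MySQL by primary key.
require_once('sphinxapi.php');

$cl = new SphinxClient();
$cl->SetServer('127.0.0.1', 3312);        // searchd address and port
$cl->SetMatchMode(SPH_MATCH_ALL);         // all query words must match
$cl->SetLimits(0, 50);                    // first 50 matches

$res = $cl->Query('deny tcp', 'syslog');  // full-text search on index 'syslog'
if ($res === false)
    die('Sphinx error: ' . $cl->GetLastError());

if (!empty($res['matches'])) {
    // matches are keyed by document ID, which is SystemEvents.ID
    $ids = implode(',', array_map('intval', array_keys($res['matches'])));

    $db   = mysqli_connect('localhost', 'dbuser', 'password', 'syslogdb');
    $rows = mysqli_query($db,
        "SELECT ReceivedAt, FromHost, Message FROM SystemEvents WHERE ID IN ($ids)");
    while ($row = mysqli_fetch_assoc($rows))
        printf("%s %s %s\n", $row['ReceivedAt'], $row['FromHost'], $row['Message']);
}
?>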

Re: Sphinx Index Integration

Postby rgerhards » Mon Jan 17, 2011 2:18 pm

Hi, this sounds like an *excellent* idea. I have briefly talked to Andre, the LogAnalyzer development lead. He also likes the idea and hopes to be able to integrate it soon.

Rainer

Re: Sphinx Index Integration

Postby heinemannj » Mon Jan 17, 2011 3:46 pm

Hi Rainer,

thanks for your feedback and interest in the idea - it's not my invention, but the use of Sphinx and full-text indexing could be the new trend for queries on huge syslog data stores.

If I can help you - maybe as an alpha tester - please let me know.
I'm not a software engineer - I'm a security and network professional with a little bit of Linux experience.

I can definitely feed it from 2,500 network devices from different vendors and various technologies (routers, switches, firewalls, web application firewalls, proxies, ...).

Thanks for your work on rsyslog and LogAnalyzer - two great tools.

Joerg

Re: Sphinx Index Integration

Postby heinemannj » Mon Jan 17, 2011 7:29 pm

heinemannj wrote:Integrating the Sphinx indexer and searchd into existing rsyslog deployments is quite easy and should work on most Linux systems with no problems.
The only thing that needs modification/enhancement is the PHP frontend for full-text search queries.


I have set up the following lab to test the Sphinx indexing and search functionality:

SUSE SLES 11 (x86_64) server running:
rsyslogd 3.18.3 (feeding into the standard rsyslog database schema),
MySQL 5.5 (Sphinx needs MySQL >= 5.1),
phpMyAdmin,
Sphinx 0.9.9 (stable), without SphinxSE as a MySQL plugin at this time (http://sphinxsearch.com/docs/manual-0.9.9.html), and
Apache2 with PHP integration

rsyslog remote.conf looks like:
Code:
*.* :ommysql:127.0.0.1,syslogdb,dbuser,password


Sphinx configuration file sphinx.conf
- a first try with no tuning; maybe much more tuning is necessary here:
Code:
#############################################################################
## data source definition
#############################################################################

source syslog
{
   # data source type. mandatory, no default value
   # known types are 'mysql', 'pgsql', 'mssql', 'xmlpipe', 'xmlpipe2'
   type               = mysql

   #####################################################################
   ## SQL settings (for 'mysql' and 'pgsql' types)
   #####################################################################

   # some straightforward parameters for SQL source types
   sql_host            = localhost
   sql_user            = dbuser
   sql_pass            = password
   sql_db            = syslogdb
   sql_port            = 3306   # optional, default is 3306

   # main document fetch query
   # mandatory, integer document ID field MUST be the first selected column
   sql_query            = \
      SELECT ID, UNIX_TIMESTAMP(ReceivedAt) AS ReceivedAt, UNIX_TIMESTAMP(DeviceReportedTime) AS DeviceReportedTime, Facility, Priority, FromHost, Message, InfoUnitID, SysLogTag \
      FROM SystemEvents

   # unsigned integer attribute declaration
   # multi-value (an arbitrary number of attributes is allowed), optional
   # optional bit size can be specified, default is 32
   #
   sql_attr_uint         = Priority
   sql_attr_uint         = Facility

   # UNIX timestamp attribute declaration
   # multi-value (an arbitrary number of attributes is allowed), optional
   # similar to integer, but can also be used in date functions
   #
   sql_attr_timestamp      = ReceivedAt
   sql_attr_timestamp      = DeviceReportedTime

   # string ordinal attribute declaration
   # multi-value (an arbitrary number of attributes is allowed), optional
   # sorts strings (bytewise), and stores their indexes in the sorted list
   # sorting by this attr is equivalent to sorting by the original strings
   #
   # sql_attr_str2ordinal   = FromHost

   # ranged query throttling, in milliseconds
   # optional, default is 0 which means no delay
   # enforces given delay before each query step
   sql_ranged_throttle   = 0

   # document info query, ONLY for CLI search (ie. testing and debugging)
   # optional, default is empty
   # must contain $id macro and must fetch the document by that id
   sql_query_info      = SELECT * FROM SystemEvents WHERE ID=$id
}

#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
#
# all indexing-time options (such as morphology and charsets)
# are configured per local index
index syslog
{
   # document source(s) to index
   # multi-value, mandatory
   # document IDs must be globally unique across all sources
   source         = syslog

   # index files path and file name, without extension
   # mandatory, path must be writable, extensions will be auto-appended
   path         = /var/lib/sphinx/syslog

   # document attribute values (docinfo) storage mode
   # optional, default is 'extern'
   # known values are 'none', 'extern' and 'inline'
   docinfo         = extern

   # memory locking for cached data (.spa and .spi), to prevent swapping
   # optional, default is 0 (do not mlock)
   # requires searchd to be run from root
   mlock         = 0

   # minimum indexed word length
   # default is 1 (index everything)
   min_word_len      = 1

   # charset encoding type
   # optional, default is 'sbcs'
   # known types are 'sbcs' (Single Byte CharSet) and 'utf-8'
   charset_type      = sbcs
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
   # memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
   # optional, default is 32M, max is 2047M, recommended is 256M to 1024M
   mem_limit         = 32M

   # maximum IO calls per second (for I/O throttling)
   # optional, default is 0 (unlimited)
   #
   # max_iops         = 40

   # maximum IO call size, bytes (for I/O throttling)
   # optional, default is 0 (unlimited)
   #
   # max_iosize      = 1048576
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
   # hostname, port, or hostname:port, or /unix/socket/path to listen on
   # multi-value, multiple listen points are allowed
   # optional, default is 0.0.0.0:3312 (listen on all interfaces, port 3312)
   #
   # listen            = 127.0.0.1
   # listen            = 3312
   # listen            = /var/run/searchd.sock
   listen            = 0.0.0.0:3312

   # log file, searchd run info is logged here
   # optional, default is 'searchd.log'
   log               = /var/log/searchd.log

   # query log file, all search queries are logged here
   # optional, default is empty (do not log queries)
   query_log         = /var/log/query.log

   # client read timeout, seconds
   # optional, default is 5
   read_timeout      = 5

   # request timeout, seconds
   # optional, default is 5 minutes
   client_timeout      = 300

   # maximum amount of children to fork (concurrent searches to run)
   # optional, default is 0 (unlimited)
   max_children      = 30

   # PID file, searchd process ID file name
   # mandatory
   pid_file         = /var/log/searchd.pid

   # max amount of matches the daemon ever keeps in RAM, per-index
   # WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
   # default is 1000 (just like Google)
   max_matches         = 1000

   # seamless rotate, prevents rotate stalls if precaching huge datasets
   # optional, default is 1
   seamless_rotate      = 1

   # whether to forcibly preopen all indexes on startup
   # optional, default is 0 (do not preopen)
   preopen_indexes      = 0

   # whether to unlink .old index copies on successful rotation.
   # optional, default is 1 (do unlink)
   unlink_old         = 1

   # attribute updates periodic flush timeout, seconds
   # updates will be automatically dumped to disk this frequently
   # optional, default is 0 (disable periodic flush)
   #
   # attr_flush_period   = 900

   # instance-wide ondisk_dict defaults (per-index values take precedence)
   # optional, default is 0 (precache all dictionaries in RAM)
   #
   # ondisk_dict_default   = 1

   # MVA updates pool size
   # shared between all instances of searchd, disables attr flushes!
   # optional, default size is 1M
   mva_updates_pool   = 1M

   # max allowed network packet size
   # limits both query packets from clients, and responses from agents
   # optional, default size is 8M
   max_packet_size      = 8M

   # crash log path
   # searchd will (try to) log crashed query to 'crash_log_path.PID' file
   # optional, default is empty (do not create crash logs)
   #
   # crash_log_path      = /var/log/crash

   # max allowed per-query filter count
   # optional, default is 256
   max_filters         = 256

   # max allowed per-filter values count
   # optional, default is 4096
   max_filter_values   = 4096
}
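
One caveat with a single big index: at 50,000,000 rows a full re-index takes a while, and messages arriving between indexer runs are not searchable. The usual Sphinx remedy is a main+delta split. A rough, untested sketch - the one-row helper table sph_counter(counter_id, max_doc_id) is my assumption and not part of the stock rsyslog schema:

Code:
# delta source: inherits all settings from 'syslog' above and indexes
# only rows added since the last main index run. The main source would
# advance the counter before each of its runs, e.g. via:
#   sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(ID) FROM SystemEvents
# and restrict its own sql_query with:
#   WHERE ID <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
source syslog_delta : syslog
{
   # clear any pre-queries inherited from the main source, so only
   # main index runs advance the counter
   sql_query_pre      =

   sql_query         = \
      SELECT ID, UNIX_TIMESTAMP(ReceivedAt) AS ReceivedAt, UNIX_TIMESTAMP(DeviceReportedTime) AS DeviceReportedTime, Facility, Priority, FromHost, Message, InfoUnitID, SysLogTag \
      FROM SystemEvents \
      WHERE ID > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

index syslog_delta : syslog
{
   source            = syslog_delta
   path              = /var/lib/sphinx/syslog_delta
}

The small delta index could then be rebuilt every minute or two with --rotate while the main index is rebuilt e.g. nightly; both can be searched at once by passing "syslog syslog_delta" as the index list to Query().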


Creating the initial Sphinx index:

Code:
rcsphinx start
sphinx-indexer --all
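
Once searchd is running, later indexer runs need --rotate, so the new index files are built alongside the live ones and hot-swapped in via a signal to searchd. To keep the index reasonably fresh I would re-index from cron - the binary path is an assumption based on the SUSE package naming above:

Code:
# crontab: rebuild all indexes every 15 minutes, hot-swap into searchd
*/15 * * * * /usr/bin/sphinx-indexer --rotate --all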


First Sphinx search via the CLI on the Sphinx host:

Code:
sphinx-search patterntosearch
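
The CLI tool also accepts an index name via -i, useful once more than one index exists - assuming the SUSE sphinx-search wrapper passes the stock search utility options through:

Code:
sphinx-search -i syslog patterntosearch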


With the PHP API and the test.php script you can query searchd remotely:
Code:
sphinxapi.php
test.php
test2.php
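
And a small sketch along the lines of test.php, showing how the attributes declared in sphinx.conf allow LogAnalyzer-style filtering - only messages with priority 0-3, newest first. The hostname is a placeholder, and note that Sphinx stores attribute names lowercased:

Code:
<?php
// Sketch: remote full-text query with an attribute filter and
// time-based sorting, using the stock PHP API (sphinxapi.php).
require_once('sphinxapi.php');

$cl = new SphinxClient();
$cl->SetServer('sphinxhost', 3312);                 // placeholder host
$cl->SetMatchMode(SPH_MATCH_ALL);
$cl->SetFilter('priority', array(0, 1, 2, 3));      // emerg..err only
$cl->SetSortMode(SPH_SORT_ATTR_DESC, 'receivedat'); // newest first
$cl->SetLimits(0, 20);

$res = $cl->Query('interface down', 'syslog');
if ($res !== false && !empty($res['matches'])) {
    printf("showing %d of %d matches, query took %s sec\n",
        count($res['matches']), $res['total_found'], $res['time']);
    foreach ($res['matches'] as $id => $m)
        echo $id, ' priority=', $m['attrs']['priority'], "\n";
}
?>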

Re: Sphinx Index Integration

Postby setevoy » Fri Apr 12, 2013 12:58 pm

Hi.

Two years and no replies in this topic - so LogAnalyzer still doesn't have any tools to work with Sphinx?

Re: Sphinx Index Integration

Postby alorbach » Mon Apr 15, 2013 9:59 am

Currently there are no plans to integrate Sphinx into LogAnalyzer.
However, it might be possible to add support for it if there is interest in funding it.

best regards,
Andre
