About regular expressions

Regular expressions defined in the InputSettings > RegExps element of the Feed Service configuration file are used to parse incoming events and extract information to be checked in feeds and to be used in outgoing events.

The Feed Service configuration file supplied in the distribution kit contains regular expressions that match the format of the events used in the verification test.

After the verification test is performed, you may have to add new regular expressions or change existing ones to work with specific event source software. For examples of regular expressions that parse events issued by popular devices, see section "Regular expressions for popular devices".

It is recommended that you set regular expressions for extracting such data as the IP address and port of the event source and of the event target, the user name, and the date. Use these regular expressions to define the format of the outgoing events (OutputSettings > EventFormat).

When you add, remove, or rename regular expressions, make sure that you also change the name of the regular expression in the input_regexp_to_match attribute of the Feeds > Feed > Field elements. You may also have to change OutputSettings > EventFormat, which might use the names of the regular expressions, and the configuration files of the event target software.
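The consistency check described above can be sketched as a small script. This is only an illustration: it assumes that regular expressions are child elements of Source (or, in the obsolete format, direct children of RegExps) and that each Field element names exactly one regular expression in input_regexp_to_match. The Configuration root element and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment, used only to demonstrate the check.
SAMPLE_CONFIG = """<Configuration>
  <InputSettings>
    <RegExps>
      <Source id="default">
        <RE_IP>...</RE_IP>
      </Source>
    </RegExps>
  </InputSettings>
  <Feeds>
    <Feed>
      <Field input_regexp_to_match="RE_IP"/>
      <Field input_regexp_to_match="RE_URL"/>
    </Feed>
  </Feeds>
</Configuration>"""


def undefined_regexp_names(config_xml):
    """Return names referenced by Field elements but not defined in RegExps."""
    root = ET.fromstring(config_xml)

    defined = set()
    for regexps in root.iter("RegExps"):
        for child in regexps:
            if child.tag == "Source":
                # Regular expressions grouped by event source.
                defined.update(regexp.tag for regexp in child)
            else:
                # Assumed obsolete format: no Source elements.
                defined.add(child.tag)

    referenced = {
        field.get("input_regexp_to_match")
        for field in root.iter("Field")
        if field.get("input_regexp_to_match")
    }
    return sorted(referenced - defined)


if __name__ == "__main__":
    print(undefined_regexp_names(SAMPLE_CONFIG))
```

A non-empty result lists the names that should be reviewed in the Field elements. The sketch does not check OutputSettings > EventFormat or the configuration files of the event target software.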

Specifying event sources

The regular expressions specified in the configuration file are grouped by event source. Each event source is represented by a Source element that contains a set of regular expressions. Usually, event sources are devices that issue events, which Feed Service then checks. The InputSettings > RegExps element can contain one or more Source elements.

A Source element has the following attributes:

The following flow chart describes how Feed Service chooses regular expressions from different Source elements.

[Flow chart: Choosing a regular expression]

The obsolete configuration file format does not require the specification of event sources. The regular expressions contained in such a file are treated as those of the default event source.

The regular expressions of the default event source for finding URLs, IP addresses, and hashes are universal, that is, they can be used for parsing events issued by most devices. They can parse events that contain multiple URLs, but they have the following limitations. They cannot parse, for example, events that contain URLs with no protocol specified. They lower the performance of Feed Service compared to device-specific regular expressions. They do not handle events in which different parts of a URL (for example, the host and the path) are located in different places. The universal regular expressions for finding hashes can also extract character sequences that are not actually hashes.

Defining regular expressions

You can use any name for a regular expression except the following ones:

Regular expression parameters

In the configuration file you can set parameters for regular expressions. The parameters are set in attributes of XML elements. Two attributes are supported: concatenate and extract.

Compound values

The concatenate attribute sets a rule for building a compound value from data extracted from an event. The rule refers to groups of extracted data with #N placeholders, where N is the number of a group (starting from 1). If a backslash (\) precedes the number sign (#), the # is not treated as the start of a group reference; it is treated as a literal number sign.

The following example event is parsed:

url_1=http://domain test_event url_2=/page/mypage test

The regular expressions used and the results of parsing the example event are listed below.

Examples of applying regular expressions

Regular expression:

<RE_URL concatenate="#1#2">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

http://domain/page/mypage

Regular expression:

<RE_URL concatenate="#2#1">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

/page/mypagehttp://domain

Regular expression:

<RE_URL concatenate="#2_/_#1">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

/page/mypage_/_http://domain

If no concatenation rule is set or the value of the concatenate attribute is empty, and the regular expression contains more than one group, the values of the groups are concatenated in the order they appear in the regular expression.

If the concatenate attribute refers to more groups than the regular expression contains, the references to the nonexistent groups are not resolved and appear in the result as the literal #N text.

Event being parsed:

url_1=http://domain test_event url_2=/page/my_page test

Regular expression used:

<RE_URL concatenate="#1#2#3">url_1=(.*?)\stest_event\surl_2=(.*?)\stest</RE_URL>

Result of parsing:

http://domain/page/my_page#3
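The concatenation rules described in this section can be approximated with the following sketch. It is not the Feed Service implementation. The function name apply_concatenate is hypothetical, Python's re module stands in for the PCRE engine, and removing the backslash from an escaped number sign in the result is an assumption.

```python
import re


def apply_concatenate(pattern, rule, event):
    """Build a compound value from the first match of pattern in event.

    Returns None if the pattern does not match.
    """
    match = re.search(pattern, event)
    if match is None:
        return None
    groups = match.groups()

    if not rule:
        # No rule or an empty rule: join the groups in order of appearance.
        return "".join(group or "" for group in groups)

    def substitute(token):
        if token.group(0) == "\\#":
            # Escaped number sign: emit a literal "#" (assumed behavior).
            return "#"
        index = int(token.group(1))
        if 1 <= index <= len(groups):
            return groups[index - 1] or ""
        # Reference to a nonexistent group: keep the "#N" text as is.
        return token.group(0)

    return re.sub(r"\\#|#(\d+)", substitute, rule)


if __name__ == "__main__":
    pattern = r"url_1=(.*?)\stest_event\surl_2=(.*?)\stest"
    event = "url_1=http://domain test_event url_2=/page/my_page test"
    print(apply_concatenate(pattern, "#1#2#3", event))
```

Run against the example event above with the rule #1#2#3, the sketch yields http://domain/page/my_page#3, which agrees with the documented result.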

Multiple matching

When parsing an event by using a regular expression, it is possible to extract all values that match the regular expression. For this purpose, set the value of the extract attribute to "all". If this value is set to "first" or the attribute is not specified, only the first value that matches the regular expression will be extracted.

For every matched value a separate detection event is generated. If the detection process does not affect a certain event field, the value of this field in the output event is set to a hyphen (-).

Event being parsed:

ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2

Configuration file elements:

<RegExps>

<Source id="default">

<RE_IP extract="all">...</RE_IP>

<RE_HASH extract="all">...</RE_HASH>

<RE_USER extract="first">...</RE_USER>

</Source>

</RegExps>

<EventFormat>ip=%RE_IP% hash=%RE_HASH% user=%RE_USER% %FeedContext%</EventFormat>

Available feed records:

IP = 12.12.12.12

IP = 23.23.23.23

hash = cde

Detection events generated:

ip=12.12.12.12 hash=- user=N1 <context for 12.12.12.12>

ip=23.23.23.23 hash=- user=N1 <context for 23.23.23.23>

ip=- hash=cde user=N1 <context for cde>
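The difference between extract="all" and extract="first" in the example above can be sketched as follows. The patterns are hypothetical stand-ins, because the configuration example elides the real ones, and pairing each feed record with a regular expression name is a simplification made for this illustration. The sketch only lists the values that would produce a detection; it does not reproduce the output event format.

```python
import re

EVENT = "ip1=12.12.12.12 ip2=23.23.23.23 hash1=abc hash2=cde user1=N1 user2=N2"

# Hypothetical patterns and the extract mode of each regular expression.
REGEXPS = {
    "RE_IP": (r"ip\d+=(\S+)", "all"),
    "RE_HASH": (r"hash\d+=(\S+)", "all"),
    "RE_USER": (r"user\d+=(\S+)", "first"),
}

# Feed records from the example, as (regular expression name, value) pairs.
FEED_RECORDS = {
    ("RE_IP", "12.12.12.12"),
    ("RE_IP", "23.23.23.23"),
    ("RE_HASH", "cde"),
}


def extract_values(pattern, mode, event):
    """Return every matched value for "all", only the first one otherwise."""
    values = [match.group(1) for match in re.finditer(pattern, event)]
    return values if mode == "all" else values[:1]


def find_detections(event, regexps, feed_records):
    """Return one (name, value) pair per extracted value found in the feeds."""
    detections = []
    for name, (pattern, mode) in regexps.items():
        for value in extract_values(pattern, mode, event):
            if (name, value) in feed_records:
                detections.append((name, value))
    return detections


if __name__ == "__main__":
    for name, value in find_detections(EVENT, REGEXPS, FEED_RECORDS):
        print(name, value)
```

With these inputs the sketch reports three detections (two IP addresses and one hash), one for each detection event in the example.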

Specifying characters by their hexadecimal code

Feed Service uses regular expressions that conform to PCRE syntax. This syntax allows specifying a character by its code in several ways.

Feed Service does not support specifying a character in \x{hhh..} format. Instead, specify a character by its code in the following way: \uhhhh, where hhhh is the hexadecimal code of the character. For example, you cannot use a ([\x{00a1}-\x{ffff}]) expression, but you can use a ([\u00a1-\uffff]) expression.
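If existing patterns already use the \x{hhh..} notation, a helper along the following lines could rewrite them to the \uhhhh form. The helper name to_u_escapes is hypothetical, and the sketch assumes that only codes of up to four hexadecimal digits need to be converted, because longer codes do not fit the \uhhhh form.

```python
import re

# Matches either an escaped backslash (left untouched) or a \x{...} escape.
_X_ESCAPE = re.compile(r"\\\\|\\x\{([0-9A-Fa-f]+)\}")


def to_u_escapes(pattern):
    """Rewrite \\x{hhh..} escapes in a pattern as \\uhhhh escapes."""

    def replace(match):
        code = match.group(1)
        if code is None:
            # An escaped backslash: keep it unchanged.
            return match.group(0)
        if len(code) > 4:
            raise ValueError("code does not fit \\uhhhh: " + match.group(0))
        # Pad short codes with leading zeros to exactly four digits.
        return "\\u" + code.zfill(4)

    return _X_ESCAPE.sub(replace, pattern)


if __name__ == "__main__":
    print(to_u_escapes(r"([\x{00a1}-\x{ffff}])"))
```

Applied to the expression from the example above, the helper produces ([\u00a1-\uffff]).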
