In today's Truth in Testing article, I'll be discussing one of our Application Simulators (AppSims), which can be used to generate realistic Syslog traffic. The BSD Syslog Protocol, which is documented in RFC 3164, describes a transport that allows networked systems to send event notification messages across the network to one or more message collectors called syslog servers. Syslog uses UDP destined for port 514 as its underlying transport. It's a fairly simple protocol, but it does have some interesting bits and should make for an easily digestible example of realism in AppSim-generated network traffic.
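As a rough illustration of that transport, the Python sketch below sends one already-formatted syslog packet as a single UDP datagram. The function name send_syslog and its parameters are invented for this example and say nothing about how the AppSim itself emits traffic; only the default port of 514 comes from the RFC.

```python
import socket

SYSLOG_PORT = 514  # UDP port used by BSD syslog (RFC 3164)


def send_syslog(packet: bytes, host: str, port: int = SYSLOG_PORT) -> int:
    """Send one syslog packet as a single UDP datagram.

    Returns the number of bytes handed to the network stack. UDP is
    connectionless, so there is no acknowledgement from the server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(packet, (host, port))
```

Because the transport is UDP, a successful return only means the datagram left the sender; whether a syslog server received it has to be checked on the server side.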
A typical local syslog message that shows up in one or more system logs looks something like this:
Mar 9 10:07:21 millstone dtrammell: This is a test message.
Here you have a time-stamp, a hostname (millstone), an entity logging the message (me), and the actual log message. This message was generated with the "logger" command from my workstation's shell; normally, however, an application or process creates log entries rather than a user. Syslog messages that are sent across the network to a syslog server look a bit different, as they need to convey additional information from the source system. RFC 3164 describes the overall message format:
The full format of a syslog message seen on the wire has three discernible parts. The first part is called the PRI, the second part is the HEADER, and the third part is the MSG. The total length of the packet MUST be 1024 bytes or less. There is no minimum length of the syslog message although sending a syslog packet with no contents is worthless and SHOULD NOT be transmitted.
If you have not already read Sean's blog post about our back-end BlockLib application protocol construction library, I recommend doing so now, as much of the following discussion relies on the data construction concepts described in that post.
These three message parts identified by the RFC represent the highest-layer data Blocks of the syslog message, other than of course the message's root container Block. Each of these parts has its own formatting and semantics and is thus further composed of sub-Blocks. Without going into too much detail, this logical segmentation of the message data into progressively more granular chunks is important because it allows us to place constraints on the data, as granularly as we like. Those constraints are how the AppSim generates realistic data when producing randomized syslog messages, and useful test cases when used as a fuzzer.
A quick example of where these constraints are useful is in placing a maximum 1024 byte constraint on the root message container block. As the excerpt from RFC 3164 above indicates, the total length of any syslog message must be 1024 bytes or less; thus, we shouldn't generate a randomized syslog message any longer than that, unless of course we're fuzzing.
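A minimal Python sketch of that root-level constraint follows. It is not BlockLib code; the function name and the fuzzing flag are assumptions made purely for illustration.

```python
MAX_PACKET_LEN = 1024  # RFC 3164: total packet length MUST be 1024 bytes or less


def enforce_root_length(packet: bytes, fuzzing: bool = False) -> bytes:
    """Return the packet unchanged if it satisfies the root length constraint.

    When fuzzing is True the limit is deliberately not enforced, since
    oversized messages are themselves interesting test cases.
    """
    if len(packet) > MAX_PACKET_LEN and not fuzzing:
        raise ValueError(
            f"syslog packet is {len(packet)} bytes; limit is {MAX_PACKET_LEN}"
        )
    return packet
```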
The PRI part is the first part of the message and essentially indicates the message's priority. On the wire, it is three, four, or five bytes long and is formatted as a left angle bracket ('<') character, followed by a number, followed by a right angle bracket ('>') character. These characters must be 7-bit ASCII in an 8-bit field. The number between the angle brackets represents the Priority value; it must be one, two, or three digits long and is derived from both the Facility and Severity values. The Facility describes the type of message, whereas the Severity indicates its importance. The RFC describes how the Priority value is derived from these two values:
The Priority value is calculated by first multiplying the Facility number by 8 and then adding the numerical value of the Severity. For example, a kernel message (Facility=0) with a Severity of Emergency (Severity=0) would have a Priority value of 0. Also, a "local use 4" message (Facility=20) with a Severity of Notice (Severity=5) would have a Priority value of 165.
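The calculation quoted above is small enough to sketch directly in Python. The function names are hypothetical, and the range checks assume the Facility codes 0 through 23 and Severity codes 0 through 7 listed in RFC 3164.

```python
def priority_value(facility: int, severity: int) -> int:
    """Priority = Facility * 8 + Severity, as described in RFC 3164."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in the range 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in the range 0..7")
    return facility * 8 + severity


def pri_part(facility: int, severity: int) -> str:
    """Wrap the Priority value in angle brackets to form the PRI part."""
    return f"<{priority_value(facility, severity)}>"
```

With the RFC's own examples, priority_value(0, 0) gives 0 and priority_value(20, 5) gives 165, so pri_part(20, 5) is the five-byte string "<165>".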
All of these details are important to outline here, as they demonstrate just how many constraints can be placed on, and just how complicated it can be to properly generate, a simple three-to-five-byte field. For randomized data used as the PRI part to be realistic, it must at least be compliant with the specification. The PRI Block that the AppSim uses to generate this data is thus itself limited to a minimum of 3 bytes, a maximum of 5 bytes, and is restricted to a character set which includes only the two angle brackets and the ASCII numeric digits. The PRI Block is further composed of three sub-Blocks: one for the '<' character, one for the Priority value, and a third for the '>' character. Each of these three sub-Blocks also has its length and character set appropriately constrained.
The Priority Block is a type of Block referred to as an encoding Block, as it performs an operation on its input or sub-Blocks to generate output that is more than simply a concatenation of its sub-Blocks. Thus, the Priority Block makes use of two sub-Blocks itself, one for Facility and one for Severity, but these are not actually included in the Block tree; they are simply source material for the encoder Block. The Facility and Severity Blocks are the two most granular Blocks used in the PRI part, and are essentially just copies of the unsigned 8-bit integer primitive Blocks with a constraint on what their randomized values are allowed to be. The RFC provides a list of designated Facility and Severity values, which are used here as the legitimate values constraint for these Blocks.
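To make the encoding idea concrete, here is a small Python sketch of a randomized PRI generator. It does not use BlockLib and its names are hypothetical; the allowed-value ranges assume the Facility codes 0 through 23 and Severity codes 0 through 7 from RFC 3164. Note that the Facility and Severity values are only inputs to the encoding step and never appear in the output themselves.

```python
import random

FACILITY_VALUES = range(24)  # assumed legitimate Facility codes, 0..23
SEVERITY_VALUES = range(8)   # assumed legitimate Severity codes, 0..7


def random_pri(rng: random.Random) -> bytes:
    """Generate a randomized PRI part from constrained source values."""
    facility = rng.choice(FACILITY_VALUES)  # source material only
    severity = rng.choice(SEVERITY_VALUES)  # source material only
    encoded = str(facility * 8 + severity).encode("ascii")  # the encoding step
    return b"<" + encoded + b">"
```

Because the source values are constrained before encoding, every output is automatically three to five bytes long and uses only angle brackets and ASCII digits.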
You may be thinking that this is entirely over-engineered for generating a simple syslog message, especially since thus far I have only detailed the first of the three parts that make up a complete syslog message. If this Block-tree were used simply for realistic data generation, you'd probably be right. Our AppSims that are built with BlockLib, however, can be dual-purposed as fuzzers, and all of this meta-data is extremely important when the AppSim is used to generate fuzzing test cases, the details of which I will cover in a subsequent blog post specifically on the topic of Syslog fuzzing.
The second part of the syslog message is the Header, which consists of a specially-formatted time-stamp, a space character, a hostname, and another space character. There's not really any fancy formatting or derived values here other than various length constraints; the RFC describes the time-stamp format in a human-readable form and the hostname is simply the name of the host that generated the syslog message, which is notably not necessarily the same host that is currently sending the message over the network.
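A short Python sketch of building the Header is shown below. The function name is hypothetical, and the timestamp layout assumes the RFC 3164 form "Mmm dd hh:mm:ss", in which a single-digit day is padded with a space; month names are spelled out explicitly so the result does not depend on the system locale.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_header(when: datetime, hostname: str) -> str:
    """Build the HEADER part: time-stamp, space, hostname, space."""
    if not hostname or " " in hostname:
        raise ValueError("hostname must be non-empty and contain no spaces")
    # {when.day:2d} pads a single-digit day with a leading space.
    timestamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"{timestamp} {hostname} "
```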
The final part of the syslog message is the MSG, which consists of a Tag and the actual log message Content. The Tag is the name of the program or process that generated the message, and is limited to between 1 and 32 bytes, inclusive. The Content is a free-form text message; however, it is commonly prefixed with the process ID (PID) related to the Tag value in the form of a left bracket ('['), the PID value, a right bracket (']'), and a colon (':').
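The MSG layout described above can be sketched in Python as follows. The function name and the optional pid parameter are assumptions for illustration, and the single space placed after the colon is a stylistic choice that is commonly seen rather than something the text above requires.

```python
from typing import Optional


def build_msg(tag: str, content: str, pid: Optional[int] = None) -> str:
    """Build the MSG part from a Tag, an optional PID, and the Content."""
    # The Tag must be between 1 and 32 bytes long, inclusive.
    # Encoding as ASCII raises a ValueError subclass for non-ASCII input.
    if not 1 <= len(tag.encode("ascii")) <= 32:
        raise ValueError("tag must be between 1 and 32 bytes long")
    pid_prefix = f"[{pid}]" if pid is not None else ""
    return f"{tag}{pid_prefix}: {content}"
```

For example, build_msg("dtrammell", "This is a test message.") reproduces the tail of the local log line shown at the start of this article.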
The end result here is that the Syslog AppSim can generate randomized but still specification-compliant syslog messages that appear realistic on the wire:
<19>Aug 05 10:45:50 xtics uIhTvrqbyVYMDFgBt[26]:qmxQ pWZ 3644 33Jz YdA uR H D33Kic yEgw
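To see the three parts in a message like this one, the Python sketch below splits a wire-format line into its PRI value, time-stamp, hostname, and MSG. The regular expression and all names are assumptions written for this example and are far more forgiving than a real syslog server's parser would need to be.

```python
import re
from typing import NamedTuple


class SyslogParts(NamedTuple):
    priority: int
    timestamp: str
    hostname: str
    msg: str


WIRE_FORMAT = re.compile(
    r"<(?P<pri>[0-9]{1,3})>"                                                  # PRI
    r"(?P<timestamp>[A-Z][a-z]{2} [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}) "  # HEADER
    r"(?P<hostname>\S+) "
    r"(?P<msg>.*)",                                                           # MSG
    re.DOTALL,
)


def split_message(line: str) -> SyslogParts:
    """Split a syslog line into PRI value, time-stamp, hostname, and MSG."""
    match = WIRE_FORMAT.fullmatch(line)
    if match is None:
        raise ValueError("line is not in the expected PRI/HEADER/MSG layout")
    return SyslogParts(
        int(match["pri"]), match["timestamp"], match["hostname"], match["msg"]
    )
```

Applied to the randomized message above, this yields a Priority value of 19, the hostname "xtics", and an MSG that begins with the random Tag "uIhTvrqbyVYMDFgBt".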
Of course, while this message is specification-compliant and therefore realistic to a syslog protocol parser, interpreter, or server, it is obviously nonsensical to a human observer. When testing these types of technologies, generated data similar to that shown above is usually fit for purpose; in other cases, however, more control and specificity is required. The granular way in which the Syslog AppSim generated this message provides a number of customization options for the user who would like to fine-tune a completely randomized message like the one shown above, or create their own entirely new syslog messages. Below are two screen-shots of the Syslog AppSim's "Flow" settings and the Syslog Message action's settings:


As you can see, nearly all aspects of the Syslog message can be customized via these settings, from the log message's source hostname all the way down to the individual facility and severity, which, incidentally, don't even show up in their native form on the wire. If, for example, you wanted to create a batch of syslog traffic that appeared to be terrorist identification alerts from a hypothetical network monitoring system called "carnivore", you would set the Tag setting to "carnivore", create a few syslog message actions conveying benign log entries and an alert log entry, and set all of their facility and severity values appropriately:
<86>Mar 09 15:20:42 lGvbIs carnivore[867]: No one here but us chickens...
<86>Mar 09 15:20:40 204.172.15.189 carnivore[6456]: Situation Normal
<81>Mar 09 15:20:47 KXMknh carnivore[29]: Terrorist Detected!!!
<86>Mar 09 15:20:36 47.127.243.250 carnivore[84]: These aren't the droids you're looking for.
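Working backwards from the PRI values in these samples shows how the facility and severity settings surface on the wire. The Python sketch below reverses the Priority calculation; the function name is hypothetical, and the severity names are taken from the table in RFC 3164.

```python
from typing import Tuple

SEVERITY_NAMES = (
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
)


def decode_priority(priority: int) -> Tuple[int, int]:
    """Split a Priority value back into (facility, severity)."""
    if not 0 <= priority <= 191:
        raise ValueError("priority must be in the range 0..191")
    # Priority = facility * 8 + severity, so divmod by 8 reverses it.
    return divmod(priority, 8)
```

Here decode_priority(86) returns (10, 6) and decode_priority(81) returns (10, 1): all four messages use Facility 10, which RFC 3164 designates for security/authorization messages, with the benign entries at Severity 6 (Informational) and the alert entry at Severity 1 (Alert).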
Sample pcap files for further inspection are available for both the randomized syslog messages generated by the Syslog AppSim's message action using default settings, as well as my "carnivore" example using customized settings.