
Truth in Testing: Syslog AppSim

In today's Truth in Testing article, I'll be discussing one of our Application Simulators (AppSims), which can be used to generate realistic Syslog traffic.  The BSD Syslog Protocol, documented in RFC 3164, describes a transport that allows networked systems to send event notification messages across the network to one or more message collectors called syslog servers.  Syslog uses UDP, destined for port 514, as its underlying transport.  It's a fairly simple protocol, but it does have some interesting bits, which should make it an easily digestible example of realism in AppSim-generated network traffic.

A typical local syslog message that shows up in one or more system logs looks something like this:

Mar  9 10:07:21 millstone dtrammell: This is a test message.

Here you have a time-stamp, a hostname (millstone), an entity logging the message (me), and the actual log message.  This message was generated with the "logger" command from my workstation's shell; normally, however, an application or process creates log entries rather than a user.  Syslog messages that are sent across the network to a syslog server look a bit different, as they need to convey additional information from the source system.  RFC 3164 describes the overall message format:

  The full format of a syslog message seen on the wire
  has three discernible parts. The first part is called 
  the PRI, the second part is the HEADER, and the third 
  part is the MSG. The total length of the packet MUST 
  be 1024 bytes or less. There is no minimum length of 
  the syslog message although sending a syslog packet 
  with no contents is worthless and SHOULD NOT be transmitted.

If you have not already read Sean's blog post about our back-end BlockLib application protocol construction library, I recommend doing so now, as much of the following discussion relies heavily on the data construction concepts described in that post.

These three message parts identified by the RFC represent the highest-layer data Blocks of the syslog message, other than, of course, the message's root container Block.  Each of these parts of the syslog message has its own formatting and semantics and is thus further composed of sub-Blocks.  Without going into too much detail, this logical segmentation of the message data into progressively more granular chunks is important because it allows us to place constraints on the data as granularly as we like.  This is how the AppSim generates realistic data when producing randomized syslog messages, as well as useful test cases when used as a fuzzer.

A quick example of where these constraints are useful is placing a maximum 1024-byte length constraint on the root message container Block.  As the excerpt from RFC 3164 above indicates, the total length of any syslog message must be 1024 bytes or less; thus, we shouldn't generate a randomized syslog message any longer than that, unless of course we're fuzzing.

The PRI part is the first part of the message and essentially indicates the message's priority.  On the wire, it is three, four, or five bytes long and is formatted as a left angle bracket ('<') character, followed by a number, followed by a right angle bracket ('>') character.  These characters must be 7-bit ASCII in an 8-bit field.  The number between the angle brackets represents the Priority value; it must be one, two, or three digits and is derived from both the Facility and Severity values.  The Facility describes the type of message, whereas the Severity indicates its importance.  The RFC describes how the Priority value is derived from these two values:

  The Priority value is calculated by first multiplying
  the Facility number by 8 and then adding the numerical
  value of the Severity. For example, a kernel message
  (Facility=0) with a Severity of Emergency (Severity=0)
  would have a Priority value of 0. Also, a "local use 4"
  message (Facility=20) with a Severity of Notice (Severity=5)
  would have a Priority value of 165.
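The RFC's arithmetic is small enough to sketch directly.  This is a minimal illustration in plain Ruby, not the AppSim's code; the helper names are made up for the example, and the ranges (Facility 0..23, Severity 0..7) come from the RFC's code tables.  Decoding is just the reverse: the Facility is the quotient of the Priority divided by 8, and the Severity is the remainder.

```ruby
# Compute the RFC 3164 Priority value: Facility * 8 + Severity.
# facility: 0..23, severity: 0..7 (the RFC's designated values).
def priority(facility, severity)
  raise ArgumentError, "facility out of range: #{facility}" unless (0..23).cover?(facility)
  raise ArgumentError, "severity out of range: #{severity}" unless (0..7).cover?(severity)
  facility * 8 + severity
end

# Wrap the Priority value in angle brackets to form the PRI part.
def pri_part(facility, severity)
  "<#{priority(facility, severity)}>"
end

# Reverse the calculation: quotient is the Facility, remainder the Severity.
def split_priority(value)
  value.divmod(8)
end

puts pri_part(0, 0)    # prints <0>   (kernel, Emergency)
puts pri_part(20, 5)   # prints <165> (local use 4, Notice)
p split_priority(165)  # prints [20, 5]
```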

All of these details are important to outline here, as they demonstrate just how many constraints can be placed on a simple three-to-five-byte field, and just how complicated it can be to generate one properly.  For randomized data used as the PRI part to be realistic, it must at least comply with the specification.  The PRI Block that the AppSim uses to generate this data is thus limited to a minimum of 3 bytes and a maximum of 5 bytes, and is restricted to a character set that includes only the two angle brackets and the ASCII numeric digits.  The PRI Block is further composed of three sub-Blocks: one for the '<' character, one for the Priority value, and a third for the '>' character.  Each of these three sub-Blocks also has its length and character set appropriately constrained.

The Priority Block is a type of Block referred to as an encoding Block, as it performs an operation on its input or sub-Blocks to generate output that is more than simply a concatenation of its sub-Blocks.  The Priority Block thus makes use of two sub-Blocks of its own, one for Facility and one for Severity, but these are not actually included in the Block tree; they are simply source material for the encoder Block.  The Facility and Severity Blocks are the two most granular Blocks used in the PRI part, and are essentially just copies of the unsigned 8-bit integer primitive Block with a constraint on what their randomized values are allowed to be.  The RFC provides a list of designated Facility and Severity values, which are used here as the legitimate-values constraint for these Blocks.
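BlockLib's actual API isn't shown in this post, so the sketch below only mimics the structure described above with plain Ruby classes.  The class names (ConstrainedUInt8, PriorityEncoder, PriBlock) are hypothetical stand-ins, not BlockLib code.  It shows the two ideas that matter here: the Facility and Severity values are drawn from a legitimate-values constraint, and they are consumed by an encoder rather than emitted on the wire, while the PRI output is checked against the 3-to-5-byte, brackets-and-digits constraint.

```ruby
# Hypothetical stand-in for a uint8 primitive Block with a
# legitimate-values constraint (not the real BlockLib API).
class ConstrainedUInt8
  def initialize(allowed)
    @allowed = allowed.to_a
    raise ArgumentError, 'values must fit in 8 bits' unless @allowed.all? { |v| (0..255).cover?(v) }
  end

  # Pick a random value from the allowed set.
  def generate(rng)
    @allowed.sample(random: rng)
  end
end

# Hypothetical encoding Block: Facility and Severity are only source
# material; just the derived Priority value is emitted.
class PriorityEncoder
  FACILITY = ConstrainedUInt8.new(0..23)
  SEVERITY = ConstrainedUInt8.new(0..7)

  def generate(rng)
    (FACILITY.generate(rng) * 8 + SEVERITY.generate(rng)).to_s
  end
end

# Hypothetical PRI Block: '<' + Priority + '>', validated against the
# length (3..5 bytes) and character-set constraints.
class PriBlock
  PATTERN = /\A<\d{1,3}>\z/

  def generate(rng)
    pri = "<#{PriorityEncoder.new.generate(rng)}>"
    raise "constraint violated: #{pri}" unless PATTERN.match?(pri)
    pri
  end
end

rng = Random.new(42)
samples = Array.new(5) { PriBlock.new.generate(rng) }
puts samples.join(' ')
```

Seeding the generator makes a run reproducible, which is handy when a generated value needs to be regenerated later.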

You may be thinking that this is entirely over-engineered for generating a simple syslog message, especially since so far I have detailed only the first of the three parts that make up a complete syslog message.  If this Block tree were used simply for realistic data generation, you'd probably be right.  Our AppSims that are built with BlockLib, however, can be dual-purposed as fuzzers, and all of this metadata is extremely important when the AppSim is used to generate fuzzing test cases, the details of which I will cover in a subsequent blog post specifically on the topic of Syslog fuzzing.

The second part of the syslog message is the HEADER, which consists of a specially formatted time-stamp, a space character, a hostname, and another space character.  There's not really any fancy formatting or derived values here other than various length constraints; the RFC describes the time-stamp format in a human-readable form ("Mmm dd hh:mm:ss"), and the hostname is simply the name of the host that generated the syslog message, which is notably not necessarily the same host that is currently sending the message over the network.

The final part of the syslog message is the MSG, which consists of a Tag and the actual log message Content.  The Tag is the name of the program or process that generated the message and is limited to between 1 and 32 bytes long, inclusive.  The Content is a free-form text message; however, it is commonly prefixed with the process ID (PID) related to the Tag value, in the form of a left bracket ('['), the PID value, a right bracket (']'), and a colon (':').
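Putting the three parts together, a minimal assembler might look like the following.  It is an illustrative sketch, not the AppSim: the method name and keyword arguments are made up, the "tag[pid]: content" layout follows the common convention described above, and the day is space-padded as RFC 3164 specifies (as in the logger example, "Mar  9").  It also enforces the 1024-byte limit on the whole message.

```ruby
MAX_SYSLOG_BYTES = 1024 # RFC 3164: the total packet MUST be 1024 bytes or less

# Hypothetical helper: assemble PRI + HEADER + MSG for an RFC 3164 message.
def syslog_message(facility:, severity:, time:, host:, tag:, content:, pid: nil)
  raise ArgumentError, 'Tag must be 1..32 bytes' unless (1..32).cover?(tag.bytesize)

  pri = "<#{facility * 8 + severity}>"
  # HEADER: "Mmm dd hh:mm:ss" (%e space-pads the day), a space, the hostname, a space.
  header = "#{time.strftime('%b %e %H:%M:%S')} #{host} "
  # MSG: the Tag, the common optional "[pid]" prefix, then ": " and the Content.
  pid_part = pid ? "[#{pid}]" : ''
  msg = "#{tag}#{pid_part}: #{content}"

  message = pri + header + msg
  if message.bytesize > MAX_SYSLOG_BYTES
    raise ArgumentError, "message is #{message.bytesize} bytes; max is #{MAX_SYSLOG_BYTES}"
  end
  message
end

t = Time.new(2009, 3, 9, 10, 7, 21)
puts syslog_message(facility: 1, severity: 5, time: t, host: 'millstone',
                    tag: 'dtrammell', content: 'This is a test message.')
# prints <13>Mar  9 10:07:21 millstone dtrammell: This is a test message.
```

Facility 1 (user-level) with Severity 5 (Notice) is what logger uses by default, which is why the example's PRI is <13>.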

The end result here is that the Syslog AppSim can generate randomized but still specification compliant syslog messages that appear realistic on the wire:

<19>Aug 05 10:45:50 xtics uIhTvrqbyVYMDFgBt[26]:qmxQ pWZ 3644 33Jz YdA uR H D33Kic yEgw

Of course, while this message is specification-compliant and therefore realistic to a syslog protocol parser, interpreter, or server, it is obviously nonsensical to a human observer.  When testing these types of technologies, generated data similar to that shown above is usually fit for purpose; in other cases, however, more control and specificity is required.  The granular way in which the Syslog AppSim generates this message provides a number of customization options for users who would like to fine-tune a completely randomized message like the one shown above, or create their own entirely new syslog messages.  Below are two screenshots of the Syslog AppSim's "Flow" settings and the Syslog Message action's settings:

Syslog Flow Settings

Syslog Action Settings

As you can see, nearly all aspects of the syslog message can be customized via these settings, from the log message's source hostname all the way down to the individual facility and severity values, which don't even appear in their native form on the wire.  If, for example, you wanted to create a batch of syslog traffic that appeared to be terrorist identification alerts from a hypothetical network monitoring system called "carnivore", you would set the Tag setting to "carnivore", create a few syslog message actions conveying benign log entries and an alert log entry, and set all of their facility and severity values appropriately:

<86>Mar 09 15:20:42 lGvbIs carnivore[867]: No one here but us chickens...
<86>Mar 09 15:20:40 204.172.15.189 carnivore[6456]: Situation Normal
<81>Mar 09 15:20:47 KXMknh carnivore[29]: Terrorist Detected!!!
<86>Mar 09 15:20:36 47.127.243.250 carnivore[84]: These aren't the droids you're looking for.
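To see how those PRI values map back to the settings, the lines can be parsed and the Priority split into Facility and Severity.  This is a minimal sketch that assumes the "tag[pid]: content" layout used above; the regular expression accepts either a space- or zero-padded day, since the sample output uses the latter.  <86> decodes to Facility 10 (listed in RFC 3164 as security/authorization messages) with Severity 6 (Informational), and <81> to the same Facility with Severity 1 (Alert).

```ruby
# Severity names from RFC 3164, indexed by numeric Severity.
SEVERITIES = %w[Emergency Alert Critical Error Warning Notice Informational Debug].freeze

# PRI, time-stamp (space- or zero-padded day), hostname, tag, optional [pid], content.
SYSLOG_LINE = /\A<(\d{1,3})>([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (\S+) ([^\[:\s]+)(?:\[(\d+)\])?: ?(.*)\z/

# Parse one message into its fields; returns nil if the line doesn't match.
def parse_syslog(line)
  m = SYSLOG_LINE.match(line) or return nil
  facility, severity = m[1].to_i.divmod(8)
  {
    facility: facility,
    severity: severity,
    severity_name: SEVERITIES[severity],
    timestamp: m[2],
    host: m[3],
    tag: m[4],
    pid: m[5] && m[5].to_i,
    content: m[6]
  }
end

alert = parse_syslog('<81>Mar 09 15:20:47 KXMknh carnivore[29]: Terrorist Detected!!!')
puts "#{alert[:facility]} #{alert[:severity_name]}" # prints 10 Alert
```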

Sample pcap files for further inspection are available for both the randomized syslog messages generated by the Syslog AppSim's message action using default settings, as well as my "carnivore" example using customized settings.

Posted by Dustin D. Trammell (2009/03/12 13:17:19.238 GMT-5)

White Paper: Simulating Distributed Denial-of-Service with BreakingPoint

Today we have released a new white paper that I've been working on entitled "Simulating Distributed Denial-of-Service with BreakingPoint".  This paper describes how to configure your BreakingPoint product's Network Neighborhood to simulate the traffic profile normally associated with a DDoS attack and then outlines a number of DDoS attack scenarios.  I've also provided a link below to a packaged version that includes product test cases to simulate the scenarios described in the paper.

Several of the scenarios presented have recent real-world analogues.  For example, the group of HTTP scenarios in the paper is similar in nature to multiple DDoS attacks that were recently launched simultaneously against our very own HD Moore's Metasploit Project website, alongside other information security and hacking related websites.  You can read his ongoing commentary, written during and after the attacks, on the Metasploit Blog, beginning with this post.

The last scenario discussed in the paper is one of my all-time favorite DDoSes from when I was focusing a lot of my research efforts within the scope of VoIP systems and technologies.  I regularly employed the tactic outlined by this scenario to demonstrate how a DDoS attack can effectively fly under most network security devices' radar by avoiding the usual DDoS traffic model and by shifting the target of the attack from the technology itself to elsewhere.  I won't ruin the surprise here, you'll have to download the paper to find out what I'm talking about...

Finally, some of the test cases were created via scripting within the BreakingPoint TCL interface, so the paper also provides an introduction to that topic as well as the TCL scripts themselves.  Todd Manning has recently been blogging here on this topic, the posts for which you can find by browsing this blog using the "tcl" tag.

We invite you to take a look at the paper, which can be found here (PDF).  The package which includes the paper as well as supporting materials such as test cases and TCL scripts can be found here.

Posted by Dustin D. Trammell (2009/02/27 10:23:24.862 US/Central)

Ruby String Processing Overhead

Just prior to joining BreakingPoint, I taught myself enough Ruby to implement, with considerable help from HD and Matt Miller, a proof-of-concept in the Metasploit Framework of the research that I was to present at ToorCon 9.  Shortly after that I joined BreakingPoint Labs and was thrown head-first into the world of Ruby.  One thing I noticed early on was that strings were frequently written with both single and double quotes, with no apparent reason for choosing one over the other.  Once I moved past the simple tutorials and into more complex code, I found out that in Ruby, strings contained in double quotes are processed for escape sequences and variable interpolation, whereas strings contained in single quotes are literal (apart from the \\ and \' escape sequences).  For example, if you have a variable in Ruby, you can have its value interpolated into a string:

irb(main):001:0> name = 'Dustin'
=> "Dustin"
irb(main):002:0> puts "Hi, my name is #{name}.\n" # processed string
Hi, my name is Dustin.
=> nil
irb(main):004:0> puts 'Hi, my name is #{name}.\n' # literal string
Hi, my name is #{name}.\n
=> nil

Being primarily a C programmer, I had a preconceived notion about double versus single quotes, as in C double quotes are used for a string and single quotes are used for a character.  What originally confused me, however, was that I would often see strings contained in double quotes with nothing in them that would be interpolated or otherwise interpreted when processed, essentially creating a string literal with the interpolating construct.  I asked HD about this, and he went on a rant about how, when Metasploit was migrated from Perl to Ruby for version 3.0, one of the other Metasploit developers would always complain about his use of double quotes where they weren't needed, claiming it was a performance hit.  HD maintained that there was no significant difference in overhead between the two construct methods, but the other developer's point makes logical sense: if the string doesn't need to be processed for interpolation, you would expect fewer instructions to be executed to handle it.  Since that seemed logical, I made it a point to use single quotes for strings unless I actually needed the string's content to be processed, but lately this question has been nagging at the back of my mind, and the scientist in me demanded proof.

To determine whether there really was any detectable and, more importantly, significant performance hit, I wrote a little Ruby script to test this.  The script first defines a string constant to use for the tests.  It then builds two test cases using the String object constructor, one for each construct method (literal versus processed), each consisting of Ruby code that simply creates a String object initialized from the constant.  The script uses the better-benchmark wrapper for rsruby (a Ruby interface to R) to measure the amount of time it takes to execute each test case via the eval() function.  The test performs 10 passes of 200,000 instances of each test case and analyzes the timings using the Wilcoxon signed-rank test.  To run these tests as accurately as possible, I used the most under-utilized, idle system I could find, so that other processes executing on the same system would have less influence on the timings.  This happened to be a freshly installed Ubuntu 8.04 system with very little running on it.
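The original script isn't reproduced here, but the measurement side can be sketched with Ruby's standard Benchmark module in place of better-benchmark and rsruby (a substitution, so no statistics are computed in this block).  The method names are made up for the illustration.  Like the original, it times eval() of each test case, so parsing the literal is part of what gets measured; the length parameter also covers the varying-length variant described below.

```ruby
require 'benchmark'

# Build the two test cases as Ruby source: the same String.new call with the
# initializer written single-quoted (literal) or double-quoted (processed).
# `str` must not contain quotes, backslashes, or '#'.
def string_test_cases(str)
  {
    literal:   "String.new('#{str}')",
    processed: "String.new(\"#{str}\")"
  }
end

# Wall-clock seconds to eval one test case `iterations` times.
def time_case(code, iterations)
  Benchmark.realtime { iterations.times { eval(code) } }
end

# Measure both cases once per pass; returns one {literal:, processed:}
# hash of timings per pass, suitable for a paired statistical test.
def run_passes(passes: 10, iterations: 200_000, length: 1)
  cases = string_test_cases('a' * length)
  Array.new(passes) do
    {
      literal:   time_case(cases[:literal], iterations),
      processed: time_case(cases[:processed], iterations)
    }
  end
end
```

Calling run_passes(passes: 10, iterations: 200_000) mirrors the setup described above; expect it to take a while, and keep the machine otherwise idle.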

This script produced completely unexpected results: the literal string initializations were actually slightly less efficient than their processed counterparts, by about 2.6%, and R deemed this difference statistically significant.
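For readers curious about what R is computing, the core of the Wilcoxon signed-rank test is a pair of rank sums over the paired differences.  This is a minimal sketch of that statistic only; it does not compute a p-value (R's wilcox.test does that), and the method name is made up for the illustration.  The inputs would be the per-pass timings of the two construct methods.

```ruby
# Wilcoxon signed-rank statistic for paired samples a[i], b[i].
# Returns [w_plus, w_minus, n]: rank sums of the positive and negative
# differences and the number of non-zero differences that were ranked.
def wilcoxon_signed_rank(a, b)
  raise ArgumentError, 'samples must be paired' unless a.length == b.length

  # Paired differences; zero differences carry no sign and are dropped.
  diffs = a.zip(b).map { |x, y| x - y }.reject(&:zero?)

  # Rank |d| ascending, giving tied magnitudes the average of their ranks.
  order = (0...diffs.length).sort_by { |i| diffs[i].abs }
  ranks = Array.new(diffs.length)
  start = 0
  while start < order.length
    stop = start
    stop += 1 while stop + 1 < order.length &&
                    diffs[order[stop + 1]].abs == diffs[order[start]].abs
    average_rank = (start + stop + 2) / 2.0 # mean of 1-based ranks start+1..stop+1
    (start..stop).each { |k| ranks[order[k]] = average_rank }
    start = stop + 1
  end

  w_plus = 0.0
  w_minus = 0.0
  diffs.each_with_index do |d, i|
    if d.positive?
      w_plus += ranks[i]
    else
      w_minus += ranks[i]
    end
  end
  [w_plus, w_minus, diffs.length]
end
```

A strongly lopsided split between w_plus and w_minus suggests one method is consistently faster; whether that is significant for a given n is what the test's reference distribution decides.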

These initial measurements were taken using a string of constant length, the smallest possible (one character), in an attempt to directly measure the aggregate overhead of the object constructor itself and the difference in processing overhead between the two initialization methods.  Because the initial results were the opposite of what I had expected, I also wanted to test whether the results changed depending on the length of the initialization string, so I wrote a second script.  The second script is similar to the first; however, instead of using the same one-character initialization string for each test, the initialization strings increase in length from 1 to 2000 bytes in 100-byte increments.

The second script produced somewhat more expected results.  Processed strings still had a statistically significant performance advantage up to about 600-byte strings.  The difference was statistically insignificant only at around 600 to 700 bytes, and became significant again, in the literal method's favor, at somewhere between 700 and 800 bytes and beyond.

The important point to note is that String object constructors initialized with literal strings only seem to provide a performance benefit for longer strings, while processed strings seem to provide the benefit for shorter strings.  This means that there are measurable string-length thresholds at which the choice of initialization method starts to make a statistically significant difference, and a window of lengths within which it doesn't really matter which method you use.  Since most programmers are unlikely to have strings of any significant length directly in their code, it would appear that using double quotes in normal practice is the preferred method for achieving better performance.

For some further reading on the performance difference between string variable interpolation versus using string append and concatenation methods, I found this blog post to be interesting.

Posted by Dustin D. Trammell (2009/01/22 07:00:00 GMT+0)

Automated Protocol Reverse Engineering

At BreakingPoint Labs, we are not only tasked with creating the exceptional content that you've come to know and love for the BreakingPoint Security component, but we're also tasked with creating the content for the AppSim component.  This content takes the form of individual Application Simulators (AppSims), one per application protocol.  These individual AppSims are essentially each their own little sub-component, generating realistic network traffic for the application protocol in question (as well as fuzzed traffic in the near future, stay tuned).  These components generate both the client and server side of the connection, and when played to the wire from the BreakingPoint appliance at really fast speeds, provide the BreakingPoint's load testing payload.

During the development of AppSims we occasionally come across an undocumented or proprietary network protocol, usually while responding to specific customer requests, at which point developing the AppSim becomes a little more interesting than the usual routine of poring through the protocol specification and observing real systems using the protocol to determine implementation or platform nuances.  When trying to implement an AppSim for an unknown protocol, a specification to work from is simply a luxury that you don't have.  It's at this point that protocol reverse engineering comes into play.

Protocol Reverse Engineering

Protocol reverse engineering is traditionally a task done by hand, aided by your favorite network analyzer or packet sniffer.  Text-based protocols like HTTP, SIP, and SMTP that are human-readable on the wire don't require much more than manual methods.  Packet field boundaries and field groupings are generally easily identified by common delimiter sequences such as carriage returns (0x0d), line feeds (0x0a), CRLFs (0x0d0a), white-space characters, colons, semicolons, slashes, backslashes, pairs of parentheses and brackets, and so forth.  Text protocol packets, being generally human-readable, can usually be reversed to a fairly accurate packet format without much effort.
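As a small illustration of how far the delimiters alone get you with a text protocol, the sketch below splits a SIP-style message into its start line, header fields, and body using only CRLF, the blank line before the body, and the first colon of each header line.  It is a hypothetical helper, not a robust parser: it ignores folded header lines, repeated separators, and encodings.

```ruby
# Split a text-protocol message using common delimiters: CRLF between
# lines, an empty line (CRLFCRLF) before the body, and the first colon
# in each header line separating the field name from its value.
def split_text_message(raw)
  head, body = raw.split("\r\n\r\n", 2)
  start_line, *header_lines = head.split("\r\n")
  headers = header_lines.map do |line|
    name, value = line.split(':', 2)
    [name.strip, value.to_s.strip]
  end
  { start_line: start_line, headers: headers, body: body.to_s }
end

raw = "INVITE sip:bob@example.com SIP/2.0\r\nVia: SIP/2.0/UDP host\r\nCall-ID: abc123\r\n\r\nhello"
p split_text_message(raw)[:headers] # prints [["Via", "SIP/2.0/UDP host"], ["Call-ID", "abc123"]]
```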

Once you enter the realm of binary protocols, however, this discipline becomes a whole different ball game.  Binary protocols, being meant to be read not by humans but exclusively by machines that already know the protocol's packet structure, have none of the grammar and syntax structure of text protocols.  It is due to this lack of bloat that binary protocols are often preferred for systems that require greater throughput and lower latency: binary protocol packets are generally much smaller than comparable packets in text protocols.  Since the reverse engineer has no such clues to identify packet field groupings and individual field boundaries, much more attention must be paid to the overall collection of packets found in a protocol session and the differences between them.

Iterator fields such as sequence numbers can often be given away by their behavior as the suspected field is tracked from one packet to the next.  As such a field iterates in value, it is easily recognizable, but questions may still arise.  For example, if the session is short and the iterator is low in value and preceded by a number of zero bytes, what size is the actual iterator field?  Is it only the two bytes that are seen incrementing in value, suggesting a 16-bit field, or does it include the preceding two zeroed bytes, making it a 32-bit field?  Unless you have a way to influence this value directly through the software whose use of this protocol is being analyzed, or can cause the software to communicate in much longer sessions so as to observe the iterator value wrap at its upper bound or continue iterating into its next preceding zeroed byte, you may never really know.  The two preceding zeroed bytes could very well be a two-byte reserved field that is meant to always be zero.  It is these types of questions, among many others, that arise when attempting to reverse engineer a binary protocol.
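Both halves of that reasoning can be sketched in a few lines.  The helper below (a hypothetical name) flags byte offsets whose value increases by exactly one, modulo 256, from each packet to the next, which is how a low-valued sequence number tends to give itself away.  The last two lines then show the ambiguity: while the upper bytes stay zero, reading the field as a 16-bit or a 32-bit big-endian integer yields identical values, so the capture alone can't settle the field width.

```ruby
# Return byte offsets whose value increases by exactly one (mod 256)
# from each packet to the next: candidate sequence-number bytes.
def find_counter_offsets(packets)
  return [] if packets.size < 2

  shortest = packets.map(&:bytesize).min
  (0...shortest).select do |offset|
    values = packets.map { |pkt| pkt.getbyte(offset) }
    values.each_cons(2).all? { |a, b| (b - a) % 256 == 1 }
  end
end

# Three made-up packets: constant 0xAA, three zero bytes, a counter, constant 0xBB.
packets = [5, 6, 7].map { |seq| [0xAA, 0x00, 0x00, 0x00, seq, 0xBB].pack('C*') }
offsets = find_counter_offsets(packets) # => [4]

# Same bytes read as a 16-bit ('n') or 32-bit ('N') big-endian field.
as_16 = packets.map { |pkt| pkt.byteslice(3, 2).unpack1('n') } # => [5, 6, 7]
as_32 = packets.map { |pkt| pkt.byteslice(1, 4).unpack1('N') } # => [5, 6, 7]
```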

In my research into this discipline I have come across a number of techniques for automating the task of protocol reverse engineering.  No one solution offers a 'silver bullet' that magically produces a protocol specification of an unknown protocol, but various automated techniques combined with manual processes can come rather close to this lofty goal if employed against a large data set of protocol traffic and with an appropriate amount of pre-processing of that data set.

Protocol Informatics

One tool that I've added to my protocol reversing toolbox is PI, the prototype reference implementation for the Protocol Informatics Project by Marshall Beddoe.  The tool is now a bit dated, but does work well in some situations.  The general idea of Protocol Informatics is to apply bioinformatics algorithms to network traffic.  The algorithms used perform sequence alignment on a series of packet samples to better understand their underlying structure, similar to the way two sequences of genetic information, such as DNA or amino acid sequences, are aligned to identify relationships between them.

First, the Smith-Waterman algorithm is used to sort packets from the data set into groups of comparable packets based on a similarity score.  Then, the Needleman-Wunsch algorithm is applied to the packets within each group to globally align them and attempt to identify packet format structure through static values and differences, as well as variable-length fields, indicated by where gaps had to be inserted into various packets in order to align them.  You can find the full whitepaper detailing this technique via the Protocol Informatics Project website.
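To make the alignment step concrete, here is a minimal Needleman-Wunsch global alignment of two byte sequences.  It is a textbook sketch, not PI's implementation, and the scoring (+1 match, -1 mismatch, -1 gap) is an arbitrary assumption.  Gaps come back as nil; in a protocol context, a column where one packet needed a gap hints at a variable-length field.

```ruby
# Global alignment (Needleman-Wunsch) of two byte sequences.
# Returns [score, aligned_a, aligned_b]; gaps are represented by nil.
def needleman_wunsch(a, b, match: 1, mismatch: -1, gap: -1)
  n = a.length
  m = b.length
  # score[i][j] = best score aligning a[0...i] with b[0...j]
  score = Array.new(n + 1) { Array.new(m + 1, 0) }
  (1..n).each { |i| score[i][0] = i * gap }
  (1..m).each { |j| score[0][j] = j * gap }
  (1..n).each do |i|
    (1..m).each do |j|
      diag = score[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch)
      up   = score[i - 1][j] + gap # gap inserted into b
      left = score[i][j - 1] + gap # gap inserted into a
      score[i][j] = [diag, up, left].max
    end
  end

  # Trace back from the bottom-right cell to recover one optimal alignment.
  out_a = []
  out_b = []
  i = n
  j = m
  while i > 0 || j > 0
    if i > 0 && j > 0 &&
       score[i][j] == score[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch)
      out_a.unshift(a[i - 1])
      out_b.unshift(b[j - 1])
      i -= 1
      j -= 1
    elsif i > 0 && score[i][j] == score[i - 1][j] + gap
      out_a.unshift(a[i - 1])
      out_b.unshift(nil)
      i -= 1
    else
      out_a.unshift(nil)
      out_b.unshift(b[j - 1])
      j -= 1
    end
  end
  [score[n][m], out_a, out_b]
end

# Render an aligned sequence as hex, with '--' for gaps.
def render(aligned)
  aligned.map { |byte| byte ? format('%02x', byte) : '--' }.join(' ')
end

_score, top, bottom = needleman_wunsch('ABCD'.bytes, 'ACD'.bytes)
puts render(top)    # prints 41 42 43 44
puts render(bottom) # prints 41 -- 43 44
```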

Because this technique essentially relies on identifying similarities and differences between individual packets in a group of similar packets, it works well against small binary protocols with tightly packed data structures and few wasted bits, such as ICMP.  It also works fairly well against text-based protocols, as there are many instances of static data found within them, such as header field names and delimiters.  Where this approach does not work well is against larger binary protocols that have a lot of wasted space in them, such as empty fields reserved for future use or large fields used for small values, like a 32-bit integer field used for a boolean value which will only ever be 0 or 1.  When faced with these large swaths of zeroed bytes, it is extremely difficult to tell where the real field boundaries are.

Protocol DeBugger (PDB)

One tool that I used to find useful when working within another security discipline is the Protocol DeBugger (PDB) by Jeremy Rauch.  PDB operates like the unholy offspring of a network proxy and an application debugger.  By passing network traffic through it like a transparent network proxy you are able to set breakpoints on specific packets or events, break and inspect individual packets, modify them if desired, and then continue to send them on and proxy subsequent packets as desired.

If I recall correctly, PDB also made some cursory attempts at identifying packet structure by tracking changing values, such as those that appear to be iterating.  Unfortunately, I haven't used this tool in a number of years and have so far been unsuccessful at getting it built and functioning on a current BSD system (it's integrated with and uses ipfw redirection), so I can't verify that it actually did this.  At any rate, even if it doesn't, it would still be useful for live protocol analysis, which can definitely aid in protocol reverse engineering, so I've included it here for completeness.

The primary downside to this tool is that it was originally developed to aid in protocol fuzzing, and as such works on live traffic as it traverses the tool which acts as a transparent proxy.  In this manner it was used to manipulate the packets to perform fuzzing against either side of the connection.  If you're working primarily with packet captures, it takes a bit of extra effort to replay your packet captures through it.

While it seems that this tool is currently unmaintained as I wasn't able to find a current reference URL, I have a copy that I was able to obtain shortly after Jeremy Rauch's presentation on PDB at BlackHat 2006, and you can grab the same version from the web via the ever-so-useful wayback machine.  Note, this is NOT the same as this PDB, which is an entirely different tool.

Discoverer

Finally, one research project that seems promising is Discoverer from Microsoft Research.  The paper linked here claims some improvements over the technique employed by Protocol Informatics, however the paper's author has indicated that there are no plans to publicly release any implementation code, and I haven't personally had the time to attempt an implementation from the details in the white paper.

Application Analysis

There are also a few techniques that attempt to build a protocol specification not by reverse engineering the protocol as it is seen on the wire but by both dynamic and static analysis of the software that constructs and sends that data.  Obviously this involves applying reverse engineering or debugging techniques to the software itself, and as such these techniques may run afoul of your software end-user license agreement or be outright illegal in your country.

Manual Reverse Engineering

At the end of the day you are likely to still be doing a significant amount of reverse engineering manually, however employing one or more of the automated tools and techniques prior to this undertaking can certainly clear away some of the low-hanging fruit and give you some momentum in the correct direction.  Even beginning with a loosely defined packet structure definition is likely better than beginning from scratch with a collection of raw hex dumps of various packets.

Posted by Dustin D. Trammell (2009/01/13 18:00:00 GMT+0)

Multimedia Messaging Service (MMS) MM1 Protocol

In the most recent StrikePack we included an AppSim to support the Multimedia Messaging Service (MMS) MM1 protocol.  MM1 is the 3GPP interface between the MMS User Agent, which generally resides on the Mobile Station (MS), and the MMS Center (MMSC), and is used for sending and retrieving Multimedia Messages to and from the MMSC and for managing the subscriber's Multimedia Mailbox (MMBox) on the MMSC.  This is essentially how your cellular phone sends and retrieves picture or video messages.

Having worked for nearly my entire tenure at Sipera Systems with cellular protocols (focused on dual-mode VoIP phones), I was already familiar with the overall architecture but I had not had the opportunity while there to dig into the details of the MMS system.  Overall, the MMS system encapsulates eleven different protocols, aptly named MM1 through MM11.  The AppSim included in the most recent StrikePack implements some basic support for the first one, MM1.

MM1, as mentioned above, handles communication for sending and retrieving media messages via the MMSC, which houses the user's MMBox.  This protocol is request/response-based and can be carried over either of two transports: the Wireless Session Protocol (WSP, cellular network) or HTTP (data network).  Our AppSim currently supports only transport over HTTP, as this is what is more likely to be found on a data network.  In both cases, MMS Protocol Data Units (PDUs) are transported as a content-type of 'application/vnd.wap.mms-message'.  If the message contains media, it is transported as a content-type of 'application/vnd.wap.multipart.related' and is structured per RFC-2387.

The bit about MM1 that I found rather interesting is that, because it is intended to be used over cellular networks, a fairly diligent effort was made to reduce the size of messages as much as possible, even though the protocol is seemingly based on the Internet Message Format (IMF, RFC-2822).  IMF is a generalized text (specifically, 7-bit US-ASCII) data messaging format resembling SMTP messages or HTTP responses, with single-line header fields and an optional body, or payload, portion.  The size reduction is accomplished using a portion of the Wireless Application Protocol (WAP) standard called the Wireless Session Protocol (WSP).  WSP provides a utility for encoding well-known IMF headers into much shorter binary representations, usually consisting of a single 8-bit integer value called the 'Type', which represents the header name, and a Type-specific value made up of one or more of six primitive data types ('bit', 'octet', 'uint8', 'uint16', 'uint32', and 'uintvar'), which represents the header's value.

'uintvar' is a variable-length unsigned integer with its OWN special type of encoding: you take your normal integer value of whatever size, split it into groups of 7 bits, and put those into subsequent octets, using the eighth and most significant bit of each octet as a 'continue' indicator so that the decoder knows the integer value continues with more data in the next octet.  It's a pretty slick way to do it, but very much unreadable to a human on the wire and a bit of a pain to implement.  I definitely had to dust off my bitwisdom to tackle that one.
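The uintvar scheme described above fits in a few lines of Ruby.  This is an illustrative sketch rather than the AppSim's code, with made-up method names: the most significant 7-bit group is emitted first, every octet except the last has its high "continue" bit set, and the decoder also reports how many octets it consumed so the caller can keep parsing.  (WSP caps uintvar at 32-bit values, i.e. at most five octets; the sketch doesn't enforce that limit.)

```ruby
# Encode a non-negative integer as a WSP-style uintvar: 7-bit groups,
# most significant first, continue bit (0x80) set on all but the last octet.
def encode_uintvar(value)
  raise ArgumentError, 'uintvar must be non-negative' if value.negative?

  octets = [value & 0x7F] # last octet: continue bit clear
  value >>= 7
  while value > 0
    octets.unshift((value & 0x7F) | 0x80)
    value >>= 7
  end
  octets
end

# Decode a uintvar from the start of an octet array.
# Returns [value, number_of_octets_consumed].
def decode_uintvar(octets)
  value = 0
  octets.each_with_index do |octet, i|
    value = (value << 7) | (octet & 0x7F)
    return [value, i + 1] if (octet & 0x80).zero?
  end
  raise ArgumentError, 'truncated uintvar'
end

p encode_uintvar(300)          # prints [130, 44], i.e. 0x82 0x2C
p decode_uintvar([0x82, 0x2C]) # prints [300, 2]
```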

Since there apparently isn't an existing Ruby library for encoding/decoding WSP, I essentially had to code most of this up myself from scratch using the specification and a few example pcaps.  Because each header and its value has its own, often unique, encoding, even for only the headers that we currently support it was a fairly tedious task and required quite a bit of testing and bug fixing.  Luckily, Wireshark does have a dissector for MMS (filter for mmse) and was able to lend a hand parsing the development AppSim's output as I built it.  This code is essentially what translates the header setting values that our users provide to the AppSim into the properly binary-encoded header block used in the configured action's associated message on the wire.
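To give a flavor of what that translation involves, the sketch below encodes the first three headers of an m-send-req: the message type, the transaction ID, and the MMS version.  The field codes (0x0C, 0x18, 0x0D), the m-send-req value (0x80), the version packing (major in the upper bits, minor in the lower four), and the text-string quoting rule are my recollection of the OMA MMS Encapsulation and WSP specifications and should be treated as assumptions to check against the specs or Wireshark's mmse dissector.  The constant and method names are hypothetical.

```ruby
# Assumed field-name codes for three well-known MMS headers (verify against
# the OMA MMS Encapsulation spec before relying on them).
MMS_FIELDS = {
  message_type:   0x0C, # X-Mms-Message-Type
  transaction_id: 0x18, # X-Mms-Transaction-Id
  mms_version:    0x0D  # X-Mms-MMS-Version
}.freeze

M_SEND_REQ = 0x80 # assumed X-Mms-Message-Type value for m-send-req

# Short-integer: a 7-bit value sent with its high bit set.
def short_integer(value)
  raise ArgumentError, 'short-integer must be 0..127' unless (0..127).cover?(value)
  value | 0x80
end

# Text-string: the bytes plus a NUL terminator, with a 0x7F quote prefix
# when the first byte is >= 0x80 (assumed WSP rule).
def text_string(str)
  octets = str.bytes
  octets.unshift(0x7F) if !octets.empty? && octets.first >= 0x80
  octets + [0x00]
end

# Encode the leading headers of an m-send-req as an array of octets.
def encode_send_req_headers(transaction_id, major: 1, minor: 0)
  out = []
  out << short_integer(MMS_FIELDS[:message_type]) << M_SEND_REQ
  out << short_integer(MMS_FIELDS[:transaction_id])
  out.concat(text_string(transaction_id))
  out << short_integer(MMS_FIELDS[:mms_version]) << short_integer((major << 4) | minor)
  out
end

puts encode_send_req_headers('T123').map { |o| format('%02x', o) }.join(' ')
# prints 8c 80 98 54 31 32 33 00 8d 90
```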

The BreakingPoint MMS-MM1 AppSim currently supports four actions, equating to two types of MM1 messages: Raw Request, Raw Response, Raw Body Request, and Raw Body Response.  The first two require only that the user upload two files, containing respectively the raw header data and the raw body data of the message to be sent.  These are ideal for replicating messages from a source pcap or other data capture.  The second two require the user to upload a single file containing the raw body data of the message to be sent, and have a number of settings which, when configured, generate a properly encoded MM1 header to which the supplied raw body data is appended.

Look for AppSims for both WSP and the remaining MMS protocols (MM2-MM11) in future StrikePack updates!

Posted by Dustin D. Trammell (2008/12/05 16:07:00.789 US/Central)
