

Next week is my favorite week of the year: the Sales Operations meetings at our headquarters in Austin. Each year we bring the salespeople and sales engineers together to review the previous year and preview the year ahead. More importantly, I get to show off.
By every measure, 2009 was an incredible year here at BreakingPoint. Sales had an amazing year, with huge growth. Our employee base grew by nearly 30%, much of it driven by our heavy investment in the security group. We put out three major releases and three minor releases of our firmware for the BreakingPoint Elite. Our application protocol list now tops 100, and our strikes number more than 4,300.
This news is certainly exciting, but that was last year. This is a completely new year, and we are ramping up in engineering like you could not imagine. The next firmware release will once again improve the performance of everything from our application protocols to our security engine and SSL. And, of course, all of this comes without having to replace your blades, at no extra cost. Bet your other vendors don't say that every year.
Next month I'll be putting together a screencast showing you all the features in our next release. I'll save all the juicy bits for then, but here is a teaser of what to expect:
Last year we changed the way people test their network equipment; this year we will set the standard.
This reminds me of my time at Cisco many years ago, when Kevin Kennedy (Vice President) would show a slide comparing Cisco to other similar companies. There must have been 30 companies listed; at the time, 3Com was below us, Lucent was ahead of us, and all the way at the top were companies like HP, which was then 10x the size of Cisco. Today, Cisco is tens of billions of dollars ahead of HP, with a third of the employees.
Every year that presentation showed Cisco passing yet another company. We have the same chart for our industry, and the same goals. Some companies were ahead of us at the beginning of 2009; we passed four of them during the year, and this year we will pass four more. And one day, like Cisco, we'll be at the top of everyone else's list.
NOTE: Sometimes Cisco didn't pass a company; the company fell. I'm seeing a lot of that lately. Maybe I should send some flowers.
I am an RTL designer who found a home at BreakingPoint doing hardware design. After the initial checkout of our 8-port 1Gig blades for the BreakingPoint Elite chassis, I moved over to our software team to work on some of the lower-layer code, such as our PHY drivers, and to get the 1Gig board running well enough for our network processor designers to start sending real traffic through these boards.
Having gained valuable experience in FPGA design, board design, and now even software design, I thought I would share an RTL debug experience that will hopefully give your RTL designers some food for thought.
When debugging an RTL problem, there are two big steps on the way to a solution: recreating the problem in the lab and, ultimately, recreating it in simulation. Once we can recreate the problem in simulation, the rest is almost always a matter of implementation. We can make the changes in the Verilog, rerun simulations to verify the fix, rerun regression simulations, and finally head back to synthesis.
Recently, one of our FPGA designers was tracking down an issue that we could see in the PCAP files, but we were having trouble triggering on the error with a logic analyzer. So the thought occurred to me: what if we could take the traffic we captured and play it back directly into the FPGA simulation?
In the following example, I will show how we did just that.
Let's say you are in the lab running a "Session Sender" test with "Stack Scrambler" enabled, and you find that you are dropping sessions. You can export a PCAP from the BreakingPoint Elite to see the traffic that was generated:
This test generated more than 700 MB of egress traffic. I am going to skip the first 1,000 frames and export the next 200. I want to grab the transmit data only, because at this point I am only interested in the traffic that was sent into the DUT (read more on traffic filtering):

Here is a quick look at the traffic in Wireshark:

Now we can see the traffic that was generated by the Session Sender and Stack Scrambler. This is great, but what I really wanted to do was inject this traffic directly into an FPGA simulation. There are many different ways to accomplish this:
Reading a PCAP file seemed like a task much more easily done in a software language like C, especially since libpcap already has everything needed to read PCAP files. I had never used the Verilog Programming Language Interface (PLI) before, but a little web research turned up a few examples that got me started in the right direction. If you are a SystemVerilog user, I would recommend you also look into the Direct Programming Interface (DPI).
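To show what libpcap (through pcap_open_offline() and pcap_next_ex()) or wiretap is actually decoding when it reads an export, here is a minimal, dependency-free sketch of the classic PCAP layout: a 24-byte global header followed by records, each with a 16-byte header (seconds, microseconds, saved length, original length) and the frame bytes. It assumes a little-endian, microsecond-resolution file already loaded into memory; pcap_rec, pcap_first_record() and pcap_next_record() are hypothetical names, and a real reader should also handle byte-swapped, nanosecond and pcapng files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCAP_GLOBAL_HDR_LEN 24
#define PCAP_RECORD_HDR_LEN 16

/* One decoded record (hypothetical type); data points into the caller's buffer. */
struct pcap_rec {
    uint32_t ts_sec;      /* timestamp, seconds */
    uint32_t ts_usec;     /* timestamp, microseconds */
    uint32_t incl_len;    /* bytes of the frame saved in the file */
    uint32_t orig_len;    /* frame length on the wire */
    const uint8_t *data;  /* first saved byte of the frame */
};

/* Read a little-endian 32-bit field. */
static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check the 24-byte global header. Returns the offset of the first record,
 * or 0 if this is not a little-endian, microsecond-resolution pcap file. */
static size_t pcap_first_record(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < PCAP_GLOBAL_HDR_LEN || rd_le32(buf) != 0xa1b2c3d4u)
        return 0;
    return PCAP_GLOBAL_HDR_LEN;
}

/* Decode the record at *off and advance *off past it. Returns 1 on success,
 * 0 at the end of the buffer or if the record is truncated. */
static int pcap_next_record(const uint8_t *buf, size_t len, size_t *off,
                            struct pcap_rec *rec) {
    assert(buf != NULL && off != NULL && rec != NULL);
    if (*off > len || len - *off < PCAP_RECORD_HDR_LEN)
        return 0;
    const uint8_t *h = buf + *off;
    rec->ts_sec = rd_le32(h);
    rec->ts_usec = rd_le32(h + 4);
    rec->incl_len = rd_le32(h + 8);
    rec->orig_len = rd_le32(h + 12);
    if (len - *off - PCAP_RECORD_HDR_LEN < rec->incl_len)
        return 0;
    rec->data = h + PCAP_RECORD_HDR_LEN;
    *off += PCAP_RECORD_HDR_LEN + rec->incl_len;
    return 1;
}
```

Skipping the first 1,000 frames, as in the export above, would just mean calling pcap_next_record() 1,000 times before handing frames to the simulation.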
If you have never seen or used PLI before, it essentially provides a way to call C functions from a Verilog simulation and to pass signals into and out of the C code; signal values are translated to integer or real values on the C side. You will typically have a C function, called from the Verilog, that initializes all of your data in the C environment. In that initialization function, you can register other functions to be called on signal events. In my case, I used a Verilog module to generate my XGMII clocks and resets. I call my PLI initialization function and pass it my XGMII clock, reset signal, XGMII data bus, and XGMII command bus. The C code sets up handles to each of these signals and registers a function to be called on each transition of my XGMII clock.
You may notice that I also pass in a boolean value that selects fast mode or real-time mode for playback. Fast mode skips the timestamps in the PCAP file and plays the packets back to back as quickly as possible; real-time mode uses the timestamps to recreate the actual timing of the packets.
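The choice between the two modes can be sketched as two small functions. This is a hypothetical illustration, not the author's code: it assumes one simulation tick equals 1 ps (adjust SIM_TICKS_PER_USEC for your Verilog timescale) and that each packet's PCAP timestamp is converted to an offset from the first packet in the file, which is then compared with the elapsed simulation time, much like the PLAY_IDLE check in the worker function shown later.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 simulation tick = 1 ps, so 1 us = 1,000,000 ticks. */
#define SIM_TICKS_PER_USEC 1000000LL

/* Offset of a packet from the first packet in the capture, in sim ticks. */
static int64_t pcap_offset_ticks(uint32_t sec, uint32_t usec,
                                 uint32_t first_sec, uint32_t first_usec) {
    assert(usec < 1000000 && first_usec < 1000000);
    int64_t us = ((int64_t)sec - first_sec) * 1000000 +
                 ((int64_t)usec - first_usec);
    return us * SIM_TICKS_PER_USEC;
}

/* Fast play sends immediately; real-time play waits until the elapsed
 * simulation time (sim_time - time_ref) reaches the packet's offset. */
static int ready_to_send(int64_t sim_time, int64_t time_ref,
                         int64_t pkt_offset, int fast_play) {
    return fast_play || (sim_time - time_ref) >= pkt_offset;
}
```

Converting once, when each frame is read, keeps the per-clock-edge check down to a single subtraction and comparison.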
Here is an example of my PLI initialization code:
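#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "acc_user.h"   // ACC routines such as acc_handle_tfarg and acc_vcl_add
#include "veriuser.h"   // TF routines such as tf_getlongtime
// Not shown: declarations for frame_info_t, eth_play_states, the PLI_*
// argument indices, and the pc_* / pli_* helper functions.
// wtap is the capture-file handle type from Wireshark's wiretap library.
struct pli_play_data {
    wtap *pcap;             // open capture file
    char *tfinst_p;
    char filename[512];     // PCAP filename passed in from Verilog
    frame_info_t fi;        // current frame: data words, byte count, timestamp
    int64_t time_ref;       // simulation time the PCAP timestamps are measured from
    uint32_t pktptr;        // index of the next 32-bit word in fi.framedata
    uint32_t pktcount;      // number of packets sent so far
    uint8_t fast_play;      // 1 = back to back, 0 = obey timestamps
    uint8_t ipg_bits;
    eth_play_states state;  // playback state machine
    handle eth_clk;         // XGMII clock
    handle eth_data;        // XGMII data [31:0]
    handle eth_cmd;         // XGMII command [3:0]
    handle reset_n;         // active-low reset
    uint32_t last_reset_n;  // previous reset value, for edge detection
    handle pcap_done;       // driven high by the C code when playback is finished
};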
// Forward declaration: the value-change callback is defined further down.
static PLI_INT32 pli_play_data(p_vc_record vc_record);

static PLI_INT32 pli_pcap_playback() {
    struct pli_play_data *pcap_data;
    acc_initialize();
    // Allocate memory for this instance of playback. We may be playing
    // multiple PCAPs back in the same simulation, so we need to be sure the
    // data is kept separate. calloc() also zeroes time_ref and the counters.
    pcap_data = calloc(1, sizeof(struct pli_play_data));
    if (pcap_data == NULL) {
        printf("pli_pcap_playback: out of memory\n");
        acc_close();
        return 0;
    }
    // Get handles to the Verilog signals
    pcap_data->eth_clk = acc_handle_tfarg(PLI_CLK);     // XGMII clock
    pcap_data->eth_cmd = acc_handle_tfarg(PLI_COMMAND); // XGMII command
    pcap_data->eth_data = acc_handle_tfarg(PLI_DATA);   // XGMII data
    pcap_data->reset_n = acc_handle_tfarg(PLI_RST_N);   // reset
    // Store the previous value of reset so we can detect rising or falling edges
    pcap_data->last_reset_n = pli_get_uint32(pcap_data->reset_n);
    // Status signal driven from the C code to tell Verilog that playback is finished
    pcap_data->pcap_done = acc_handle_tfarg(PLI_PCAP_DONE);
    pli_set_signal(pcap_data->pcap_done, 0); // set signal low
    // Boolean value that tells us whether to play the PCAP back at full speed
    // or to obey the timestamps in the PCAP file
    pcap_data->fast_play = pli_get_uint32(acc_handle_tfarg(PLI_FAST_PLAY));
    if (pcap_data->fast_play)
        printf("Using fast play\n");
    else
        printf("Play at normal speed\n");
    // Set the initial state of the playback state machine in the C code
    pcap_data->state = PLAY_IDLE;
    // Get the PCAP filename (a string argument, so tf_getcstringp) and open it.
    // Copying at most sizeof - 1 bytes keeps the zeroed terminator intact.
    strncpy(pcap_data->filename, tf_getcstringp(PLI_FILENAME),
            sizeof(pcap_data->filename) - 1);
    pcap_data->pcap = pc_pcap_open(pcap_data->filename);
    if (pcap_data->pcap == NULL) {   // assumes pc_pcap_open() returns NULL on failure
        printf("Unable to open %s\n", pcap_data->filename);
        free(pcap_data);
        acc_close();
        return 0;
    }
    // Load the first frame so PLAY_IDLE has a timestamp to compare against
    pc_read_next_packet(pcap_data->pcap, &pcap_data->fi);
    // Call the function "pli_play_data" on every edge of the XGMII clock signal
    acc_vcl_add(pcap_data->eth_clk, pli_play_data, (PLI_BYTE8 *)pcap_data, vcl_verilog_logic);
    acc_close();
    return 0;
}
Many of the examples I found on the web used global variables for data storage. That does not work if you plan to play back multiple PCAP files in the same simulation. Instead, I allocate a new pcap_data structure each time the initializer is called, then pass that data to my worker function in the line:
acc_vcl_add(pcap_data->eth_clk, pli_play_data, (PLI_BYTE8 *)pcap_data, vcl_verilog_logic);
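The per-instance idea can be demonstrated outside a simulator. The sketch below uses a toy callback list; toy_vcl_add() and toy_clock_edge() are hypothetical stand-ins, not PLI routines, that, like acc_vcl_add(), store an opaque user_data pointer with each callback and hand it back on every clock edge. Because each initializer call allocates its own zeroed state, two playback instances advance independently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy simulator side (hypothetical, NOT the real ACC API): a fixed table
 * of callbacks, each paired with the user_data it was registered with. */
typedef int (*clk_cb)(void *user_data);

#define MAX_CBS 8
static clk_cb cb_fn[MAX_CBS];
static void *cb_data[MAX_CBS];
static int cb_count;

static void toy_vcl_add(clk_cb fn, void *user_data) {
    assert(cb_count < MAX_CBS);
    cb_fn[cb_count] = fn;
    cb_data[cb_count] = user_data;
    cb_count++;
}

/* Fire every registered callback once, as on one clock transition. */
static void toy_clock_edge(void) {
    for (int i = 0; i < cb_count; i++)
        cb_fn[i](cb_data[i]);
}

/* Playback side: all state lives in a per-instance struct, not in globals. */
struct play_state {
    uint32_t pktptr;
};

static int play_cb(void *user_data) {
    struct play_state *s = user_data;  /* only this instance's state */
    s->pktptr++;
    return 0;
}

/* Equivalent of the initializer: one zeroed allocation per call. */
static struct play_state *toy_playback_init(void) {
    struct play_state *s = calloc(1, sizeof *s);
    if (s != NULL)
        toy_vcl_add(play_cb, s);
    return s;
}
```

With a global pktptr instead, two instances would both increment the same counter and corrupt each other's playback.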
Here is an example of the Verilog code that calls $pli_pcap_playback:
reg pcap_play_done;       // driven by the C code when playback completes
parameter FAST_PLAY = 1;  // play back capture file as fast as possible
parameter REAL_PLAY = 0;  // play back capture file using timestamps in pcap file
initial begin
    $pli_pcap_playback("export.pcap",              // input  playback PCAP filename
                       pcap_sim_TB.XGMII_1_RX_CLK, // input  XGMII clock
                       pcap_sim_TB.XGMII_1_RXC,    // output XGMII command [3:0]
                       pcap_sim_TB.XGMII_1_RXD,    // output XGMII data [31:0]
                       pcap_sim_TB.pcap_play_done, // output indicates PCAP playback completed
                       pcap_sim_TB.RST_N,          // input  active-low reset
                       REAL_PLAY                   // input  fast-play boolean
                      );
end
This gets everything set up for playback. The real heavy lifting is done inside the function "pli_play_data". Here is an abbreviated version of that function (some details removed for space):
static PLI_INT32 pli_play_data(p_vc_record vc_record) {
    int32_t sim_time_low;
    int32_t sim_time_high;
    int64_t sim_time;
    uint32_t txc = 0, txd = 0;
    uint32_t reset_n;
    struct pli_play_data *pcap_data;
    acc_initialize();
    // Recover this instance's data from the user_data given to acc_vcl_add()
    pcap_data = (struct pli_play_data *)vc_record->user_data;
    // Current reset level (reset-edge handling is not shown here)
    reset_n = pli_get_uint32(pcap_data->reset_n);
    // Build a 64-bit simulation time; the low word must be treated as unsigned
    sim_time_low = tf_getlongtime(&sim_time_high);
    sim_time = ((int64_t)sim_time_high << 32) + (uint32_t)sim_time_low;
    switch (pcap_data->state) {
    case PLAY_IDLE: {
        // Braces are needed: C does not allow a declaration right after a case label
        int64_t check_time = sim_time - pcap_data->time_ref;
        if ((check_time >= pcap_data->fi.tstamp) || pcap_data->fast_play) {
#ifdef DEBUG
            printf("%lld: sending packet %u with tstamp %lld\n", (long long)sim_time,
                   (unsigned)pcap_data->pktcount, (long long)pcap_data->fi.tstamp);
#endif
            pcap_data->state = PLAY_PREAMBLE1;
        }
        // Drive idle characters (0x07) on all four lanes between packets
        pcap_data->pktptr = 0;
        pli_set_signal(pcap_data->eth_data, 0x07070707);
        pli_set_signal(pcap_data->eth_cmd, 0xf);
        break;
    }
    case PLAY_PREAMBLE1:
        // Start character (0xFB) in lane 0, preamble bytes in lanes 1-3
        pli_set_signal(pcap_data->eth_data, 0x555555fb);
        pli_set_signal(pcap_data->eth_cmd, 0x1);
        pcap_data->state = PLAY_PREAMBLE2;
        break;
    case PLAY_PREAMBLE2:
        // Remaining preamble bytes, then the SFD (0xD5) in lane 3
        pli_set_signal(pcap_data->eth_data, 0xd5555555);
        pli_set_signal(pcap_data->eth_cmd, 0x0);
        pcap_data->state = PLAY_SEND_PKT;
        break;
    case PLAY_SEND_PKT:
        // framedata holds the frame as 32-bit words, earliest byte in lane 0
        txd = pcap_data->fi.framedata[pcap_data->pktptr];
        if (pcap_data->fi.bytecnt >= 4) {
            txc = 0x0;   // all four lanes carry data
            pcap_data->fi.bytecnt -= 4;
        } else {
            // Last word: keep the remaining data lanes, put the terminate
            // character (0xFD) in the next lane and idles (0x07) after it
            switch (pcap_data->fi.bytecnt) {
            case 3:
                txc = 0x8;
                txd &= 0x00FFFFFF;
                txd |= 0xfd000000;
                break;
            case 2:
                txc = 0xC;
                txd &= 0x0000FFFF;
                txd |= 0x07fd0000;
                break;
            case 1:
                txc = 0xE;
                txd &= 0x000000FF;
                txd |= 0x0707fd00;
                break;
            case 0:
                txc = 0xF;
                txd = 0x070707fd;
                break;
            }
            // Fetch the next frame and go back to idle
            pc_read_next_packet(pcap_data->pcap, &pcap_data->fi);
            pcap_data->pktcount++;
            pcap_data->state = PLAY_IDLE;
        }
        pli_set_signal(pcap_data->eth_data, txd);
        pli_set_signal(pcap_data->eth_cmd, txc);
        pcap_data->pktptr++;
        break;
    case PLAY_DONE:
        pli_set_signal(pcap_data->eth_data, 0x07070707);
        pli_set_signal(pcap_data->eth_cmd, 0xf);
        break;
    }
    pcap_data->last_reset_n = reset_n;
    acc_close();
    return 0;
}
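The framing done by the PLAY_PREAMBLE1, PLAY_PREAMBLE2 and PLAY_SEND_PKT states can also be checked outside the simulator. xgmii_encode() below is a hypothetical helper that turns one frame into the same sequence of 32-bit XGMII transfers: /S/ plus preamble, preamble plus SFD, full data words, and a final word carrying 0 to 3 leftover bytes, the terminate character 0xFD, and idle (0x07) fill. It assumes the frame bytes are packed with the earliest byte in lane 0, as the masking in the C code implies, and it ignores interpacket gap and timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XGMII_IDLE  0x07u
#define XGMII_START 0xfbu
#define XGMII_TERM  0xfdu

/* Hypothetical helper: encode one frame as 32-bit XGMII transfers.
 * Lane n is bits [8n+7:8n]; ctrl bit n marks lane n as a control
 * character. Returns the number of transfers written. */
static size_t xgmii_encode(const uint8_t *frame, size_t len,
                           uint32_t *data, uint8_t *ctrl, size_t max) {
    size_t n = 0, i = 0;
    assert(max >= len / 4 + 3);  /* 2 preamble words + data words + last word */
    data[n] = 0x55555500u | XGMII_START; ctrl[n++] = 0x1;  /* /S/ in lane 0 */
    data[n] = 0xd5555555u;               ctrl[n++] = 0x0;  /* SFD in lane 3 */
    while (len - i >= 4) {       /* full data words, earliest byte in lane 0 */
        data[n] = (uint32_t)frame[i] | ((uint32_t)frame[i + 1] << 8) |
                  ((uint32_t)frame[i + 2] << 16) | ((uint32_t)frame[i + 3] << 24);
        ctrl[n++] = 0x0;
        i += 4;
    }
    /* Final word: leftover data lanes, then /T/, then /I/ fill. */
    uint32_t w = 0;
    uint8_t c = 0;
    size_t rem = len - i;
    for (size_t lane = 0; lane < 4; lane++) {
        uint32_t byte;
        if (lane < rem) {
            byte = frame[i + lane];
        } else {
            byte = (lane == rem) ? XGMII_TERM : XGMII_IDLE;
            c |= (uint8_t)(1u << lane);
        }
        w |= byte << (8 * lane);
    }
    data[n] = w;
    ctrl[n++] = c;
    return n;
}
```

For a 6-byte frame, the last transfer is 0x07fd0605 with control 0xC, the same value the case 2 branch above produces.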
Here are the results of playing our PCAP file back into the Verilog simulation.
This screenshot shows the beginning of the first packet. Compare it to the Wireshark screenshot above:

This screenshot shows the end of the first packet, some interpacket gap, and the second packet beginning with the preamble. In this simulation, I chose to run in FAST PLAY mode, so the packets are sent back to back without using the timestamps in the PCAP file:

After getting the PCAP data into the simulation, it only seemed logical to capture the simulation output to a PCAP file as well. Using very similar code, this was trivial to implement. Here is a screenshot of Wireshark showing the simulation output:

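The capture side can be sketched in the same spirit. This hypothetical writer (pcap_write_global() and pcap_write_record() are illustrative names, not the author's functions) serializes frames into the classic little-endian, microsecond PCAP format with link type 1 (Ethernet), which Wireshark opens directly. It assumes the caller has already decoded each frame from the XGMII bus, converted the simulation time to seconds and microseconds, and sized the output buffer large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void wr_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void wr_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Write the 24-byte global header; returns the number of bytes written. */
static size_t pcap_write_global(uint8_t *out, uint32_t snaplen) {
    wr_le32(out, 0xa1b2c3d4u);   /* magic: microsecond timestamps */
    wr_le16(out + 4, 2);         /* version major */
    wr_le16(out + 6, 4);         /* version minor */
    wr_le32(out + 8, 0);         /* thiszone, unused */
    wr_le32(out + 12, 0);        /* sigfigs, unused */
    wr_le32(out + 16, snaplen);  /* max bytes saved per frame */
    wr_le32(out + 20, 1);        /* link type 1 = Ethernet */
    return 24;
}

/* Append one record (16-byte header plus frame bytes); returns bytes written. */
static size_t pcap_write_record(uint8_t *out, uint32_t ts_sec, uint32_t ts_usec,
                                const uint8_t *frame, uint32_t len) {
    assert(frame != NULL);
    wr_le32(out, ts_sec);
    wr_le32(out + 4, ts_usec);
    wr_le32(out + 8, len);       /* incl_len: whole frame saved */
    wr_le32(out + 12, len);      /* orig_len */
    memcpy(out + 16, frame, len);
    return 16 + (size_t)len;
}
```

Writing into a memory buffer keeps the sketch testable; in a simulation you would typically fwrite() each header and record as frames complete.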
This is a good example of how to integrate real-world data into a Verilog simulation using a C application. If you are looking to implement something similar in your test environment, I would also recommend considering wiretap instead of libpcap as your backend. Wiretap is distributed as part of the Wireshark package and can stand in for libpcap when reading capture files. It has the advantage of being able to read many more capture file formats than libpcap, including PCAP files with nanosecond timestamps.
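These formats are distinguished by the first four bytes of the file. As a hypothetical sketch of how a reader tells them apart, classify_capture() checks for the microsecond magic 0xa1b2c3d4, the nanosecond magic 0xa1b23c4d (either may appear byte-swapped, depending on the writer's byte order), and the pcapng Section Header Block type 0x0a0d0d0a. It only classifies the file; parsing pcapng blocks is a much larger job that wiretap already handles.

```c
#include <assert.h>
#include <stdint.h>

enum cap_kind { CAP_UNKNOWN, CAP_PCAP_USEC, CAP_PCAP_NSEC, CAP_PCAPNG };

struct cap_format {
    enum cap_kind kind;
    int big_endian;  /* 1 if multi-byte header fields are big-endian */
};

/* Hypothetical helper: classify a capture from its first four bytes. */
static struct cap_format classify_capture(const uint8_t hdr[4]) {
    assert(hdr != 0);
    /* Read as little-endian; a big-endian writer shows up byte-swapped. */
    uint32_t le = (uint32_t)hdr[0] | ((uint32_t)hdr[1] << 8) |
                  ((uint32_t)hdr[2] << 16) | ((uint32_t)hdr[3] << 24);
    struct cap_format f = { CAP_UNKNOWN, 0 };
    switch (le) {
    case 0xa1b2c3d4u: f.kind = CAP_PCAP_USEC; break;
    case 0xd4c3b2a1u: f.kind = CAP_PCAP_USEC; f.big_endian = 1; break;
    case 0xa1b23c4du: f.kind = CAP_PCAP_NSEC; break;
    case 0x4d3cb2a1u: f.kind = CAP_PCAP_NSEC; f.big_endian = 1; break;
    case 0x0a0d0d0au: f.kind = CAP_PCAPNG; break;  /* byte order is in the SHB */
    default: break;
    }
    return f;
}
```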
Unfortunately, most of the really good information on PLI is only available in hardcopy books. However, you can find some decent information if you search hard enough. The following links are a good starting point for designing with PLI:
Whether you want to admit it or not, there is a high probability that peer-to-peer (P2P) protocols are present on your network, no matter how hard you try (or don’t try) to lock down access. Understanding these protocols and how they work is extremely important for network managers, device manufacturers, and service providers. Today, in part three of our series examining the stateful application protocols BreakingPoint simulates during testing, I want to look at Gnutella and BitTorrent, two of the most popular P2P file-sharing protocols available today.
BitTorrent™ is probably the most recognized name in P2P file sharing today. According to the Digital Music News Research Group, BitTorrent accounts for 15% of all P2P traffic.
BitTorrent works slightly differently from other P2P applications. The software breaks large files into many small pieces that may be downloaded from multiple peers simultaneously. This enables rapid file transfers and also makes it easier for a downloading host to act as a “seed” for downloads by other peers.
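The piece arithmetic is simple enough to sketch. Per the BitTorrent specification (BEP 3), every piece has the same length, given by the torrent's "piece length" field, except possibly the last one, and each piece is checked against its own SHA-1 hash. The functions below are hypothetical helpers that compute how many pieces a file has and how large a given piece is; hashing and peer selection are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pieces needed to cover total_len bytes (rounded up). */
static uint64_t piece_count(uint64_t total_len, uint64_t piece_len) {
    assert(piece_len > 0);
    return (total_len + piece_len - 1) / piece_len;
}

/* Size of piece `index`: piece_len for every piece but the last, which
 * holds whatever remains. Returns 0 for an index past the end. */
static uint64_t piece_size(uint64_t total_len, uint64_t piece_len,
                           uint64_t index) {
    assert(piece_len > 0);
    uint64_t start = index * piece_len;
    if (start >= total_len)
        return 0;
    uint64_t left = total_len - start;
    return left < piece_len ? left : piece_len;
}
```

Because pieces are independent and individually verifiable, a client can fetch them from different peers in any order and start serving a piece to others as soon as its hash checks out.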
The growing amount of bandwidth consumed by BitTorrent has led many service providers to use deep packet inspection (DPI) technology to throttle BitTorrent traffic. However, BitTorrent also supports an encrypted transfer mode, which makes DPI classification much more difficult. Finally, BitTorrent has increased its use of UDP as a transport, which can sometimes use available bandwidth more efficiently than TCP.
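To see why encryption complicates DPI, consider the simplest signature for plaintext BitTorrent over TCP. Per BEP 3, the handshake begins with a length byte of 19 followed by the string "BitTorrent protocol", then 8 reserved bytes, a 20-byte info_hash and a 20-byte peer_id. The hypothetical check below matches that prefix; an encrypted connection never exposes it, so a classifier relying on it alone misses that traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BT_PSTR "BitTorrent protocol"
#define BT_PSTRLEN 19
/* length byte + protocol string + reserved + info_hash + peer_id */
#define BT_HANDSHAKE_LEN (1 + BT_PSTRLEN + 8 + 20 + 20)

/* Hypothetical DPI check: does this TCP payload start like a plaintext
 * BitTorrent handshake? Only the fixed 20-byte prefix is examined. */
static int looks_like_bt_handshake(const uint8_t *payload, size_t len) {
    assert(payload != NULL || len == 0);
    if (len < 1 + BT_PSTRLEN)
        return 0;
    return payload[0] == BT_PSTRLEN &&
           memcmp(payload + 1, BT_PSTR, BT_PSTRLEN) == 0;
}
```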
Gnutella, while probably a less recognized name than BitTorrent, is estimated to have a P2P market share of roughly 35%, according to the Digital Music News Research Group. It owns such a large percentage because Gnutella is the actual file-sharing protocol used by applications compatible with the Gnutella network. iMesh, LimeWire, Morpheus, and Shareaza are well-known applications that have all, at some point, supported the Gnutella network.
A Gnutella client starts by finding a set of peers, either from cached addresses or from peers discovered through those cached addresses that still work. Once connected, the peers can share search information and perform the actual file transfers. Most of the file data itself is then transferred over HTTP.
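Search traffic travels as Gnutella messages that share a fixed 23-byte header. The sketch below assumes the Gnutella 0.4/0.6 layout (16-byte message GUID, payload type, TTL, hops, and a 4-byte little-endian payload length) and shows how a servent might parse it and apply the usual relay rule of decrementing TTL and incrementing hops. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum gnut_type { GNUT_PING = 0x00, GNUT_PONG = 0x01, GNUT_PUSH = 0x40,
                 GNUT_QUERY = 0x80, GNUT_QUERY_HIT = 0x81 };

#define GNUT_HDR_LEN 23

struct gnut_hdr {
    uint8_t guid[16];      /* message ID, used to suppress duplicates */
    uint8_t type;          /* one of enum gnut_type */
    uint8_t ttl;           /* remaining hops allowed */
    uint8_t hops;          /* hops taken so far */
    uint32_t payload_len;  /* bytes of payload following the header */
};

/* Returns 1 and fills *h if buf holds at least a full header, else 0. */
static int gnut_parse(const uint8_t *buf, size_t len, struct gnut_hdr *h) {
    assert(h != NULL);
    if (buf == NULL || len < GNUT_HDR_LEN)
        return 0;
    memcpy(h->guid, buf, 16);
    h->type = buf[16];
    h->ttl = buf[17];
    h->hops = buf[18];
    h->payload_len = (uint32_t)buf[19] | ((uint32_t)buf[20] << 8) |
                     ((uint32_t)buf[21] << 16) | ((uint32_t)buf[22] << 24);
    return 1;
}

/* Before relaying: decrement TTL and increment hops. A message whose TTL
 * would drop to 0 is not forwarded (returns 0 and leaves *h unchanged). */
static int gnut_prepare_forward(struct gnut_hdr *h) {
    if (h->ttl <= 1)
        return 0;
    h->ttl--;
    h->hops++;
    return 1;
}
```

The TTL limit is what keeps a flooded query from circulating forever, and it is also why a tester needs stateful simulation: the same query arrives from several neighbors with different hop counts.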
Knowing what traffic is on your network, and being handled by your network equipment, is important. But you must also understand how those protocols actually work in order to test devices realistically. The descriptions above cover a few of the intricacies of BitTorrent and Gnutella, and it is important that network equipment can recognize and handle these unique attributes. That is why BreakingPoint simulates these protocols statefully, so that you can test under real-world conditions. I've written up more details on testing with both BitTorrent and Gnutella protocols.
The life of a tester has changed dramatically over the past few years due to advances in content-aware network equipment and deep packet inspection (DPI), more and more applications across the network, and more dangerous security threats. In that time, testing professionals have come across a variety of new tools, including those from BreakingPoint, and there now seems to be a renewed focus on, and excitement around, testing network equipment and servers.
Steve Mitchell is a performance tester at F5 Networks, a leader in Application Delivery Networking (ADN) focused on ensuring the secure, reliable, and fast delivery of applications. Steve, who is also the person behind PerfTesting.org, has been working in the computing industry for the last 15 years and has always been driven to measure and improve the performance of every product he has worked on; passion is a necessary trait for a successful tester.
Recently I spoke with Steve to discuss this evolution for performance testers and what it means for us all.
What is it about performance testing that attracts people to the role?
I have worked in the computing industry for the last 15 years, and throughout, I have always been driven by being able to measure and improve the performance of all of the products I've worked on. It wasn't until F5 Networks, about five years ago, that I made the transition to a full-time performance position, in our product development group. This was necessitated by the types of products that F5 delivers, and by the need for a dedicated group focused on industry-standard, highly repeatable performance testing. I started in the group as a manager with one employee and did a lot of the testing myself. Now we're far larger, yet I still stay in touch with all of the performance solutions on the market.
This profession continues to drive me because of its ever-changing atmosphere. I always look for career opportunities that are constantly interesting and changing, and performance testing and development definitely fit that. There's never a dull moment, given the speed at which processors and technology have been leapfrogging; there's always a new chipset, or an idea a developer has, to make things faster, better, and more stable.
I think that whole mentality is why so many people, and good people at that, are attracted to performance testing in general. It's similar to tuning a car just to get one more horsepower: there are physical, tangible pieces you can touch and influence to eke out a bit more performance than your competitor and stay ahead in the race. I also think the challenges of increasing performance, and the development and test challenges of figuring out performance problems, are highly attractive to many.
Testers constantly use methodologies, but do you have a personal methodology or "core values" for how you test equipment? Your own personal rules for testing?
Yes. The most important is a scientific, step-by-step process that captures as much as possible. I'm not plodding or stuck in a rut; I'm just fanatical about designing automation and scripts to capture as much as possible and to make things as repeatable as possible. I have also always had a knack for troubleshooting and finding problems, which lends itself well to investigating performance issues: thinking outside the normal box of issues that contribute to problems.
When you look back at the beginning of your career and today what are the major differences in what you do day to day? What are the major differences in how you get your job done?
The biggest difference is the maturity of the tools and automation available. Originally, when I started working with performance equipment, it was all homegrown, with no real vendors to speak of. You were on your own for everything, which usually meant you built just enough to achieve what you were trying to test, and that was pretty minimal! It's progressed through horribly gross and unstable software from just about every vendor, with no thought to automation or features beyond the basic requirements, to where we are today: lots of vendors to choose from, lots of options for features and functionality, and growing automation and control architectures. It's definitely come a long way in a pretty short amount of time compared to many other markets.
Today's economic environment obviously poses challenges for everyone, including performance and security testers in R&D/QA labs. Are there ways for perf testers to reduce testing costs without cutting corners?
I think there are a couple of ways that costs can be reduced without cutting corners. The first and most important is deeper automation: not just of tests, but of equipment and labs, and of reporting and delivering results not only to testers but also to management and others. There are a lot of companies with pseudo-solutions here, but nothing that really fits the bill for most of us. The good news, however, is that most of this can be done with open source tools and standard operating system tools. We've done some of this at F5 and have seen a huge amount of work removed from individual testers, so they can spend more time testing, which means products get better coverage and reach the market faster.
The other thing that I think could help would be to adapt the automation above to use the very expensive performance test equipment more effectively, thereby getting more overall use out of it. Examples include splitting chassis into smaller parts so that more groups of tests can run, and dividing up ports in different amounts to optimize tests so that each one takes less time.
All of these ideas have to do with optimizing the amount of time things are taking through automation or allocation.
Automation was an important topic on your forum site and now your blog. How has automation changed in performance testing over the years and where would you like to see it go in the near future?
As I alluded to above, automation was non-existent in the early years. It's come a long way, but there's still a lot left to do. Nowadays most vendors offer some sort of scripting language or interface to their products, typically via TCL, that allows you to do just about everything that their normal product GUIs can do. Some even allow you to do things more intelligently via these interfaces so you aren't taking as much time to configure things if you're only varying one parameter - this is extremely important in order to get the most out of your equipment.
Where does it need to go now? An open and transparent standard for all of the different pieces is key to the next step in the evolution of performance test automation. There are several efforts out there, including one (TesLA) to which I'm a key contributor, but the bottom line is that there needs to be a standard for all of the devices in the performance-testing sandbox to play with. The lab automation solution needs to be able to configure a switch, an L1 patch panel, and a device under test, and then pass the relevant information on to the traffic generator, fuzzer, and other pieces in the test. Once a test is running, everyone involved needs to output common statistics and information in a centralized way so that the reporting aspect, which is always different for everyone out there, can be optimized.
I would predict that in the next 2 years this area will become more important to all vendors and consumers in the performance test industry than any new feature, functionality, or faster card or appliance. Being able to utilize all of these pieces as effectively as possible is not only financially responsible given the economy, but a requirement for many, if not all, organizations that have complex applications and solutions to test.