Debugging RTL problems using the BreakingPoint Elite
by Jon Stroud
I am an RTL designer who found my home at BreakingPoint doing hardware design. After the initial checkout of our 8-port, 1Gig blades for the BreakingPoint Elite chassis, I moved over to our software team to work on some of the lower-layer code, such as our PHY drivers, and to get the 1Gig board up and running well enough for our network processor designers to start running real traffic on these boards.
Having gained valuable experience in FPGA design, board design, and now software design, I thought I would share an RTL debug experience that may give your RTL designers some food for thought.
When debugging an RTL problem, there are two big steps in getting to a solution: recreating the problem in the lab, and ultimately recreating it in simulation. Once we can recreate the problem in simulation, the rest is almost always a matter of implementation. We can make the changes in the Verilog, rerun simulations to verify the fix, rerun regression simulations, and finally head back to synthesis.
Recently, one of our FPGA designers was tracking down an issue which we were able to catch in the PCAP files, but we were having trouble triggering on the error with a Logic Analyzer. So, the thought occurred to me: what if we were able to take the traffic we captured and play it back directly into the FPGA simulation?
In the following example I will show how we were able to do just that.
Let's say you are in the lab running a "Session Sender" test with "Stack Scrambler" enabled, and you find that you are dropping sessions. You can export a PCAP from BreakingPoint Elite to see the traffic generated:
This test generated more than 700 MBytes of egress traffic. I am going to skip over the first 1000 frames and export the next 200 frames. I want to grab the transmit data only, because at this point I am only interested in seeing the traffic that was sent into the DUT (read more on traffic filtering):
Here is a quick look at the traffic in Wireshark:
Now we can see the traffic that was generated by the Session Sender and Stack Scrambler. This is great, but what I really wanted to do was inject this traffic directly into an FPGA simulation. There are many different ways to accomplish this:
- I could write Verilog models that were capable of reading a PCAP file directly.
- I could write some C application that could read the PCAP file and translate it into a vector file more easily read by Verilog.
- I could use the Verilog Programming Language Interface (PLI) to execute some C code for reading the PCAP.
The task of reading a PCAP file seemed like something that would be much more easily done in a software language like C, especially considering I would have access to libpcap which already has everything I need to read PCAP files. Although I had never used PLI before, I did a little web research and found a few examples to get me started in the right direction. If you are a SystemVerilog user, I would recommend you also look into the Direct Programming Interface (DPI).
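To give a feel for what the C side has to do, here is a minimal sketch that walks a classic PCAP file held in memory and hands back one frame at a time. It parses the classic file layout (24-byte global header, 16-byte record headers) by hand only so that it stays self-contained; in practice libpcap would do this for you. The names `pcap_rec`, `pcap_first_record`, and `pcap_next_record` are hypothetical, and the sketch assumes the file was written in the host's byte order with microsecond timestamps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC_USEC     0xa1b2c3d4u /* classic format, microsecond timestamps */
#define PCAP_GLOBAL_HDR_LEN 24u
#define PCAP_REC_HDR_LEN    16u

/* One captured frame, pointing into the caller's buffer (hypothetical type). */
struct pcap_rec {
    uint32_t ts_sec;
    uint32_t ts_usec;
    uint32_t incl_len;   /* number of frame bytes stored in the file */
    const uint8_t *data;
};

/* Read a host-order 32-bit field without assuming alignment. */
static uint32_t rd32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns the offset of the first record, or 0 if the buffer is not a
 * same-byte-order classic PCAP file. */
static size_t pcap_first_record(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < PCAP_GLOBAL_HDR_LEN || rd32(buf) != PCAP_MAGIC_USEC)
        return 0;
    return PCAP_GLOBAL_HDR_LEN;
}

/* Parse the record at *off. On success fill rec, advance *off, and return 1.
 * Returns 0 at end of data or if the record is truncated. */
static int pcap_next_record(const uint8_t *buf, size_t len, size_t *off,
                            struct pcap_rec *rec)
{
    if (*off > len || len - *off < PCAP_REC_HDR_LEN)
        return 0;
    const uint8_t *h = buf + *off;
    uint32_t incl = rd32(h + 8);
    if (incl > len - *off - PCAP_REC_HDR_LEN)
        return 0;
    rec->ts_sec = rd32(h);
    rec->ts_usec = rd32(h + 4);
    rec->incl_len = incl;
    rec->data = h + PCAP_REC_HDR_LEN;
    *off += PCAP_REC_HDR_LEN + incl;
    return 1;
}
```

A caller would load the file into a buffer, call `pcap_first_record` once, and then loop on `pcap_next_record` until it returns 0, either writing each frame to a vector file or feeding it to the simulator.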
If you have never seen or used PLI before, it essentially provides a method of calling C functions from a Verilog simulation, and allows passing signals into and out of the C code. The signal will get translated to an integer or real value in the C code. You will typically have a C function that you call from the Verilog, which will initialize all of your data in the C environment. In that initialization function, you can configure other functions to be called on signal events. In my case, I used a Verilog module to generate my XGMII clocks, and resets. I call my PLI initialization function and pass it my XGMII clock, reset signal, XGMII data bus, and XGMII command bus. The C code sets up pointers to each of these signals and configures a function to call on each transition of my XGMII clock.
You may note that I also pass in a boolean value that selects between fast mode and real-time mode. Fast mode skips all of the timestamps in the PCAP file and plays the packets back to back as quickly as possible; real-time mode uses the timestamps to help recreate the actual timing of the packets.
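The decision between the two modes boils down to one comparison per clock edge. The sketch below isolates it; `packet_due` and `pcap_ts_to_ns` are hypothetical helpers, and the sketch assumes the simulation time, the reference time, and the packet timestamp have all been converted to the same unit (nanoseconds here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a classic PCAP timestamp (seconds + microseconds) to nanoseconds. */
static int64_t pcap_ts_to_ns(uint32_t ts_sec, uint32_t ts_usec)
{
    return (int64_t)ts_sec * 1000000000LL + (int64_t)ts_usec * 1000LL;
}

/* Decide whether the next packet may start on this clock edge.
 * time_ref is the simulation time at which playback started. */
static bool packet_due(int64_t sim_time, int64_t time_ref, int64_t tstamp,
                       bool fast_play)
{
    assert(sim_time >= time_ref);
    if (fast_play)
        return true;                        /* ignore timestamps entirely */
    return (sim_time - time_ref) >= tstamp; /* wait for the capture time */
}
```

In fast mode the function always answers yes, so the only spacing between packets is whatever idle time the playback state machine inserts itself.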
Here is an example of my PLI initialization code:
// Standard C and PLI headers. The wiretap header and the declarations of
// the helper functions (pli_get_uint32, pli_set_signal, pc_pcap_open) are
// omitted here.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "veriuser.h"
#include "acc_user.h"

struct pli_play_data {
    wtap *pcap;
    char *tfinst_p;
    char filename[512];
    frame_info_t fi;
    int64_t time_ref;
    uint32_t pktptr;
    uint32_t pktcount;
    uint8_t fast_play;
    uint8_t ipg_bits;
    eth_play_states state;
    handle eth_clk;
    handle eth_data;
    handle eth_cmd;
    handle reset_n;
    uint32_t last_reset_n;
    handle pcap_done;
};

static PLI_INT32 pli_pcap_playback() {
    struct pli_play_data *pcap_data;

    acc_initialize();

    // Allocate memory for this instance of playback.
    // We may be playing multiple PCAPs back in the same
    // simulation, so we need to be sure the data is kept
    // separated.
    pcap_data = malloc(sizeof(struct pli_play_data));
    if (pcap_data == NULL) {
        printf("pli_pcap_playback: out of memory\n");
        acc_close();
        return 0;
    }

    // Get handles to the Verilog signals
    pcap_data->eth_clk  = acc_handle_tfarg(PLI_CLK);     // XGMII clock
    pcap_data->eth_cmd  = acc_handle_tfarg(PLI_COMMAND); // XGMII command
    pcap_data->eth_data = acc_handle_tfarg(PLI_DATA);    // XGMII data
    pcap_data->reset_n  = acc_handle_tfarg(PLI_RST_N);   // Reset

    // Store the previous value of reset so we can detect rising or falling edges
    pcap_data->last_reset_n = pli_get_uint32(pcap_data->reset_n);

    // This is a status signal driven from the C code to tell Verilog that
    // playback is finished.
    pcap_data->pcap_done = acc_handle_tfarg(PLI_PCAP_DONE);
    pli_set_signal(pcap_data->pcap_done, 0); // set signal low

    // Boolean value to tell us whether to play the PCAP back at full speed or
    // to obey the timestamps in the PCAP file.
    pcap_data->fast_play = pli_get_uint32(acc_handle_tfarg(PLI_FAST_PLAY));
    if (pcap_data->fast_play)
        printf("Using fast play\n");
    else
        printf("Play at normal speed\n");

    // Set the initial state of the playback state machine. This is a
    // state machine in the PCAP playback C code.
    pcap_data->state = PLAY_IDLE;

    // Get the PCAP filename and open it. strncpy does not terminate the
    // string if the source fills the buffer, so terminate it explicitly.
    strncpy(pcap_data->filename, tf_strgetp(PLI_FILENAME, 'b'),
            sizeof(pcap_data->filename) - 1);
    pcap_data->filename[sizeof(pcap_data->filename) - 1] = '\0';
    pcap_data->pcap = pc_pcap_open(pcap_data->filename);

    // Call the function "pli_play_data" on every edge of the XGMII clock signal
    acc_vcl_add(pcap_data->eth_clk, pli_play_data, (PLI_BYTE8 *)pcap_data, vcl_verilog_logic);

    acc_close();
    return 0;
}
Many of the examples I found on the web used global variables for data storage. This does not work if you plan to play back multiple PCAP files in the same simulation. So instead, I allocate a new pcap_data structure each time the initializer is called. Then I pass that structure to my worker function in the line:
acc_vcl_add(pcap_data->eth_clk, pli_play_data, (PLI_BYTE8 *)pcap_data, vcl_verilog_logic);
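The same per-instance pattern can be tried outside a simulator. The sketch below is plain C rather than PLI: a small hook table stands in for the simulator's value-change callbacks, and all names in it (`edge_hook`, `add_edge_hook`, `fire_clock_edge`, `playback_ctx`, `playback_init`) are hypothetical. It only illustrates why a heap-allocated context passed as user data keeps two playbacks from stepping on each other.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the simulator's callback table (not part of PLI). */
typedef void (*edge_cb)(void *user_data);
struct edge_hook { edge_cb cb; void *user_data; };

#define MAX_HOOKS 8
static struct edge_hook hooks[MAX_HOOKS];
static int hook_count;

static int add_edge_hook(edge_cb cb, void *user_data)
{
    if (hook_count >= MAX_HOOKS)
        return -1;
    hooks[hook_count].cb = cb;
    hooks[hook_count].user_data = user_data;
    hook_count++;
    return 0;
}

/* Simulate one clock edge: every registered instance gets its own context. */
static void fire_clock_edge(void)
{
    for (int i = 0; i < hook_count; i++)
        hooks[i].cb(hooks[i].user_data);
}

/* Per-instance playback state; nothing here is global. */
struct playback_ctx {
    uint32_t pktcount;
    uint32_t step;     /* how much this instance counts per edge */
};

static void on_edge(void *user_data)
{
    struct playback_ctx *ctx = user_data;
    assert(ctx != NULL);
    ctx->pktcount += ctx->step;
}

/* Allocate a fresh context and register it, one per playback instance. */
static struct playback_ctx *playback_init(uint32_t step)
{
    struct playback_ctx *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->step = step;
    if (add_edge_hook(on_edge, ctx) != 0) {
        free(ctx);
        return NULL;
    }
    return ctx;
}
```

Calling `playback_init` twice and then firing a few edges leaves each context with its own count, which is exactly what a single set of global variables cannot give you.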
Here is an example of the Verilog code to call this function:
reg pcap_cap_done;
parameter FAST_PLAY = 1; // play back capture file as fast as possible
parameter REAL_PLAY = 0; // play back capture file using timestamps in pcap file

$pli_pcap_playback ("export.pcap",                // input  playback pcap filename
                    pcap_sim_TB.XGMII_1_RX_CLK,   // input  Ethernet Transmit clock
                    pcap_sim_TB.XGMII_1_RXC,      // output Ethernet Command [3:0]
                    pcap_sim_TB.XGMII_1_RXD,      // output Ethernet Data [31:0]
                    pcap_sim_TB.pcap_play_done,   // output indicates pcap playback completed
                    pcap_sim_TB.RST_N,            // input  active low reset
                    REAL_PLAY                     // input  fast play boolean
                   );
This gets us all set up to do the playback. The real heavy lifting is done inside the function "pli_play_data". Here is an abbreviated version of that function (some details removed for space):
static PLI_INT32 pli_play_data(p_vc_record vc_record) {
    int32_t sim_time_low;
    int32_t sim_time_high;
    int64_t sim_time;
    uint32_t txc, txd;
    struct pli_play_data *pcap_data;

    acc_initialize();
    pcap_data = (struct pli_play_data *)vc_record->user_data;

    // tf_getlongtime returns the low 32 bits of the simulation time and
    // stores the high 32 bits through its argument. Combine the two halves,
    // treating the low half as unsigned so it is not sign-extended.
    sim_time_low = tf_getlongtime(&sim_time_high);
    sim_time = ((int64_t)sim_time_high << 32) | (uint32_t)sim_time_low;

    switch (pcap_data->state) {
    case PLAY_IDLE: {
        // Braces give the declaration below its own scope; C does not
        // allow a declaration directly after a case label.
        int64_t check_time = sim_time - pcap_data->time_ref;
        if ((check_time >= pcap_data->fi.tstamp) || pcap_data->fast_play) {
#ifdef DEBUG
            printf("%lld: sending packet %u with tstamp %lld\n", (long long)sim_time,
                   pcap_data->pktcount, (long long)pcap_data->fi.tstamp);
#endif
            pcap_data->state = PLAY_PREAMBLE1;
        }
        pcap_data->pktptr = 0;
        // Drive XGMII idle characters on all four lanes
        pli_set_signal(pcap_data->eth_data, 0x07070707);
        pli_set_signal(pcap_data->eth_cmd, 0xf);
        break;
    }
    case PLAY_PREAMBLE1:
        // Start character (0xfb) in lane 0, followed by preamble bytes
        pli_set_signal(pcap_data->eth_data, 0x555555fb);
        pli_set_signal(pcap_data->eth_cmd, 0x1);
        pcap_data->state = PLAY_PREAMBLE2;
        break;
    case PLAY_PREAMBLE2:
        // Remaining preamble bytes and the start-of-frame delimiter (0xd5)
        pli_set_signal(pcap_data->eth_data, 0xd5555555);
        pli_set_signal(pcap_data->eth_cmd, 0x0);
        pcap_data->state = PLAY_SEND_PKT;
        break;
    case PLAY_SEND_PKT:
        txd = pcap_data->fi.framedata[pcap_data->pktptr];
        if (pcap_data->fi.bytecnt >= 4) {
            // Full word of frame data
            txc = 0x0;
            pcap_data->fi.bytecnt -= 4;
        } else {
            // Last word: pad with the terminate character (0xfd) and idles (0x07)
            switch (pcap_data->fi.bytecnt) {
            case 3:
                txc = 0x8;
                txd &= 0x00FFFFFF;
                txd |= 0xfd000000;
                break;
            case 2:
                txc = 0xC;
                txd &= 0x0000FFFF;
                txd |= 0x07fd0000;
                break;
            case 1:
                txc = 0xE;
                txd &= 0x000000FF;
                txd |= 0x0707fd00;
                break;
            case 0:
                txc = 0xF;
                txd = 0x070707fd;
                break;
            }
            pc_read_next_packet(pcap_data->pcap, &pcap_data->fi);
            pcap_data->pktcount++;
            pcap_data->state = PLAY_IDLE;
        }
        pli_set_signal(pcap_data->eth_data, txd);
        pli_set_signal(pcap_data->eth_cmd, txc);
        pcap_data->pktptr++;
        break;
    case PLAY_DONE:
        pli_set_signal(pcap_data->eth_data, 0x07070707);
        pli_set_signal(pcap_data->eth_cmd, 0xf);
        break;
    }

    // Remember the current reset value for edge detection on the next call
    pcap_data->last_reset_n = pli_get_uint32(pcap_data->reset_n);
    acc_close();
    return 0;
}
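The XGMII framing in that state machine can also be checked on its own, without a simulator. The following sketch encodes a byte array into the same sequence of 32-bit data words and 4-bit control values: two preamble words, the full data words, and a final word padded with the terminate and idle characters. The type `xgmii_word` and the function `xgmii_encode` are hypothetical names, and the sketch assumes lane 0 is the least significant byte, as the constants in the listing above imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xgmii_word {
    uint32_t txd;  /* four data lanes, lane 0 in bits 7:0 */
    uint8_t  txc;  /* one control bit per lane */
};

#define XGMII_IDLE 0x07u
#define XGMII_TERM 0xfdu

/* Encode one frame into out[]. Returns the number of words written,
 * or 0 if out[] is too small. */
static size_t xgmii_encode(const uint8_t *frame, size_t len,
                           struct xgmii_word *out, size_t max_words)
{
    assert(frame != NULL && out != NULL);
    /* 2 preamble words + full data words + 1 terminate word */
    size_t need = 2 + len / 4 + 1;
    if (max_words < need)
        return 0;

    size_t n = 0;
    out[n++] = (struct xgmii_word){ 0x555555fbu, 0x1 }; /* start + preamble */
    out[n++] = (struct xgmii_word){ 0xd5555555u, 0x0 }; /* preamble + SFD */

    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t txd = (uint32_t)frame[i]
                     | (uint32_t)frame[i + 1] << 8
                     | (uint32_t)frame[i + 2] << 16
                     | (uint32_t)frame[i + 3] << 24;
        out[n++] = (struct xgmii_word){ txd, 0x0 };
    }

    /* Final word: 0 to 3 leftover data bytes, then terminate, then idle. */
    size_t rem = len - i;
    uint32_t txd = 0;
    uint8_t txc = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint8_t byte;
        if (lane < rem) {
            byte = frame[i + lane];
        } else {
            byte = (lane == rem) ? XGMII_TERM : XGMII_IDLE;
            txc |= (uint8_t)(1u << lane);
        }
        txd |= (uint32_t)byte << (8 * lane);
    }
    out[n++] = (struct xgmii_word){ txd, txc };
    return n;
}
```

For a 5-byte frame this yields four words, the last one carrying a single data byte in lane 0 with a control value of 0xE, matching the `case 1` branch of the listing.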
Here are the results of playing our PCAP file back into the Verilog simulation.
This screenshot shows the beginning of the first packet. Compare it to the Wireshark screenshot above:
This screenshot shows the end of the first packet, some interpacket gap, and the second packet beginning with the preamble. In this simulation, I have chosen to run in FAST PLAY mode, so the packets are sent back to back without using the timestamps in the PCAP file:
After getting the PCAP data into the simulation, it only seemed logical to capture the simulation output to a PCAP file as well. Using very similar code, this was quite trivial to implement. Here is a screenshot of Wireshark showing the simulation output:
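For readers who want to try the capture direction, the file side is small. The sketch below writes a classic PCAP global header and one record into a memory buffer; it is not the code used for the screenshot, the names `pcap_put_global_header` and `pcap_put_record` are hypothetical, and it assumes host byte order, microsecond timestamps, and an Ethernet link type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store host-order fields without assuming alignment. */
static size_t put32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); return 4; }
static size_t put16(uint8_t *p, uint16_t v) { memcpy(p, &v, 2); return 2; }

/* Write the 24-byte classic PCAP global header; returns bytes written. */
static size_t pcap_put_global_header(uint8_t *out, uint32_t snaplen)
{
    assert(out != NULL);
    size_t n = 0;
    n += put32(out + n, 0xa1b2c3d4u); /* magic: microsecond timestamps */
    n += put16(out + n, 2);           /* version major */
    n += put16(out + n, 4);           /* version minor */
    n += put32(out + n, 0);           /* thiszone */
    n += put32(out + n, 0);           /* sigfigs */
    n += put32(out + n, snaplen);     /* maximum stored frame length */
    n += put32(out + n, 1);           /* link type 1 = Ethernet */
    return n;
}

/* Write one 16-byte record header followed by the frame bytes.
 * The caller must provide at least 16 + len bytes at out. */
static size_t pcap_put_record(uint8_t *out, uint32_t ts_sec, uint32_t ts_usec,
                              const uint8_t *frame, uint32_t len)
{
    assert(out != NULL && frame != NULL);
    size_t n = 0;
    n += put32(out + n, ts_sec);
    n += put32(out + n, ts_usec);
    n += put32(out + n, len);         /* bytes stored in the file */
    n += put32(out + n, len);         /* original length on the wire */
    memcpy(out + n, frame, len);
    return n + len;
}
```

In a simulation, the frame bytes would come from reassembling the XGMII data words between the start and terminate characters, and the timestamp from the simulation time at the start of each frame.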
This is a good example of how to integrate real-world data, using a C application, into a Verilog simulation. If you are looking to implement something similar in your test environment, I would also recommend that you consider using wiretap instead of libpcap for your backend. Wiretap is distributed as part of the Wireshark package and can stand in for libpcap when reading capture files. Wiretap has the advantage of being able to read many more capture file formats than libpcap, including PCAP files with nanosecond timestamps.
Unfortunately, most of the really good information on PLI is only available in hardcopy books. However, you can find some decent information if you search hard enough. The following links are a good starting point for designing with PLI:
- Asic-World PLI Information: There are some PLI examples here, but this site is most useful for its instructions on compiling PLI code into different simulators.
- Aldec VHPI Tutorial: This is a tutorial for using the VHDL PLI interface called VHPI.
