Spam Fight! Emulating Spambots with Markov Text Generation To Validate Email Filtering Accuracy
By Andrew Jean
The BreakingPoint Storm CTM is now a spambot. Yes, you read that correctly. The BreakingPoint Storm CTM is now capable of producing mass quantities of that wonderful spam email gobbledygook that we're all so fond of.
Why, you ask? It's all about realistic simulation.
In 2010, you'd be hard-pressed to find a computer user who hasn't been annoyed by spammers. And although numerous groups are actively working on email filtering solutions and Web traffic analyzers, eliminating spam remains a very difficult problem, one that only gets harder as spammers develop more sophisticated tools. Some spammers have even been so clever as to fill their emails with the rearranged contents of popular novels and publications, making spam harder to detect, even for a human reader.
For a long time now, BreakingPoint has given our customers numerous tools for simulating webmail protocols, POP3, and SMTP, but never have we been better at actually producing realistic spam. In my efforts to improve the realism of our various email protocols, I've developed a fairly simple Markov text generator, which generates text more intelligently than pure randomness would. In essence, it learns how to arrange words based on the text that it reads; if one were to feed it Charles Dickens, the output wouldn't read just like Oliver Twist, but it would sound a great deal like Charles Dickens with severe head trauma. It goes far beyond stringing random characters together into nonsense that was obviously not produced by a human.
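For readers new to the idea, a generic word-level Markov chain can be sketched in Python along the following lines. This is a textbook illustration only, not the generator built into the BreakingPoint Storm CTM, whose implementation isn't described here. The function names `build_chain` and `generate`, the `order` parameter, and the restart-on-dead-end behavior are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=100, seed=None):
    """Walk the chain, drawing each next word from those observed after the current state."""
    if not chain:
        raise ValueError("training text is too short for the chosen order")
    rng = random.Random(seed)
    states = sorted(chain)  # sorted so that a fixed seed gives repeatable output
    state = rng.choice(states)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state only occurred at the very end of the text.
            # Restart from a random known state (an assumption of this sketch).
            state = rng.choice(states)
            output.extend(state)
            continue
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output[:length])
```

Training is just `chain = build_chain(some_long_text)` followed by `generate(chain, length=100)`. Because followers are stored with repetition, words that often follow a given state in the training text are proportionally more likely to be chosen, which is what makes the output echo the source's style. A larger `order` gives more coherent text but copies longer stretches of the source verbatim.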
For comparison, here's a 100-word sample of spam text that we generated the old way:
sopping Marginally interval Beverage tricky truth folk mountainous corporate quarterfinal banana mortuary Eventually chronically; teargas Wearily Sgt. time zone diminish! Paraffin Saw beginner southwestern serial killer machinery! diehard nervous empowerment eviction we Proven. Trajectory ordain driven small talk manage pamphlet; Inventory unbeaten navigator Sternly chatterbox insubstantial symbolize misbehave Oriental sociable stained glass phony. opener tardiness spy decide! ally uninsured visual fought matted sluice fluke quadruped Inuit across trace surprising chipper cramped? childlike bison blusher enlighten consecrate putt vegetable all Godchild circumstantial, justly, pointed brunette, fortieth humbly primary school hilariously careless overwrought? ascribe reprisal involvement epithet transpire scrabble sponsorship aloud. aquatic
And here's 102 words from the new Markov generator:
And one had to admit that I didn't relish the idea of bringing pals back in the small hours now, but I must say that to me there seems to be something positively fiendish in a man who acted from the best motives, but your ladyship, knowing him better, he had a kind of paternal muscular spasm about the mouth, which is capable of being developed. Life became like what the poet Johnnie says -- one grand, sweet song. She made me feel as if I were a memorizing freak at the halls. I hadn't been expecting her for days. Payable in advance?
(You'll pardon me if I don't give too many details on the implementation of the text generator. Wouldn't want to give any bright ideas to the spammers.)
Why is this significant? It is an important step towards helping manufacturers of email and web traffic filtering solutions produce better products. By validating their products with the BreakingPoint Storm CTM during development, manufacturers can better prepare their solutions for the onslaught of pseudo-realistic spam that constantly floods inboxes worldwide. The closer the BreakingPoint Storm CTM can come to walking, talking, and acting like a spambot, the better it will mimic real-world spam, giving equipment and software manufacturers a more accurate baseline from which to work.
We'll keep working on more and better ways to make our product even more realistic so that our customers can deliver products that are even more useful. Meanwhile, here's to a spam-free inbox. Cheers, from BreakingPoint.

