BreakingPoint Labs

Testing a Moving Target: Four Considerations for Ensuring the Performance and Security of Cloud Infrastructures

UPDATE: You may also be interested in our resiliency testing paper and our cloud computing testing section.

We took on the topic of cloud computing in my last post on testing in the cloud, where we looked at the challenges vendors face when conducting performance, security, and load testing for cloud-based environments. It’s no surprise that the difficulties scale right along with the environment. This became clear while talking with Julien Sobrier, QA engineer for Zscaler, a provider of multi-tenant SaaS security services. According to Sobrier, “It is extremely difficult to replicate the behavior of a cloud in a lab: changing latency, packet loss, broken connections, with overlapping packets.” The list goes on and on.

The challenges of testing cloud-based environments go well beyond just the size and complexity of the environment.  The dynamic nature of cloud infrastructures means QA must effectively test for an ever-changing unknown: 

  • Unlimited web services and applications
  • Elastic demand
  • Morphing usage patterns
  • Dynamic resource allocation

…and again the list goes on. The constantly evolving characteristics of this adaptive environment, and of the users who access it, create an unlimited number of testing variables. It’s a bit like building a moving skyscraper on shifting sands. It can be done; you just need the right tools or a whole lot of time and money.

When it comes to innovation, last-generation performance, security and load testing products have lagged behind the hardware and software they were designed to test, stifling the delivery of stable next-generation products and services. In my conversations with a number of cloud vendors, the same pattern appears to hold true. Sobrier explains, “Right now, we are using the same tools that appliance vendors are using:  Protos for fuzzing, regular HTTP performance tools (Autobench), etc., and custom tools to create a bigger variety of traffic.”

In an attempt to emulate realistic conditions, cloud vendors like Zscaler and larger cloud vendors like Microsoft, Amazon and others must deploy legacy tools, some originally designed for traditional LAN-based environments, onto hundreds of servers to simulate load. The net result: an amalgamation of tools and workarounds that is costly, brittle and not ideally suited for the task at hand.

Testing Cloud Infrastructure: Four Important Factors to Consider
While the tools used to test cloud infrastructures are not unique, the scale of the testing problem is, because of the dynamic and shared characteristics of cloud infrastructures. In this real-time adaptive environment, four factors are paramount: Elasticity, Realism, Scalability and Security.

Elasticity
Renata Budko, VP of Products and Marketing of HyTrust, sums up the dynamic nature of the cloud: “Cloud infrastructure is very different from the traditional set-up where applications make exclusive use of the server resources. In the cloud, resources are pooled, access infrastructure is shared and resource allocation changes dynamically. The underlying infrastructure is flexible and changes often and every major revision of hardware or software can result in significant performance impact and, of course, needs to be tested. However, in the case of cloud, the process is so dynamic that discrete testing following major upgrades is simply not sufficient.  You need to understand how services behave when deployed together.”

Testing in such a dynamic environment must closely reflect the elastic nature of usage patterns: conditions change frequently, demand is elastic, resources are shared, and more frequent releases require continuous testing. There is little time for cumbersome test configuration and scripting. Extensive automation is a must-have to replicate a wide range of usage patterns. What’s more, dynamic resource allocation means applications cannot be tested in isolation from one another. In addition to a high-performance, highly scalable testing platform, vendors need more agile, automated, and easier-to-use testing products designed for a fast-paced, dynamic environment.

Realism
In today’s frenetic Web services/dynamic application/mashup world, it is impossible to emulate all of the different types of traffic that traverse the cloud, but vendors still need to emulate a broad mix of traffic. That means more realistic testing tools that support an ever-changing mix of applications and services, an incredibly high volume of sessions, and high memory usage, all combined with sophisticated security attacks. Otherwise, you are left to run small tests with a limited mix of applications and then extrapolate the results. Ultimately, you are making assumptions about how things might work with very few real data points. This is not sufficient to ensure the SLAs cloud vendors must deliver to business application users.

Gomez CTO Imad Mouline echoes the call for more realistic testing, underscoring the need to create more realistic transactions with real-user monitoring and reporting. According to Mouline, “It is important to simulate load to the infrastructure that is coming in from different IP locations, different networks and from different places in the world.”

Scalability
Mouline also talked at length about how much we assume when we sign up for cloud computing services. One of the largest and most dangerous assumptions is stability in the face of peak demand. Prior to deployment, vendors must offer assurances that services will perform reliably under a variety of load conditions.

Simulating that load is easier said than done, however. Again, Mr. Sobrier: “Most tools I've worked with simulate one client and one server. We need to simulate thousands of clients and servers, with different IP addresses to be closer to reality.” Imagine the number of servers and the millions in LoadRunner fees that it would take to run the tests needed to emulate the typical load these infrastructures see on a daily basis, much less under peak conditions. With the state of legacy testing tools, it would take a dedicated hydro-electric plant and a government bail-out. Many have suggested using the massive computational power of the cloud to simulate that load, but we have yet to see this live up to the performance needs of growing cloud vendors.
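To make Sobrier's point about "thousands of clients and servers, with different IP addresses" a little more concrete, here is a minimal Python sketch of just the planning step: handing every simulated client its own source address and spreading the clients across a set of servers. The function name plan_clients, the 10.0.0.0/16 client range and the 192.0.2.x server addresses are all assumptions made up for illustration. A real load generator would still have to configure those addresses on its hosts and bind each outgoing socket to one of them before connecting, which this sketch does not attempt.

```python
import ipaddress
import itertools


def plan_clients(client_net, servers, n_clients):
    """Pair n_clients distinct source addresses with servers, round-robin.

    client_net and servers are illustrative inputs, not taken from any
    real test bed.
    """
    hosts = ipaddress.ip_network(client_net).hosts()
    # Take only as many usable host addresses as we need clients.
    sources = list(itertools.islice(hosts, n_clients))
    if len(sources) < n_clients:
        raise ValueError("client network too small for requested client count")
    return [(str(src), servers[i % len(servers)])
            for i, src in enumerate(sources)]


# Hypothetical example: 5,000 clients spread over three servers.
plan = plan_clients(
    "10.0.0.0/16",
    ["192.0.2.10", "192.0.2.11", "192.0.2.12"],
    5000,
)
```

Even this toy version shows why single-client tools fall short: the plan alone is thousands of entries long before a single packet has been sent.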

Security
Possibly the biggest challenge lies in the security arena, in part due to the historical practice of conducting security and performance testing in isolation. In traditional hardware/software testing scenarios, security and performance organizations are typically siloed. As security breaches become more frequent and their impacts more severe, and as network equipment vendors embed more and more security functionality into their core network products, this is changing. But change has not come fast enough for the cloud. In this more open and accessible environment, the stakes are higher.

Security attacks are not just dangerous; the protection against these strikes can have an immense effect on overall performance. Vendors must recognize the impact of security on application performance, specifically on web services, and test accordingly by emulating the real world, where hackers are exploiting the cloud to spread viruses and malware and to attack critical network infrastructure.

In a recent presentation on Cloud Computing Security, Eva Chen, CEO of Trend Micro, reported that “a new virus is created every 2 seconds”. Clearly, testing against that pace requires massive computing horsepower and a wide range of current security attacks. That’s going to require a new type of testing product designed to evolve along with the security landscape. Ensuring effective protection for cloud networks will require constant vigilance to keep testing tools current.

Advancing the State of Cloud Testing
There have been few advances in the last decade when it comes to cloud server and infrastructure testing. But to live up to the vision of “truth in testing,” vendors need better options. New testing tools are emerging, but the unique obstacles presented by the dynamic, shared cloud infrastructure have set the bar almost impossibly high, leading the vendors we spoke with to rely on home-grown in-house options. Clearly, these vendors need new, more scalable, flexible, realistic and cost-effective options. In the next post of our cloud testing series, I’ll look at how companies are trying to leverage cloud infrastructures for performance and load testing.

3 comments
Tags: blog post // server load testing // anti-malware // cloud computing and virtualization //

Interesting and Informative Blog

Posted by Anubhav at 2009-04-30 06:26
I have been wondering about testing the Cloud since its inception. I would suggest a company build an Oyster lab around the cloud (say with 3-4 servers) and use that for testing the products in REAL scenarios.

I know cloud is much bigger than this, but some of the functional aspects might get uncovered in here. For Performance and Security Testing, we actually need a simulated environment that can act as a Cloud.

Deployment and test automation become key in the cloud

Posted by Marc-Elian Begin at 2009-04-30 10:07
Here at SixSq (www.sixsq.com) we're working on a platform to allow software developers, testers and QA people to better test their applications in a dynamically deployed, near-production-like environment. We're using the cloud as our back-end to manage dynamic resources to host the 'system under test' and its testsuites. The application is called SlipStream (if you don't mind me giving a bit of publicity).

So, to get back to your posting, we're actually using the cloud to test applications that are not necessarily targeting the cloud as their production environment. Having said that, if the cloud IS the target environment, then the context switch between the test environment and the production environment is minimal, and removes a potential cause for errors.

Here are two issues we see in using the cloud, which we didn't have before:
1- machine image construction: we can easily lose track of which machine is what, as it's easy to create and store new virtual machine images
2- resolution of resource names: we can't predict the ip address and hostname on cloud resources before boot-time.

The solutions to these have a common theme: automation.

Most applications nowadays are distributed or n-tier (in other words multi-machine, virtual or not). This means that the first challenge in using the cloud is to deploy the software (to be tested). And since we want to get the benefit of the dynamic, on-demand nature of the cloud, we want to automate the deployment and make it singleton-less, such that we can deploy several instances of our system under test at the same time, without risk of cross-fertilisation. But since we're dealing with several machines, we need to provide a mechanism such that they find each other, and can be loaded and configured from the ground up, without knowing upfront what the ip addresses and hostnames will be.

As for the machine images, we need to be able to capture the recipe for creating machine images, such that it can be human readable, machine executable and version controlled. We also want to identify interconnection points between machines - i.e. input/output requirements - such that they can be resolved and connected while deploying the system under test (e.g. a testsuite will need the endpoint of the service it must test, a load balancer will need to know what server it can route the requests to).
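As a rough illustration of the input/output idea above, here is a minimal Python sketch. It is not SlipStream's actual format or API: the node names, the "webserver.endpoint" key, the example address and the resolve_inputs function are all hypothetical. Each machine declares which values it needs and which it publishes, and inputs are filled in only once the publishing machine has booted and reported its real address.

```python
def resolve_inputs(nodes, published):
    """Fill each node's declared inputs from values published at boot time.

    nodes maps a node name to its declared "inputs" and "outputs";
    published maps output names to the values known after boot.
    Both structures are illustrative assumptions.
    """
    resolved = {}
    for name, spec in nodes.items():
        missing = [key for key in spec["inputs"] if key not in published]
        if missing:
            # The node cannot be configured until its dependencies report in.
            raise KeyError(f"{name} is waiting on unpublished outputs: {missing}")
        resolved[name] = {key: published[key] for key in spec["inputs"]}
    return resolved


# Hypothetical deployment: a web server, plus a testsuite and a load
# balancer that both need the web server's endpoint.
nodes = {
    "webserver": {"inputs": [], "outputs": ["webserver.endpoint"]},
    "testsuite": {"inputs": ["webserver.endpoint"], "outputs": []},
    "loadbalancer": {"inputs": ["webserver.endpoint"], "outputs": []},
}

# Only known after the web server has actually booted in the cloud.
published = {"webserver.endpoint": "http://10.1.2.3:8080"}

wiring = resolve_inputs(nodes, published)
```

Because the declarations are plain data, they can be kept under version control alongside the image recipe, and nothing in them hard-codes an ip address or hostname.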

So... once you've addressed the deployment stumbling blocks, then you can start tackling the functional, end-to-end, stress, performance, etc. issues that need to be understood before the software system is put in production. Thanks to automation, we've taken advantage of the dynamic nature of the cloud without too many side effects. Better still, we now have an 'on-demand system under test', which means we've eliminated the test-bed bottleneck (e.g. sys admin availability, resource availability in our data centre / machine room) and can automatically trigger the execution of testsuites on our system, always with well-known initial conditions. Another important advantage, from a process point of view, is the possibility to integrate this process in a continuous integration system, such that we can automate the entire software production chain, from 'code commit' all the way to 'system acceptance tested'.

I hope this helps.

We're interested in people wanting to road test SlipStream, to see if our solutions to the above issues are sound. It's available as an alpha version (we're soon to start the beta test phase). It's free (you only have to pay for the Amazon resources you use).

Link to SlipStream SaaS: https://slipstream.sixsq.com


New perspective needed to test a cloud

Posted by Ted Wolf at 2009-04-30 15:27
Complexity is the base attribute driving the cost of any testing effort, and compute clouds are one of the most complex systems confronting us, if not the most complex. N-Dimensional just seems inadequate when quantifying the testing space formed by compute clouds. This complexity, driven by the number and characteristics of the services, consumers and supporting infrastructures, simply can’t be quantified, much less qualified.

In the past we thought we could put our arms around the testing space. We felt like we could understand all the possible states and the resulting effect on our code path traversals. We would use boundary value analysis, equivalence class partitioning and cause-effect graphing to point us to the most useful blackbox test cases. We’d create a few stress tests targeting the well beaten path and maybe, if we had the resources, we would cobble together an operational profile or two. If we were lucky (and management was really fearful), we’d create some white box tests to exercise small amounts of particularly nasty code.

We can no longer use these analytical tools in the simplistic fashion of the past. The amount of state information defining a cloud at any particular moment is huge, and the number of possible combinations is, for all practical purposes, infinite.

So what can we do?

In my opinion testing compute clouds requires a more Monte Carlo’ish approach.
Use Musa’s operational profiling ideas to create layers of simulated activity and toss them into the cloud. Run enough of these simulations and you create the statistical information required to make reliable claims.

The profiles can be built to target cloud characteristics like elasticity, scalability and security. The characteristic of realism is handled by the use of operational profiles.
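For readers who want to see the shape of this idea, here is a minimal Python sketch of a Monte Carlo run driven by an operational profile. Everything named in it is an assumption for illustration: the three operations, their weights, the per-operation failure chances and the stub_operation function, which merely stands in for really exercising a cloud. Operations are drawn at random in proportion to how often users are expected to perform them, and the fraction of successful runs is the reliability estimate.

```python
import random


def estimate_reliability(profile, run_operation, runs, seed=0):
    """Estimate the success rate by sampling operations from a profile.

    profile maps operation name -> relative frequency of use;
    run_operation(op, rng) returns True when the operation succeeded.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    ops = list(profile)
    weights = [profile[op] for op in ops]
    failures = 0
    for _ in range(runs):
        # Draw one operation in proportion to its expected usage.
        op = rng.choices(ops, weights=weights, k=1)[0]
        if not run_operation(op, rng):
            failures += 1
    return 1 - failures / runs


# Hypothetical profile and failure chances, invented for this sketch.
PROFILE = {"read": 0.70, "write": 0.25, "provision": 0.05}
FAILURE_CHANCE = {"read": 0.001, "write": 0.01, "provision": 0.10}


def stub_operation(op, rng):
    """Stand-in for driving the real system: fails with a fixed chance."""
    return rng.random() >= FAILURE_CHANCE[op]


reliability = estimate_reliability(PROFILE, stub_operation, runs=10000)
```

With more runs the estimate settles down, which is the point of the approach: the claim rests on statistics over many sampled scenarios rather than on enumerating every possible state.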

