What is performance testing?


From time to time, someone asks the question: "Why performance test?" I've heard it several times over the 30 years I've been involved at some level with application performance assessment. Throughout those years, the applications and customer expectations have evolved and in most cases have become more challenging.

With the advent of Google and its amazingly fast search engine, end users of applications have come to expect an almost Google-like level of performance, regardless of the complexity of the application and the business cases. Hence, fast performance and elastic scalability have become the gold standard for customers.

In the competitive industry of web applications, if your solution cannot solve a problem fast enough, then that application is likely doomed to fail. The advent of big data and streaming data requires businesses to quickly understand changes in data and make real-time decisions, which can be the difference between winning and losing a big sale to a competitor.

Frequent performance testing in a development cycle ensures that the product ends up being as responsive as possible, has the ability to scale and has 24/7/365 reliability. As well, proper sizing ensures that customers have a solution that will reliably satisfy their needs.

What is performance testing?

A broad spectrum of testing is associated with the term "performance testing." A popular oversimplification is that performance testing is a set of tests that measures the time for users' gestures (transactions) under a single user or very light loads. But that type of testing only covers a small sliver of what performance testing entails.

Another misconception is that performance testing is an activity to schedule late in the development cycle; hence, not much thought or planning around performance is baked into the beginning of an initiative. Planning performance testing late often results in subpar performance, diminished scaling characteristics and late gating defects. Resolving performance defects late in the development cycle is very challenging and often requires an architectural and/or technology change.

All aspects of performance need to be considered and baked into the overall development release cycle in the planning phase. Along with what the application will do, we should consider how we want the application to scale and perform, and what level of robustness is essential. Do we want the ability to deploy the application on different nodes so we can add nodes when more throughput is required? How fast do we need the results? Are there general expectations or service level agreements (SLAs) that need to be met? Are there uptimes that are expected of the application? All these factors should be considered as early in the design and architecture of the application as possible to ensure the technologies and designs chosen will align with those expectations.

Often, with subtle changes to a performance test, such as the duration, the user load or the testing methodology, one can get different results that answer different questions. Below is a non-exhaustive list of some of these variations that can fall under the umbrella of performance testing:

  • Single user performance
  • Unit level performance
  • User scale
  • Bull rush
  • Spike
  • Stability/endurance or soak
  • Data scale or volume
  • Vertical scale or scaling up resources
  • Horizontal scale or scaling out resources
  • Time for certain system activities to complete (e.g. starting up applications)

Let's delve into each of these disciplines and expand on their purpose and how they can be executed.

Types of Performance Tests

Unit level performance tests

A unit-level performance test measures the elapsed time of a discrete unit of work at a low component or even code level. This type of testing is often used by developers to rapidly test changes in their code to ensure performance degradations are not introduced.

The strengths of these types of tests include:

  • Rapid execution time
  • Almost immediate feedback to the developer
  • Typically minimal components required for the test environment (mocked-out environment)
  • Fits well into a continuous code delivery system
  • Can be run very frequently
  • Relatively easy to create and maintain

These types of tests are often written by developers in the language of the source code and maintained with the source code. The tests are reasonably quick to create, so one can develop a large number of tests to ensure solid code coverage.
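
As a concrete illustration, here is a minimal sketch of a unit-level performance check in Python. The function `parse_order` and the 5 ms budget are hypothetical stand-ins for whatever unit of work and agreed baseline apply to your code.

```python
import statistics
import time

def parse_order(payload: dict) -> dict:
    """Hypothetical unit of work; substitute the code path you want to guard."""
    return {key: str(value) for key, value in payload.items()}

def test_parse_order_stays_fast():
    payload = {f"field_{i}": i for i in range(100)}
    timings = []
    for _ in range(1000):
        start = time.perf_counter()
        parse_order(payload)
        timings.append(time.perf_counter() - start)
    median_ms = statistics.median(timings) * 1000
    # The 5 ms budget is illustrative; derive it from an agreed baseline.
    assert median_ms < 5.0, f"median {median_ms:.2f} ms exceeds the 5 ms budget"

if __name__ == "__main__":
    test_parse_order_stays_fast()
    print("unit-level performance check passed")
```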

Single user performance testing

One of the more commonly known performance tests is single-user or light-user-load testing. This form of testing executes a series of user actions over multiple iterations to allow for better statistical values.

Commonly the key metrics tracked are Mean, Median, 90th percentile and Maximum values. Those results are either compared to a baseline set of values or a predefined set of values that are deemed acceptable. Single user tests are usually relatively quick to execute and are excellent in finding software bottlenecks or slow code execution not due to limitation of system resources.

Though this type of testing doesn't overly tax a system, so resources should not be the throttling points, resource metrics and various application and system logs should still be monitored for confirmation. The automation of these tests can use a homegrown test tool, a freeware performance test tool (e.g. JMeter) or a licensed performance test tool (e.g. LoadRunner).
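
For the key metrics mentioned above, a minimal sketch of how the statistics and a baseline comparison might be computed is shown below. The sample timings, the baseline numbers and the 10% tolerance are purely illustrative.

```python
import statistics

def summarize(samples_ms):
    """Compute the key single-user metrics for one transaction."""
    ordered = sorted(samples_ms)
    p90_index = max(0, round(0.9 * len(ordered)) - 1)  # simple nearest-rank 90th percentile
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[p90_index],
        "max": ordered[-1],
    }

def within_baseline(current, baseline, tolerance=0.10):
    """Flag any metric more than 10% worse than its baseline value."""
    return {metric: current[metric] <= baseline[metric] * (1 + tolerance) for metric in baseline}

# Illustrative numbers only: response times (ms) for one transaction over ten iterations.
current = summarize([120, 135, 128, 140, 131, 220, 125, 129, 133, 127])
baseline = {"mean": 130, "median": 128, "p90": 150, "max": 210}
print(current)
print(within_baseline(current, baseline))
```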

Each type of test tool has its pros and cons:

  • Homegrown tools can often be fine-tuned best, but they require regular maintenance of both the tests and the test tool.
  • Free tools cost nothing to use but may not have all the functionality or capabilities required, so they may need augmentation or an understanding of their limitations.
  • Licensed tools come with much better support for the tool and its usage, but usually at a fairly hefty price tag.

A proper cost/benefit analysis should be done to determine which test tool strategy one should use.

User scale testing

It is important to understand how your application/software performs as more users are added to a finite system. For example, if you want to size an environment for a proposed number of end users, you may want to perform a user scale test to see how performance changes as users are added to the system. A good result would be a linear profile, which makes extrapolation and interpolation easier.

One can employ a series of individual tests, increasing the number of users proportionally per test, e.g. 0.5 users per core, 1 per core, 2 per core, etc.

Another method is to run a number of users, allow a period of time to elapse or steady state to be attained, then ramp up more users in stages. This is often referred to as a staged or step-up performance test.

At lighter loads this type of testing can be used to quickly identify slow code execution. But as you scale users, resource limitations can readily become apparent (e.g. locks, threads, memory, etc.). This allows development to determine whether their application makes maximum use of the finite resources. Taken to extreme loads, this type of testing can identify the upper limits of an application, what happens at those limits, and where guardrails or governors may be required.

Therefore, when executing user scale tests, system metrics, application logs and transaction response metrics should be carefully monitored and correlated to best understand the scaling characteristics of the application and the system.
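
A minimal sketch of a staged (step-up) user scale test using only the Python standard library is shown below. The target URL, the stage sizes and the one-second think time are assumptions to adapt; in practice a dedicated load tool would normally do this job.

```python
import threading
import time
import urllib.request

TARGET = "http://localhost:8080/health"  # hypothetical endpoint under test
STAGES = [(5, 60), (10, 60), (20, 60)]   # (concurrent users, seconds to hold that level)
results = []
stop = threading.Event()

def virtual_user():
    while not stop.is_set():
        start = time.perf_counter()
        try:
            urllib.request.urlopen(TARGET, timeout=10).read()
            results.append(time.perf_counter() - start)
        except OSError:
            results.append(None)          # record errors alongside timings
        time.sleep(1)                     # think time between user gestures

threads = []
for users, hold in STAGES:
    while len(threads) < users:           # step up to the next user level
        thread = threading.Thread(target=virtual_user, daemon=True)
        thread.start()
        threads.append(thread)
    time.sleep(hold)                      # hold steady state before the next step
    ok = [r for r in results if r is not None]
    print(f"{users} users: {len(ok)} samples, mean {sum(ok) / max(len(ok), 1):.3f}s")
stop.set()
```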

Examples of tools that can be used include LoadRunner, NeoLoad, JMeter, Rational Performance Tester and K6.

Vertical scale or scaling up resources

An essential attribute of an enterprise application is the ability to scale vertically: as you add more resources to the server, the application's performance improves roughly in proportion to the amount of resources added.

For example, if you double the number of CPUs and the amount of RAM, user transactions could improve in performance by up to twofold; or, if you double the resources and double the number of interactive users, performance should remain relatively unchanged. An easy way to test this is to run a set of tests with different user loads before and after you add the resources.

Example: An application that is strongly and positively impacted by the number of cores.

Configuration 1: 10 core machine with ample RAM

Test1a: 10 concurrent users, Test2a: 20 concurrent users and Test3a: 30 concurrent users

Configuration 2: 20 core machine with ample RAM

Test1b: 20 concurrent users, Test2b: 40 concurrent users and Test3b: 60 concurrent users

Compare Test 1a to 1b, 2a to 2b and 3a to 3b. If the transaction times are statistically equivalent, then the application demonstrates vertical scaling characteristics.
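
As a rough sketch of how such a comparison could be automated, the following compares each test pair's mean response times within a tolerance. The sample timings and the 10% tolerance are illustrative assumptions, and a proper statistical test could be substituted.

```python
import statistics

def equivalent(samples_a, samples_b, tolerance=0.10):
    """Treat a pair of runs as equivalent if their mean response times differ by < 10%."""
    mean_a = statistics.mean(samples_a)
    mean_b = statistics.mean(samples_b)
    return abs(mean_a - mean_b) / mean_a <= tolerance

# Illustrative response times (seconds) for one transaction in each configuration.
pairs = {
    "Test1a vs Test1b": ([1.9, 2.1, 2.0], [2.0, 2.1, 1.9]),
    "Test2a vs Test2b": ([2.4, 2.5, 2.6], [2.5, 2.4, 2.6]),
    "Test3a vs Test3b": ([3.1, 3.0, 3.2], [3.6, 3.8, 3.7]),
}
for name, (a, b) in pairs.items():
    verdict = "equivalent" if equivalent(a, b) else "not equivalent"
    print(f"{name}: {verdict} at this load")
```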

The benefits of an application that scales vertically are the following:

  • One can forecast what additional resources are required to support different user loads.
  • If multiuser performance is below par, adding additional resources may bring it to par.
  • One might be able to use elastic scaling of resources to handle peak loads then reduce resources when not required.

If adding resources does not improve performance, there likely is a software bottleneck that needs to be resolved.

Examples of tools that can be used include LoadRunner, NeoLoad, JMeter, Rational Performance Tester and K6.

Horizontal scale or scaling out resources

Many applications have parts, or even the whole, that can be distributed on separate servers. The ability to distribute an application allows companies to take advantage of smaller, less expensive hardware to scale up the application, users, data, etc. To ensure an application can take performance advantage of additional hardware, one can plan and execute a scaling-out test, which will help determine how much adding hardware impacts performance or scale.

Example: Doubling the number of like servers in an application – assuming the entire application can be scaled out.

Configuration 1: One 10 core machine with ample RAM

Test1a: 10 concurrent users, Test2a: 20 concurrent users and Test3a: 30 concurrent users

Configuration 2: Two 10 core machines with ample RAM

Test1b: 10 concurrent users, Test2b: 20 concurrent users, Test3b: 40 concurrent users and Test4b: 60 concurrent users.

One can now compare Test1a to both Test1b and Test2b; similarly, Test2a to both Test2b and Test3b; and Test3a to Test4b.

For perfect horizontal scale, doubling the hardware while holding the user load constant would halve the response times, and keeping the number of users proportional to the hardware would result in constant response times.
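
As a small worked example, the sketch below computes a horizontal scaling efficiency for the constant-user-load comparison. The timing numbers are illustrative assumptions; 100% would represent perfect scale-out.

```python
def scaling_efficiency(time_one_server, time_two_servers, server_ratio=2):
    """Speedup for a constant user load, relative to the ideal for the added servers."""
    speedup = time_one_server / time_two_servers
    return speedup / server_ratio

# Illustrative numbers: mean response time (s) for 20 users on one server vs. two servers.
efficiency = scaling_efficiency(time_one_server=2.4, time_two_servers=1.4)
print(f"horizontal scaling efficiency: {efficiency:.0%}")  # 100% would be perfect scale-out
```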

This form of performance testing should be executed on applications that promote scaling out on hardware, to ensure customers will see significant advantages as they add hardware and can properly estimate hardware and plan topologies as they grow their user base.

Examples of tools that can be used include LoadRunner, NeoLoad, JMeter, Rational Performance Tester and K6.

Data volume or data scale testing

In today's competitive market, current information and just-in-time data are becoming increasingly paramount, meaning an application's ability to grow its data while remaining fast and responsive is more important than ever.

Hence, data scale testing is done on a regular basis to assist development in improving the application and customers in planning their company's growth. As well, data scale metrics are useful in properly estimating hardware requirements.

The simplest method of testing data scale is to keep data shape, data attributes and test artifacts constant and simply scale up the data (e.g. number of records). Next, run a constant user load against the test artifacts at different data sizes, developing an understanding of how the increase in data size impacts performance. This method reduces the number of variables to basically one variable (data size).

A more challenging test may be to add a new table or different data types. In this case, hold the data size and shape in all original tables constant, add one table at a time, then test. After that, one can start to scale the new table or data types and measure the performance and resource impacts. In that way we have isolated the impact of adding a new table and measured the impact of scaling that table or all the tables.
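
A minimal sketch of the simplest method, scaling only the record count while holding the data shape and the query constant, might look like the following. The table, query and row counts are illustrative, and an in-memory SQLite database stands in for the real data store.

```python
import sqlite3
import time

def timed_query(row_count):
    """Load row_count synthetic records, then time one representative query."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")
    db.executemany(
        "INSERT INTO orders (region, total) VALUES (?, ?)",
        ((f"region_{i % 50}", i * 0.01) for i in range(row_count)),
    )
    start = time.perf_counter()
    db.execute("SELECT region, SUM(total) FROM orders GROUP BY region").fetchall()
    return time.perf_counter() - start

# Hold the data shape and the query constant; scale only the number of records.
for rows in (10_000, 100_000, 1_000_000):
    print(f"{rows:>9} rows: {timed_query(rows):.4f} s")
```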

Examples of data synthesis tools that can be used include Hazy, Tonic and Sogeti.

Bull rush testing

If many users log in to a critical application and start working within a fairly short time frame, it may have an impact on the users' perception of performance.

A bull rush test is one in which many concurrent users are simulated executing a set series of actions, to measure the impact on the system. This testing can quickly uncover software and resource bottlenecks or any type of limitation that can have a negative impact on users' perceived performance.
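
A minimal sketch of how a bull rush can be simulated with a barrier, so that all users start at the same instant, is shown below. The user count and `simulate_login` are hypothetical stand-ins for the real scripted user actions.

```python
import threading
import time

USERS = 50
barrier = threading.Barrier(USERS)   # releases every user at the same instant
login_times = []

def simulate_login(user_id):
    time.sleep(0.05)                  # stand-in for the real scripted login/workflow

def user_session(user_id):
    barrier.wait()                    # all users "arrive at work" together
    start = time.perf_counter()
    simulate_login(user_id)
    login_times.append(time.perf_counter() - start)

threads = [threading.Thread(target=user_session, args=(i,)) for i in range(USERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(f"{USERS} simultaneous users, worst login time: {max(login_times):.3f} s")
```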

Examples of tools that can be used include LoadRunner, NeoLoad, JMeter, Rational Performance Tester and K6.

Spike testing

A spike test is similar to a bull rush test, except that it typically applies a series of sudden increases in load to measure their impact and possibly the upper boundaries of an application.

Examples of tools that can be used include LoadRunner, NeoLoad, JMeter, Rational Performance Tester and K6.

Stability testing

Stability testing is a longer-running test whose primary objective is to determine how an application responds to stimulus over a period of time in which steady state, or a consistent pattern of activity, is established. Preferably the tests comprise a set of activities that closely simulate customer usage patterns and/or activities that stress various key areas of an application. Monitoring errors and warnings, system and application resources, response times, etc. will determine whether there are runtime resource leaks, loss of performance or loss of integrity over time.

The duration of a stability test depends on the application, its cycle of activity and the acceptable length of time the application can run before a restart is required. In a real-world environment an application may be required to be up 24/7/365, or a restart may be acceptable after a period of time, such as one week. One needs to understand those expectations and derive a stability duration that can achieve the goal of finding runtime issues preemptively.

An example of how to simulate weeks of real-world activity in hours or days of testing is to use zero think times for concurrent users and to simulate administrative activity, such as data refreshes, hourly instead of weekly, monthly or on some other period, thus executing millions of activities in hours instead of weeks or months in the real world.
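
A minimal sketch of such a compressed stability loop is shown below. `business_transaction`, `data_refresh`, the run length and the refresh interval are all hypothetical placeholders for the real workload.

```python
import time

RUN_HOURS = 8                       # compressed run; a real soak may span days
ADMIN_EVERY_N_ITERATIONS = 10_000   # e.g. an hourly data refresh, compressed in time

def business_transaction():
    pass                            # hypothetical user gesture under test

def data_refresh():
    pass                            # hypothetical administrative activity

end_time = time.time() + RUN_HOURS * 3600
iterations = 0
while time.time() < end_time:
    business_transaction()          # zero think time: gestures run back to back
    iterations += 1
    if iterations % ADMIN_EVERY_N_ITERATIONS == 0:
        data_refresh()
        # This is also where one would sample memory, handles, response times and
        # log warnings to spot leaks or gradual performance loss over the run.
print(f"completed {iterations} activities in {RUN_HOURS} hours")
```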

Stability test beds are excellent environments in which to try new functionality, code changes, third-party upgrades, new security measures, etc. prior to releasing them to customers, as you can measure their impact over many iterations to confirm their reliability. At the end of each stability run one can also try failover or chaos activities to determine their impact prior to restarting stability testing. Failover or chaos testing at the end of thousands or millions of actions is a great test of the overall resiliency of the application.

Examples of tools that can be used include LoadRunner, NeoLoad, JMeter, Rational Performance Tester and K6.

Failover testing

Though not always associated with performance testing, it often falls to the performance team to execute this type of testing, as they usually have the automated tests that simulate users' actions. A failover test is one in which an application is under some level of steady-state load while a failure, restart or crash is simulated, and the time for the application to resume normal operations is measured.
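
A minimal sketch of measuring the recovery time is shown below. The health-check URL is an assumption, and in practice the failure trigger and the steady-state load would be automated rather than driven by a prompt.

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # hypothetical health-check endpoint

def is_healthy():
    try:
        return urllib.request.urlopen(HEALTH_URL, timeout=2).status == 200
    except OSError:
        return False

# Run this alongside the steady-state load, trigger the failure (kill a node,
# restart a service), then measure how long the application takes to recover.
input("Trigger the failure now, then press Enter...")
start = time.perf_counter()
while not is_healthy():
    time.sleep(1)
print(f"application resumed normal operation after {time.perf_counter() - start:.1f} s")
```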

Summary

Performance testing is a broad discipline of testing methodologies and is absolutely essential to the health and success of enterprise software. Some or all of the various performance testing methodologies should be run on your software application to ensure that the customer is provided a fast, reliable and resilient product.
