Finding IT System Bottlenecks and Predicting Performance

The Opportunity

The company processes large volumes of documents and other data. The processed data must be available promptly for human interpretation and analysis so that the company can serve its clients.
The system consists of servers with varying processing power, input/output channels, database technologies, and a variety of software applications, each performing its own task along the processing chain.
With so many nodes along the chain and a myriad of configuration factors at play, the company wanted to know where bottlenecks existed and what could be done to tune the system for faster processing.

At the heart of the analysis is regression modeling, which allows us to quantify the effect of each system configuration parameter on processing performance. But because many parameters are fixed in normal operation and never vary, we needed to create the required variation artificially.
Design of experiments is a formal statistical methodology that specifies a set of “runs”, each with different values of the parameters, chosen to quantify the effects as efficiently as possible. This allowed us to measure the effects of parameters whose variation wouldn’t otherwise be present in server log data.
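
To make the idea of designed runs concrete, here is a minimal sketch in Python. It uses a two-level full factorial, the simplest kind of design and not necessarily the one used in the study. The factor names, their levels, and the simulated_seconds function are hypothetical stand-ins for real configuration parameters and measured processing times.

```python
from itertools import product

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts (one dict per run)."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[n] for n in names))]

def main_effects(runs, responses, factors):
    """For each two-level factor: mean response at the high level minus mean at the low level."""
    effects = {}
    for name, (low, high) in factors.items():
        hi = [y for run, y in zip(runs, responses) if run[name] == high]
        lo = [y for run, y in zip(runs, responses) if run[name] == low]
        effects[name] = sum(hi) / len(hi) - sum(lo) / len(lo)
    return effects

# Hypothetical factors and (low, high) levels, for illustration only.
factors = {
    "worker_threads": (4, 16),
    "db_cache_mb": (512, 2048),
    "batch_size": (100, 1000),
}

runs = full_factorial(factors)  # 2 * 2 * 2 = 8 runs

def simulated_seconds(run):
    """Synthetic response; a real study would measure processing time for each run."""
    return (120.0
            - 2.0 * run["worker_threads"]
            - 0.01 * run["db_cache_mb"]
            + 0.02 * run["batch_size"])

responses = [simulated_seconds(run) for run in runs]
effects = main_effects(runs, responses, factors)

for name, effect in effects.items():
    print(f"{name}: {effect:+.2f} seconds when moved from low to high")
```

Because every factor appears at each level equally often, each effect is estimated independently of the others. With many factors, a fractional or optimal design would cut the number of runs, at the cost of confounding some interactions.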

The study found some things that were obvious, and others that were not. For example, the size of the files processed is unsurprisingly a factor. However, we found breakpoints: above a certain file size, performance degraded significantly and non-linearly.
Other factors included a process change at a certain date, as well as a software upgrade from one vendor that actually decreased performance.
Finally, we found that projects for certain clients, because of their specific requirements, weighed heavily on the system. Knowing when these projects would be processed allowed IT to plan for capacity demands.
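
The file-size breakpoint described above can be illustrated with a simple segmented regression. The sketch below is one possible approach, not the method used in the study: it tries every split of the data into a lower and an upper group, fits a straight line to each, and keeps the split with the smallest total squared error. The data are synthetic and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes the x values in the segment are not all identical
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def find_breakpoint(xs, ys, min_points=3):
    """Return (first x of the upper segment, lower slope, upper slope) for the best split."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(sx) - min_points + 1):
        _, slope_lo, sse_lo = fit_line(sx[:i], sy[:i])
        _, slope_hi, sse_hi = fit_line(sx[i:], sy[i:])
        total = sse_lo + sse_hi
        if best is None or total < best[0]:
            best = (total, sx[i], slope_lo, slope_hi)
    return best[1], best[2], best[3]

# Synthetic data: processing time grows slowly up to 100 MB, then much faster.
sizes_mb = list(range(10, 201, 10))
seconds = [0.5 * s if s <= 100 else 52.5 + 3.0 * (s - 105) for s in sizes_mb]

breakpoint_mb, slope_below, slope_above = find_breakpoint(sizes_mb, seconds)
print(f"Upper segment starts at {breakpoint_mb} MB")
print(f"Seconds per MB below: {slope_below:.2f}, above: {slope_above:.2f}")
```

On real log data the measurements would be noisy, so the chosen split is only an estimate, and it is worth checking that the two-segment fit is meaningfully better than a single line before acting on it.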