In this two-part series I will study different ways to scale resources in data centers and how the chosen model impacts costs.
In virtualized data centers, performance is one of the most common factors limiting scaling.
The compute layer has a Scale Out model for performance:
- If you need more compute power, just add servers to a shared resource pool
- Balance load by live migrating VMs to new servers
- Keep using one management point
- No additional silos
- Nearly linear scaling
- Problem solved
Traditional storage solutions use a different model to scale performance, Scale Up instead of Scale Out:
- Solutions are based on HA storage controller pairs
- To a certain degree you can add disks to increase performance
- Until the storage controllers become the bottleneck
- With spinning disks, the storage controller was rarely the bottleneck
- Usually the limiting factor in scaling was the number of disks
- IOPS per drive: ~100-200
- With SSDs the game has changed
- IOPS per drive is a couple of orders of magnitude higher, easily 10,000 or more
- Usually the limiting factor is the storage controller
- Even the most powerful storage controllers can be saturated with just 10-20 SSDs
- Once you run out of controller power, you have to replace the controller with a more powerful model to gain more performance
- Quite often this is a forklift upgrade: take the old one out and plug in the new one
- Additional HW to buy
- Downtime while upgrading
- Data migration, which takes time and manpower
- All driving up costs
- An alternative to upgrading the controller is to add more controllers
- But with the Scale Up model these are separate entities
- More management points -> complexity
- More storage silos -> inefficiency, since you cannot move resources from one silo to another
- The added complexity and inefficiency drive up storage costs
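The drive figures in the list above imply a rough controller ceiling. The following is a back-of-the-envelope sketch that uses only the numbers quoted there (100-200 IOPS per spinning disk, roughly 10,000 IOPS per SSD, saturation at 10-20 SSDs). The function name is made up for illustration, and real controllers and workloads will differ.

```python
# Per-drive figures quoted in the list above (rough, illustrative values).
HDD_IOPS = (100, 200)        # spinning disk, low and high estimate
SSD_IOPS = 10_000            # SSD, conservative estimate
SSDS_TO_SATURATE = (10, 20)  # SSD count that saturates a controller


def implied_controller_iops(ssd_count: int, iops_per_ssd: int = SSD_IOPS) -> int:
    """Controller ceiling implied by the number of SSDs that saturate it."""
    return ssd_count * iops_per_ssd


low = implied_controller_iops(SSDS_TO_SATURATE[0])
high = implied_controller_iops(SSDS_TO_SATURATE[1])

# Number of spinning disks needed to reach the same ceiling.
fewest_hdds = low // HDD_IOPS[1]   # fast disks against the low ceiling
most_hdds = high // HDD_IOPS[0]    # slow disks against the high ceiling

print(f"Implied controller ceiling: {low:,} to {high:,} IOPS")
print(f"Equivalent spinning disks:  {fewest_hdds:,} to {most_hdds:,}")
```

Under these assumptions it takes hundreds to thousands of spinning disks to do what 10-20 SSDs do, which is why the bottleneck moved from the disk shelves to the controller.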
There is a mismatch between the compute and traditional storage scaling models
- Optimally both should use the Scale Out model
Example sizing exercise with the “traditional” Scale Up model:
- Collect performance data from the existing system to estimate the starting point and growth rate
- Select the number of years over which the solution should be amortized
- Typically three to five years
- Based on the collected data, plot the required scaling over the years
- You might also want to plot different scenarios at different growth rates
Example:
- In this case:
- faster growth rate = red line, 40% annually
- normal growth rate = green line, 25% annually
- slower growth rate = blue line, 10% annually
- Estimated required performance after five years at the normal growth rate is 9.2 (in imaginary performance/scale units)
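The three scenarios can be reproduced with simple compound growth. This is a minimal sketch, not the original plot: the starting value of 3.0 units is an assumption (the text does not state it), chosen because it gives roughly 9.2 after five years at 25% growth. The function and variable names are illustrative.

```python
def project(start: float, annual_growth: float, years: int) -> list[float]:
    """Required performance at year 0..years, compounding once per year."""
    return [start * (1 + annual_growth) ** year for year in range(years + 1)]


START = 3.0  # ASSUMPTION: starting load in imaginary scale units, not from the text

SCENARIOS = {
    "faster (red)": 0.40,
    "normal (green)": 0.25,
    "slower (blue)": 0.10,
}

for name, rate in SCENARIOS.items():
    curve = project(START, rate, years=5)
    print(f"{name:15s}", " ".join(f"{value:5.1f}" for value in curve))
```

Each row of output is one line of the plot, with one value per year from year 0 to year 5.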
Example: Scaling units
With traditional storage solutions the scaling unit is quite large: there are typically 3-6 different storage controller models to choose from. In this example, there are four controller models, and each step up to the next model accommodates five additional units of scale (0-5, 5-10, 10-15 and 15-20). The more powerful controllers are also more expensive in terms of hardware, software and support costs, so the economic steps are quite large.
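The step-like nature of controller selection can be sketched as a lookup against the four bands from the example. The helper name and the choice to treat a band's upper limit as still covered by that band are assumptions made for illustration.

```python
import bisect

# Upper limits of the four controller models in the example (units of scale).
TIER_LIMITS = [5, 10, 15, 20]


def smallest_controller(required: float) -> int:
    """Index (0-3) of the smallest controller model that covers `required`."""
    index = bisect.bisect_left(TIER_LIMITS, required)
    if index == len(TIER_LIMITS):
        raise ValueError("requirement exceeds the largest controller model")
    return index


for required in (4.8, 9.2):
    tier = smallest_controller(required)
    headroom = TIER_LIMITS[tier] - required
    print(f"need {required}: model {tier + 1}, limit {TIER_LIMITS[tier]}, "
          f"unused headroom {headroom:.1f}")
```

A small change in the requirement near a band boundary jumps you to the next, more expensive model, which is what makes the economic steps large.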
Example: Overestimated sizing
If you overestimate your growth, you pay for performance you never use.
In this case your growth rate was slower than estimated (10% Y/Y), but the solution was sized for a higher growth rate (25% Y/Y). You could have achieved your business goals with cheaper storage controllers.
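To put rough numbers on the overestimate, here is a small sketch. It assumes a starting load of 3.0 units (not stated in the text, but consistent with reaching about 9.2 after five years at 25%) and the 0-5 and 5-10 controller bands from the example.

```python
START = 3.0             # ASSUMPTION: starting load, not given in the text
YEARS = 5
PURCHASED_LIMIT = 10.0  # the 5-10 controller, sized for 25% growth
CHEAPER_LIMIT = 5.0     # the 0-5 controller


def load_after(years: int, annual_growth: float, start: float = START) -> float:
    """Required performance after `years` of compound annual growth."""
    return start * (1 + annual_growth) ** years


planned = load_after(YEARS, 0.25)  # what the solution was sized for
actual = load_after(YEARS, 0.10)   # what actually happened

fits_cheaper = actual <= CHEAPER_LIMIT
peak_utilization = actual / PURCHASED_LIMIT

print(f"planned {planned:.1f}, actual {actual:.1f}")
print(f"cheaper controller would have sufficed: {fits_cheaper}")
print(f"peak utilization of purchased controller: {peak_utilization:.0%}")
```

Under these assumptions the purchased controller never gets past about half of its capacity during the whole amortization period.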
Example: Underestimated sizing
If you underestimate your growth, you will run out of scalability/performance while the asset still has significant value on your books.
In the example you estimated that you would reach 9.2 after five years, but hit that number shortly after three years.
In this situation, with the Scale Up model you will have to upgrade your storage controllers or buy more of them.
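The point at which the faster scenario reaches the five-year estimate can be worked out by solving the compound growth formula for time. This sketch assumes a starting load of 3.0 units, which the text does not state, and the 40% annual rate of the faster scenario.

```python
import math


def years_to_reach(target: float, start: float, annual_growth: float) -> float:
    """Solve start * (1 + g) ** t = target for t."""
    return math.log(target / start) / math.log(1 + annual_growth)


START = 3.0   # ASSUMPTION: starting load, not given in the text
TARGET = 9.2  # five-year estimate from the sizing exercise

t_fast = years_to_reach(TARGET, START, 0.40)
print(f"At 40% annual growth the target is reached after {t_fast:.1f} years")
```

With these inputs the result lands between three and three and a half years, in line with "shortly after three years".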
If you choose to upgrade your controllers, the Scale Up model means taking the old storage controllers out and installing new ones: a “forklift upgrade”. You might get some trade-in credit for the old storage controllers, but most likely the asset is valued on your books at more than the trade-in credit. So it is a kind of double loss: on top of the money you have to spend on the new storage controllers, you are also losing most, if not all, of the value associated with the current ones.
Adding more storage controllers is another option, but with the Scale Up model each addition is another storage silo with a separate management domain. You cannot easily move performance or capacity between silos, which usually leads to a situation where resources are not available in the silo that needs them. Typically, more silos are less efficient than fewer silos.
Example: Unused performance for the first couple of years
Even if you nailed your growth estimate, for the first couple of years you are paying for scalability/performance that hasn’t been used yet (5.0 -> 9.2). You could have used cheaper controllers for the first years, but chose not to, as it would have led to a forklift upgrade during the solution's life span. In other words, to avoid the hassles related to upgrades, you over-provisioned scalability/performance. This is quite a typical way of buying and sizing traditional Scale Up storage solutions: overspend to avoid possible problems.
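A year-by-year view makes the unused headroom visible. This is a sketch under assumptions: a starting load of 3.0 units (not stated in the text), 25% annual growth, and the 0-5 and 5-10 controller bands from the example.

```python
START = 3.0      # ASSUMPTION: starting load, not given in the text
GROWTH = 0.25    # normal growth scenario
BOUGHT = 10.0    # limit of the 5-10 controller bought up front
SMALLER = 5.0    # limit of the cheaper 0-5 controller

rows = []
for year in range(6):
    required = START * (1 + GROWTH) ** year
    unused = BOUGHT - required
    smaller_would_do = required <= SMALLER
    rows.append((year, required, unused, smaller_would_do))

print("year  required  unused  cheaper controller enough?")
for year, required, unused, smaller_would_do in rows:
    print(f"{year:4d}  {required:8.1f}  {unused:6.1f}  {smaller_would_do}")
```

In this sketch the cheaper controller would have been enough through year 2, so the larger one is bought early purely to avoid a mid-life forklift upgrade.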
Conclusion:
The Scale Up model is not efficient and involves risks related to the uncertainty of predicting the future. Typically, the longer the life span of a solution, the harder it is to get your estimates right. If you get your estimates wrong, you will either under-provision or over-provision.
With under-provisioning there will be problems to solve: either you have to upgrade the solution mid-lifespan (which needs more money) or you live with scaling/performance problems (and cannot deliver services to the business at the predicted cost/performance level).
With over-provisioning, you are driving up the unit costs of your services. At minimum you are decreasing the profit on those services; at worst, in competitive situations, you can lose bids because your unit cost is higher.
How do you scale more efficiently? With the Scale Out model: have a look at part 2 of this series, “Economic impact: Scale Up vs Scale Out, Part 2”.