Sql-server – Virtualized SQL Server: Why not?


The IT department where I work is trying to move to 100% virtualized servers, with all the data stored on a SAN. They haven't done it yet, but the plan eventually calls for moving the existing physical SQL Server machines to virtual servers as well.

A few months ago I attended the Heroes Happen Here launch event, and in one of the SQL Server sessions the speaker mentioned in passing that this is not a good idea for production systems.

So I'm looking for a few things:

  1. What are the specific reasons why this is or is not a good idea? I need references, or don't bother responding. I could come up with a vague "I/O bound" response on my own via Google.
  2. My recollection of the HHH speaker's comment probably won't convince our IT department to change their minds. Can anyone point me directly to something more authoritative? And by "directly", I mean something more specific than just a vague Books Online comment. Please narrow it down a little.

Best Solution

I can say this from personal experience because I am dealing with this very problem as we speak. The place I am currently working as a contractor has this type of environment for their SQL Server development systems. I am trying to develop a fairly modest B.I. system on this environment and really struggling with the performance issues.

TLB misses and emulated I/O are very slow on a naive virtual machine. If your O/S has paravirtualisation support (which is still not a mature technology on Windows), you can use paravirtualised I/O (essentially a device driver that hooks into an API in the VM). Recent versions of the Opteron have support for nested page tables, which removes the need to emulate the MMU in software (which is really slow).

Thus, applications that run over large data sets and do lots of I/O, such as ETL processes, trip over the Achilles heel of virtualisation. If you have anything like a data warehouse system that might be hard on memory or disk I/O, you should consider something else. For a simple transactional application, virtual machines are probably O.K.

To put this in perspective, the systems I am using are running on blades (an IBM server) on a SAN with 4x 2Gbit F/C links. This is a mid-range SAN. The VM has 4GB of RAM IIRC and now two virtual CPUs. At its best (when the SAN is quiet) this is still only half the speed of my XW9300, which has 5 SCSI disks (system, tempdb, logs, data, data) on 1 U320 bus and 4GB of RAM.
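For a rough sense of scale, the nominal link bandwidth on each side can be worked out as below. This is only a back-of-envelope sketch: the per-link figures are the usual headline numbers (roughly 200 MB/s per direction for a 2Gbit Fibre Channel link, 320 MB/s for an Ultra320 bus), not measurements from these systems, and the SAN links are shared with other hosts.

```python
# Nominal (best-case) figures, assumed from the interface specs, not measured.
FC_2GBIT_MB_PER_S = 200   # approx. payload rate of one 2Gbit F/C link, per direction
U320_MB_PER_S = 320       # maximum transfer rate of one Ultra320 SCSI bus


def aggregate_mb_per_s(per_link_mb_per_s, links):
    """Best-case aggregate bandwidth if every link could be driven flat out."""
    return per_link_mb_per_s * links


san_nominal = aggregate_mb_per_s(FC_2GBIT_MB_PER_S, 4)      # 4x 2Gbit F/C links
workstation_nominal = aggregate_mb_per_s(U320_MB_PER_S, 1)  # 1 U320 bus

print("SAN path:   ", san_nominal, "MB/s nominal")
print("Workstation:", workstation_nominal, "MB/s nominal")
print("Ratio:      ", san_nominal / workstation_nominal)
```

On paper the SAN path has more bandwidth than the single SCSI bus, yet the VM was at best half as fast. That suggests the bottleneck is overhead and contention rather than raw link capacity.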

Your mileage may vary, but I'd recommend going with workstation systems like the one I described for developing anything I/O heavy in preference to virtual servers on a SAN. Unless your resource usage requirements are beyond this sort of kit (in which case they are well beyond a virtual server anyway) this is a much better solution. The hardware is not that expensive - certainly much cheaper than a SAN, blade chassis and VMware licensing. SQL Server Developer Edition comes with V.S. Pro and above.

This also has the benefit that your development team is forced to deal with deployment right from the word go - you have to come up with an architecture that's easy to 'one-click' deploy. This is not as hard as it sounds. Redgate SQL Compare Pro is your friend here. Your developers also get a basic working knowledge of database administration.
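As a minimal sketch of what a 'one-click' deploy might look like, the script below runs every `.sql` file in a folder, in filename order, through the `sqlcmd` command-line tool. The server name, database name and folder layout are hypothetical, and it assumes `sqlcmd` is on the path and Windows authentication is in use; it is one possible shape, not a prescription.

```python
import subprocess
from pathlib import Path


def build_sqlcmd(server, database, script):
    """Build the sqlcmd command line for one script.

    -E uses Windows authentication; -b makes sqlcmd return a non-zero
    exit code when a script fails, so the deploy stops at the first error.
    """
    return ["sqlcmd", "-S", server, "-d", database, "-E", "-b", "-i", str(script)]


def deploy(server, database, script_dir, run=subprocess.run):
    """Run every .sql file in script_dir in filename order.

    `run` is injectable so the ordering can be checked without a server.
    Returns the names of the scripts that were executed.
    """
    scripts = sorted(Path(script_dir).glob("*.sql"))
    for script in scripts:
        run(build_sqlcmd(server, database, script), check=True)
    return [script.name for script in scripts]


# Example (hypothetical names):
# deploy("localhost", "WarehouseDev", "deploy_scripts")
```

Numbering the scripts (001_schema.sql, 002_reference_data.sql and so on) keeps the run order explicit and lets the same folder be replayed on each developer's workstation.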

A quick trip onto HP's website got me a list price of around $4,600 for an XW8600 (their current Xeon-based model) with a quad-core Xeon chip, 4GB of RAM and 1x146GB and 4x73GB 15k SAS hard disks. Street price will probably be somewhat less. Compare this to the price for a SAN, blade chassis and VMware licensing and the cost of backup for that setup. For backup you can provide a backed-up network share where people can drop compressed DB backup files as necessary.
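Dropping a compressed backup onto the share can be as simple as the sketch below. The file and share paths are hypothetical, and gzip is just one reasonable choice of compression.

```python
import gzip
import shutil
from pathlib import Path


def drop_compressed_backup(backup_file, share_dir):
    """Gzip backup_file into share_dir and return the compressed file's path."""
    source = Path(backup_file)
    target = Path(share_dir) / (source.name + ".gz")
    # Stream the copy so large .bak files are never held in memory at once.
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example (hypothetical paths):
# drop_compressed_backup(r"D:\Backups\WarehouseDev.bak", r"\\fileserver\db_backups")
```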

EDIT: This whitepaper on AMD's website discusses some benchmarks on a VM. From the benchmarks in the back, heavy I/O and MMU workload really clobber VM performance. Their benchmark (to be taken with a grain of salt as it is a vendor-supplied statistic) suggests a 3.5x speed penalty on an OLTP benchmark. While this is vendor supplied, one should bear in mind:

  • It benchmarks naive virtualisation and compares it to a para-virtualised solution, not bare-metal performance.

  • An OLTP benchmark will have a more random-access I/O workload, and will spend more time waiting for disk seeks. A more sequential disk access pattern (characteristic of data warehouse queries) will have a higher penalty, and a memory-heavy operation (SSAS, for example, is a biblical memory hog) that has a large number of TLB misses will also incur additional penalties. This means that the slow-downs on this type of processing would probably be more pronounced than the OLTP benchmark penalty cited in the whitepaper.

What we have seen here is that TLB misses and I/O are very expensive on a VM. A good architecture with paravirtualised drivers and hardware support in the MMU will mitigate some or all of this. However, I believe that Windows Server 2003 does not support paravirtualisation at all, and I'm not sure what level of support is delivered in Windows Server 2008. It has certainly been my experience that a VM will radically slow down a server when working on ETL processes and SSAS cube builds compared to relatively modest spec bare-metal hardware.
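If you want numbers from your own kit rather than mine, a crude read test run on both the VM and a bare-metal box gives a first impression. The sketch below times sequential and random block reads over a scratch file. The block size and file name are arbitrary assumptions, and the operating system's file cache will flatter the results unless the file is much larger than RAM, so treat it as a rough indicator and not a substitute for a proper benchmark tool.

```python
import os
import random
import time

BLOCK = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def make_test_file(path, size_bytes):
    """Fill a scratch file with random bytes, 1MB at a time."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            chunk = min(remaining, 1024 * 1024)
            f.write(os.urandom(chunk))
            remaining -= chunk


def time_reads(path, offsets):
    """Read one BLOCK at each offset; return (elapsed_seconds, bytes_read)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # skip Python-level buffering
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(BLOCK))
    return time.perf_counter() - start, total


def compare(path):
    """Time the same set of blocks read in order and in shuffled order."""
    size = os.path.getsize(path)
    sequential = list(range(0, size - BLOCK + 1, BLOCK))
    shuffled = sequential[:]
    random.shuffle(shuffled)
    return {
        "sequential": time_reads(path, sequential),
        "random": time_reads(path, shuffled),
    }


# Example (hypothetical file name; size it well beyond RAM for real tests):
# make_test_file("scratch.bin", 8 * 1024**3)
# print(compare("scratch.bin"))
```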