SQL Server and VMware: A Potentially Fatal Combination
In this experiment we used WMPlayer 2.0.2. Guest system and Host system: Vista HP. Very likely this problem does not affect the ESX server, but we are still waiting a confirmation from VMware that ESX
server fully respects Microsoft SQL Server Core I/O Requirements
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqliobasics.mspx
Virtual machines are becoming increasingly
popular. They are often used to create
‘safe sandboxes’ on a physical box.
Administrators are likely to put programs inside of these virtual
machines and occasionally SQL servers.
Unfortunately, putting SQL Server inside of a virtual machine is potentially
a scenario for disaster!
Let’s think back for a moment to the state of
databases 30 years ago. What was a critical property of all databases before
they had stored procedures, partitioned views, XML and all of the other bells
and whistles of today? From the
beginning, all databases (excluding “toy” databases like MySQL) included some
type of write-ahead logging mechanism to ensure data integrity. LDF log in MS SQL, Undo and redo logs in
Oracle… they insured that when a COMMIT was executed, the data was guaranteed
to be saved.
To implement it on Windows, SQL Server marks
its IO operations with the flags WRITETHRU and NOBUFFERING. When SQL Server reads, it utilizes its own
cache and reads the data without the use of the OS cache. When SQL Server writes, it asks the OS to
wait until the operation is complete.
Notice that SQL Server has everything under control and does not use the
OS cache at all. This is a critical
difference between SQL Server and other applications (Fig. 1).

Fig. 1
From the perspective of the Host system, what
happens when SQL server is running inside of the virtual machine? The Host system does not know the contents of
the ‘safe sandbox’. It has no knowledge
as to whether Windows, Linux, or another operating system is inside. This renders it completely unaware of the OS
flags inside of the virtual machine.
Hence, when the virtual machine “asks” the Host system to write data, it
is served in the same manner as are ordinary, non-SQL Server applications. The write is done asynchronously, using a
cache (again, Fig. 1).
This difference can have multiple effects, both
positive and negative. On the positive
side, Perfmon data reveals that SQL Server under VMWare often works even faster
(Fig. 2)!

Fig. 2
Given that Perfmon only updates
charts once per second, the IO system must be artificially slowed down to
observe the effect. As evidenced in the
chart, when writes were finished on the virtual machine (the point is
demarcated by the vertical green line), there were still writes remaining on
the physical box.
This leads to the negative effect. Notice that when COMMIT is executed on the
virtual machine, the data is not sent to the IO system. The data is simply flashed to the real box. Unfortunately, when SQL Server is run with
VMWare, an interruption such as a power failure can cause you to lose
transaction data reported as “committed”.
Much worse, in the event of such an interruption, the cache can change
the order of writes, leading to a corrupted database!
We set up an experiment to illustrate the
effect of a power failure on data integrity when SQL Server is set up with
VMWare. We created two instances of SQL
Server 2005 on a physical and virtual PC.
We called them ZOOSTATION and VZOO, respectively. Each server instance had VMTest database specially created for this test. The virtual server was linked to the physical
one using Linked servers. Both VMTest
databases contained the same data table: demotable
(81832 rows each) with identical data.
To test virtual machine database integrity
after a power failure, we did the following:
The same UPDATE statement was executed over the physical and virtual
database in the same distributed transaction.
This update appends “2_” to the beginning of every row in the demotable
table.

After
COMMIT,

The
external drive where data/logs were located was removed immediately to emulate
an OS crash. After the OS crash had
been emulated, we restarted both SQL servers. On the physical box everything
was fine and the data was updated as expected.