SQL Server Cache Hits Ratio and SQL Server Performance

As you know, one of the most important indicators of SQL server
performance is a cache hits ratio. This is the percentage of pages that
were found in the buffer pool without having to incur a read from disk.
First off, let us use the Performance Monitor, or PerfMon, which can
help us measure various performance properties on almost any system
component. Start the PerfMon program by running it from the Control
Panel (Start-> Control Panel-> Administrative Tools->
Performance). By default, it will be monitoring your system using its
default counters (Pages/Sec, Average Disk Queue Length, and %Processor
Time).

However, we are not interested in monitoring those indicators. So,
we will be adding our own counter by right clicking on any area to the
right side and selecting 'Add Counters...' In the window that will
appear, select 'SQLServer:Buffer Manager' as your performance object.
Next, find an indicator called 'Buffer cache hits ratio'. Note, that
this counter is limited to the statistics of the data pages only. It
does not cover a cache hits ratio for the execution plans, for example.

Now press the Add button and then Close so that this counter will be
displayed in the PerfMon. Review the chart and look for the newly added
indicator for Buffer cache hit ratio using the legend. I can probably
guess what the value is in your system.
It is about 99.8%, correct? It's a good indication if the value is
almost about 100%. So we are satisfied, and we can close PerfMon
because everything seems good. But it can be just an illusion. There
are many misconceptions around Cache Hits Ratio.
The first one is pure mathematical. By definition:
Cache Hits Ratio = 100% - (Number of Physical reads) / (Number of logical reads)
As on almost any database, there are at least hundreds of logical
reads per one physical read, to which their Cache Hits Ratio will be
always superior to 99%. Even if you can tell the dramatic difference
between Cache Hits Ratio = 99% and 99.999% (1000 times), it is almost
invisible on the Perfmon Chart:

Hence, I recommend using another counter, which is called 'Cache Quality':
Cache Quality = (Number of logical reads) / (Number of Physical reads)
For OLTP (Online Transaction Processing) systems value of
- 10 is terrible,
- 100 is quite bad,
- 1000 might be acceptable
- 10 000 is good enough.
For the reporting/OLAP (Online Analytical Processing) systems, Cache Quality can be much lower than for OLTP systems, of course.
But let's face another sad truth. What is the number of
logical/physical reads in the formula for the Cache Hits Ratio? How is
it calculated? The answer is simple: it is the cumulative value of
reads since the last start of SQL server.
And that makes this counter almost useless. As a cumulative value
for the whole history of SQL server, it measures a mixture of all types
of activities on it, including a 'cold' start (with a low Cache
Quality), all 'nightly' processes, technical/maintenance jobs, like
table defragmentations. On systems with a daily OLTP activity and
nightly OLAP activity, it shows a mixture of both.
Finally, if your server was running for 100 days and today you have
installed an upgrade, you will not see any result in the next few days,
because the optimization you've made would simply sink in the old
statistics, collected during the past 100 days.
That is why it is important to find a momentary Cache Hits Ratio, or
Cache Quality, for a given period of time for a query (if there is no
other activity). To do it, we should use 2 counters. @@TOTAL_READ gives
us a cumulative number of physical reads. To get a number of logical
reads, we can use this query:
select * from master.dbo.sysperfinfo
where counter_name='Page lookups/sec'
Even if it is named 'pages/sec', don't worry; it actually contains
the cumulative number of logical reads. Now we are ready to write our
procedure:
create procedure CalcCacheHits
@sql varchar(8000)
as
declare @physical int, @logical int
select @physical=@@total_read,@logical=cntr_value
from master.dbo.sysperfinfo where counter_name='Page lookups/sec'
exec (@sql)
waitfor delay '00:00:10'
declare @physical2 int, @logical2 int
select @physical2=@@total_read,@logical2=cntr_value
from master.dbo.sysperfinfo where counter_name='Page lookups/sec'
select @physical2-@physical as PhysicalReads,
@logical2-@logical as LogicalReads,
case when @physical2-@physical = 0 then -1
else (@logical2-@logical)/(@physical2-@physical) end
as CacheQuality
GO
Note that there is a 'WAITFOR' after exec (@sql). We need it because
the system table sysperfinfo is not updated immediately, but rather in
regular intervals.
Its time we test our procedure. To ensure that our cache is empty, we can erase it using:
DBCC DROPCLEANBUFFERS
Now, we try our first select:
| PhysicalReads |
LogicalReads |
CacheQuality |
| 250 |
13162 |
52 |
As expected, SQL server is 'cold', data is not cached, and CacheQuality is low.
Now run the query again; this time the data is cached:
| PhysicalReads |
LogicalReads |
CacheQuality |
| 0 |
757 |
-1 |
As expected, the number of Physical reads is 0
CacheQuality = -1 is our protection vs. a division by zero (757 / 0).
During all these experiments, PerfMon always indicated a Cache hits Ratio of about 99.8%
|