Page Life Expectancy

Hard disks are slow; memory is fast. This is a fact of nature for anyone that works with computers. When we’re talking about performance and memory we’re talking about a problem how to write data from slow storage into fast memory. Once loaded, your data can perform very fast and only need to go back to disk when new data is needed. This fact of life in computing is also an important part of SQL Server architecture.

Whenever data is written to or read from a SQL Server database, it will be copied into memory by the buffer manager. The buffer cache (also known as the buffer pool) will use as much memory as it’s allocated to it in order to hold as many pages of data as possible.

Buffer cache hit ratio

Buffer cache hit ratio (BCH) defines the percent of data pages read from buffer cache, normally sql engine tries to find requested page from cache if not found there it fetch page from disk as I mentioned above. Nominally 90-95% plus threshold is good for BCH.

If the DBA is seeing low readings for the buffer cache hit ratio, the Page Life Expectancy statistic should be checked.

You can find the current state of memory usage in SQL Server by checking the sys.dm_os_sys_info DMV:

SELECT
physical_memory_kb,
virtual_memory_kb,
committed_kb,
committed_target_kb
FROM sys.dm_os_sys_info;

Current_State_of_Memory

physical_memory_kb: Total physical memory installed on the server. virtual_memory_kb: Total amount of virtual memory available to SQL Server. committed_kb: The amount of memory currently allocated by the buffer cache for use by database pages.                                                                                        committed_target_kb: This is the amount of memory the buffer cache “wants” to use. If the amount currently in use (indicated by committed_kb) is higher than this amount, then the buffer manager will begin to remove older pages from memory. If the amount currently in use is lower, than the buffer manager will allocate more memory for our data.

Page Life Expectancy

PLE is a measure of, on average, how long (in seconds) will a page remain in memory without being accessed, after which point it is removed. This is a metric that we want to be higher as we want our important data to remain in the buffer cache for as long as possible. When PLE gets too low, data is being constantly read from disk into the buffer cache, removed from the cache, and likely read from disk again in the near future. And this means our SQL Server is slow!

To view the current PLE on a server, you can run the following query, which will pull the current value from the performance counter dynamic management view:

SELECT *
FROM sys.dm_os_performance_counters
WHERE dm_os_performance_counters.object_name LIKE ‘%Buffer Manager%’
AND dm_os_performance_counters.counter_name = ‘Page life expectancy’;

PLE_Value

cntr_value is the value of the performance counter, and on my quiet local server is 114,598 seconds.
There is a query give us more detailed result:
SELECT
ple.[Node]
,LTRIM(STR([PageLife_S]/3600))+’:’+REPLACE(STR([PageLife_S]%3600/60,2),
SPACE(1),’0′)+’:’+REPLACE(STR([PageLife_S]%60,2),SPACE(1),’0′) [PageLife]
,ple.[PageLife_S]
,dp.[DatabasePages] [BufferPool_Pages]
,CONVERT(DECIMAL(15,3),dp.[DatabasePages]*0.0078125) [BufferPool_MB]
,CONVERT(DECIMAL(15,3),dp.[DatabasePages]*0.0078125/[PageLife_S]) [BufferPool_MiB_S]
FROM
( SELECT [instance_name] [node],[cntr_value] [PageLife_S]
FROM sys.dm_os_performance_counters
WHERE [counter_name] = ‘Page life expectancy’
) ple
INNER JOIN
(SELECT [instance_name] [node],[cntr_value] [DatabasePages]
FROM sys.dm_os_performance_counters
WHERE [counter_name] = ‘Database pages’
) dp
ON ple.[node] = dp.[node]

PLE_Value_detailed

The most obvious question now is, “What is a good value for PLE?”

DBA needs to be relied on the amount of memory being used by the data cache in SQL Server to determine when Page Life Expectancy was a sign of impending problems.  This means I replaced the old 300 threshold with a more reasonable value of (DataCacheSizeInGB/4GB *300) as a basis for when to begin investigating things.

It means if Server has 64GBs of memory and 58GBs is allocated to cache, PLE value should be 58*75=4350seconds

So, for a system that has 64GB of RAM and has a MaxServerMemory SQL setting of 58GB, the “minimum PLE before there is an issue” value is 4350.

What can we do if PLE is to low?

  1. If the problem is happening now, investigate which queries are causing reads by using the sys.dm_os_waiting_tasks DMV to see which threads that are waiting for pages to be read from disk (i.e. those waiting for PAGEIOLATCH_SH), and then fix those queries.
  2. If the problem happened in the past, look in the sys.dm_exec_query_stats DMV for queries with high numbers of physical reads, or use a monitoring tool that can give you that information (e.g. the Top SQL view in Performance Advisor), and then fix those queries.
  3. Correlate the PLE drop with scheduled Agent jobs that perform database maintenance.
  4. Look for queries with very large query execution memory grants using the sys.dm_exec_query_memory_grants DMV, and then fix those queries.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s