Page Life Expectancy Doesn’t Mean Jack, and You Should Stop Looking At It.

Last Updated 4 years ago

Page Life Expectancy is a Perfmon counter that’s supposed to track how long pages stay in memory before they get flushed out to make room for other pages that are needed instead.

Paul Randal has blogged this a few times, and rather than rehash ’em, I’d rather point you to a couple of his roundups:

And layer in a few of my own observations for context:

Page Life Expectancy goes up 1 for each second that you’re not under memory pressure. Restart your SQL Server instance and watch PLE: it starts at 1, and goes up by 1 for each second of uptime. 5 minutes of uptime = PLE 300. For the first 5 minutes, it’s not like your server’s under memory pressure – it just woke up, for crying out loud. Give him a minute. Well, 15-20, I guess, because PLE is near useless during that time span.

Page Life Expectancy drops can be triggered by confusing operations. By default, any one running query can get a memory grant the size of 25% of your buffer pool. Run a few of those queries at the same time, and your buffer pool gets drained – but PLE doesn’t necessarily drop. However, the instant an unrelated query runs and needs to get data that isn’t cached in RAM, your PLE will drop catastrophically. Which queries are at fault? The queries getting large grants, or the queries doing reads? “Yes.”

Page Life Expectancy is a lagging indicator. Lagging indicators are something that tell you about an emergency long after the emergency has happened, and lagging indicators don’t recover quickly after the emergency is over. When you combine the above two problems – PLE only rising 1 per second, and PLE dropping at times that aren’t necessarily tied to the dropping buffer pool – then if you’re alerting based on low PLE numbers, you could have already missed the emergency. When you go log in and look for what long-running queries are running, it’s already too late. Instead, you should be using leading indicators: things that tell you a problem is coming up.

With that stuff in mind, I’ve removed the Page Life Expectancy warnings from sp_BlitzFirst altogether. I think they’re distracting y’all from the real leading indicator, wait stats – and of course, sp_BlitzFirst shows that, so I’d rather focus your attention there.

What Was the Worst Database Decision You Ever Made?

How Doug Lowing Gets the Most Out of His Live Class Season Pass

26 Comments. Leave new

Recce
June 17, 2020 6:18 am

Say it isn’t so!!!!!
Next thing you’ll say is I shouldn’t be looking at disk queue length and dividing it by the number of spindles in my RAID-5 array

Reply
Recce
June 17, 2020 6:45 am

I do wish “best practice” came with a BBE date.

Reply
Grzegorz ?yp
June 17, 2020 8:08 am

What should be used instead PLE to see memory pressure problem?

Reply
- Brent Ozar
  June 17, 2020 8:08 am
  
  Wait stats, as I explain in the post.
  
  Reply
Roman
June 17, 2020 9:15 am

“Page Life Expectancy Low” Finding is removed? Please no.

I have inherited a lot of old apps and procedures from former sql admins which I will never understand. So when I find a “Page Life Expectancy Low” issue I thorow RAM at the server.

Reply
- Brent Ozar
  June 17, 2020 9:15 am
  
  Roman – instead, look for high PAGEIOLATCH or RESOURCE_SEMAPHORE waits, and throw memory at it in those situations. It’ll be way more cost-effective.
  
  Reply
  - Roman
    June 17, 2020 11:34 am
    
    Thanks, for the hint. I get both to together right now. But I have never took a look at PAGEIOLATCH as the priority is much lower
    Priority FindingsGroup Finding
    50 Server Performance Page Life Expectancy Low
    200 Wait Stats PAGEIOLATCH_SH
    
    Reply
    - Brent Ozar
      June 17, 2020 11:37 am
      
      Yep, that’s exactly why I’m changing it: people were focusing too much on the wrong metric. Glad to see the change is having the desired effect.
      
      Reply
  - John Bougeois
    March 12, 2024 3:43 pm
    
    I had a situation where I was experiencing with high RESOURCE_SEMAPHORE waits. The issue was Indexing. I had been reorganizing indexes and when I changed it to index rebuild online, it solved the issue. Reorganizing just was no longer doing the job. But be careful with Index rebuild, especially in sql server standard. It can cause table locking and blocking.
    
    Reply
- Jeffrey Mergler
  June 17, 2020 9:38 am
  
  @Roman, you can use Tim Radney’s script if you feel the need to query PLE: http://timradney.com/2013/03/08/what-is-page-life-expectancy-ple-in-sql-server/
  
  That said, I will now start honing in on waitstats as brent mentions above.
  
  Reply
Jeffrey Mergler
June 17, 2020 9:35 am

No complaints here, I didn’t even notice PLE was in BlitzFirst and there are plenty of scripts which give you PLE.

Reply
Vince Thomas
June 17, 2020 11:01 am

I agree. When we switched over to very fast SSD drives a couple years ago, I stopped worrying about PLE.

Reply
Martin
June 17, 2020 10:04 pm

I remember, back in the day, it was all “buffer cache hit ratio is great”, then “no, BCHR is garbage – use PLE – its great”. Now “PLE is garbage use waitstats”. Am I starting to sound old??? ?

Reply
Dave Potts
June 18, 2020 4:27 am

I take the point about PLE being a lagging indicator but so are wait stats aren’t they?

We have PLE checks but they look at the direction of travel not the values. If it is going up or flat with ‘high enough’ value we don’t generate alerts. If it is flat with a low value or falling then we need to take a look.

Reply
- Brent Ozar
  June 18, 2020 5:43 am
  
  Nah, like I mention in the post, you can hit RESOURCE_SEMAPHORE waits long before PLE starts to drop.
  
  Reply
Filip
June 19, 2020 6:44 am

The good guys of Solarwinds should read this. 🙂

Reply
- Brent Ozar
  June 19, 2020 7:57 am
  
  Well, don’t blame monitoring tools: when I worked for Quest, we had that same discussion, but the product management team said, “We can’t remove that metric. Too many DBAs *THINK* it’s a useful metric, and if we don’t show it, they’ll think our tool sucks.” So I get why the vendors have to put it on there.
  
  Reply
  - Filip
    June 19, 2020 7:59 am
    
    Haha yeah i don’t ? Still love Solarwinds!
    
    Reply
  - John Zabroski
    December 5, 2022 11:47 pm
    
    I like Page Life Expectancy over long periods of time to understand the general load patterns on a server. Agree, it is a lagging indicator, but lagging indicators are often good “acausal metrics”. Acausality is a weird topic for most engineers to wrap their heads around, but it’s how airplanes remain stable in flight – wings of the plane get input from a sensor at the nose cone of the plane on possible lift conditions, so the wings know milliseconds before it happens what the wind conditions will be.
    
    In the same way, if you’re coming in as a consultant and trying to understand load patterns, page life expectancy is a really cheap way to learn something about a server that has no prior diagnostic data available – assuming they didn’t just recycle the box.
    
    With cloud databases becoming more common, eventually it will fade away completely, since you will just have all the useful metrics available to you.
    
    And, back in the 2000s when Trust In The Rust spinning disks were still common, page life expectancy was somewhat useful for a different reason, but back then I preferred to just watch query cache metric in Windows PerfMon along with CPU activity and see if my query was parallelizing perfectly and how much of the cache was hit or missed.
    
    Reply
Filip
June 19, 2020 7:59 am

Haha yeah i don’t 🙂 Still love Solarwinds!

Reply
Jon
June 24, 2020 6:38 pm

Heyo Brent, ironically timed article because I was just asking about this on StackExchange two weeks ago. One of the first things I looked into we’re wait stats for any indication of memory pressure. I noticed my top 2 wait stats were: “MEMORY_ALLOCATION_EXT” and “RESERVED_MEMORY_ALLOCATION_EXT”. Are these any indication of memory pressure or I should be looking for other specific wait stats? (These two were my highest wait types by an order of magnitude more than the next highest.) Thanks.

Reply
- Brent Ozar
  June 24, 2020 6:44 pm
  
  Hi! For personal help with your server, click Consulting at the top of the site. Thanks!
  
  Reply
WillemHenk
October 22, 2020 10:55 pm

I just came across this post from someone who recommended looking at it. 😛 [deleted] Kinda old, but it pops up on Google right away. 🙂

Reply
- Brent Ozar
  October 23, 2020 3:47 am
  
  Thanks for the heads up! Deleted that.
  
  Reply
RasmusAlsen
December 22, 2022 10:37 am

I must break down and confess I use this to indicate that I have done a good job.

Some 3 years ago the Page life expectancy on our SQL Server were always between 6.000 and 12.000.
We then installed some more memory and it went up between 40.000 and 50.000

I then had the BI and .Net team start using a redundant database for their work.
Their last queries on the production database were performed 2 days ago and the PLE has been climbing since.
It is now showing 141.000 and we have never had a more stable production system.

Am I wrong for using this as one (of many) indicators that things are better now than 3 years ago.

Reply
- Brent Ozar
  December 24, 2022 3:32 pm
  
  Time to head over to my Mastering Server Tuning class.
  
  Reply