Thursday, August 8, 2013

Understanding Memory issues through Perfmon:Page faults



I have been using Perfmon almost daily, but never took pain to learn how to use it efficiently until I came across a situation …In this part we will discuss using perfmon to see memory bottleneck.
Before we proceed forward let us go through few definitions so that the rest of the stuff is easily understood.

Virtual memory: Its actually virtual …and from here the concept of paging comes in to picture .On a 32 bit machine its 2 gigs for user process by default ….So, if your RAM is 512 MB, the remaining stuff (I mean greater that 512 MB) that’s going to come from Virtual memory will page out to disk .SO operations worth 2 gigs will happen in VAS (VM) and as a storage RAM will be used and the moment RAM limit is exceeded …..Paging/Swapping .

Page: Committed data in VAS is written to Page tables .From there it goes to RAM or Pagefile .This data is written in to pages (in bytes) and its size is processor dependent. this page is fitted in to page frame in RAM…it is essential that , that frame must be filled with zeros before a page from disk is inserted into page frame .zeroing is essential as the page before zeroing may have information from any other process ..So when we restart Windows this zeroing activity takes place and complete by the virtual memory manager before the Windows comes up..And if there is some problem during zeroing then………….. :D …we will discuss some other time …

Page file: a space in the HDD to be used to save committed memory objects.

Reserved memory: the memory quota given to a process by the VM manager .So the process first has to consume this before the next quota is allocated …

Committed memory: The chunk from reserved memory that is actually being used by the process. Remember that whenever memory is committed the same amount of space is created on the pagefile so that if needed it can be flushed to disk.

Working Set: Physical memory assigned to a process by VM manager (or say OS).It will be always less than Private Bytes.

Page Fault : When the page needed is not found in the working set of the process but its with in the RAM (may be in some cache) its called as Soft page fault .But when the page is not at all found in the RAM , its paged out and called as Hard page fault.

So now, it’s sure that whenever there is a memory issue there will be high (hard) Page fault rate...If there is a constant high hard page fault rate it means that there is some data not found in the RAM for the working set of that process .So , it has to be fetched from the disk (which is going to be costly).And the hard page faults cause Paging .So is paging harmful ….No (we cannot avoid it ), but the excess of it is a signal that something is going wrong .Generally I have seen that keeping the page file to be equal to 1.5 times of RAM is ok ..Not bad.

In case of memory issue the counters that we need to see in Perfmon are Memory and Process...

The Sub counters for memory counter are: Page Faults/sec, Page Reads/sec, Page Writes/sec, Pages Input/sec, Pages Output/sec, Available bytes and nonpaged pool bytes.

The Sub counters for Process counter are: Page Faults/sec, working set, Private Bytes and Page File Bytes

So, % Soft page faults = ((Page fault/sec - Pages input/sec) x 100)/Page faults/sec

Now, if there are more hard page faults, there should be more pages output/sec (no of pages sent to disk to make room for the pages coming form disk due to hard page fault) since there is a limited working set and hence there has to be some space for the pages coming from the disk (hard page faults) ...Pages read/sec and pages input/sec should be almost similar but can be a bit different …So these counters can tell you the system story..One more counter that I forgot to mention is Available Bytes .Its the no. of bytes left for the process to allocate. To see how much SQL Server shares you can see Page Faults/sec and proportionate it with Total Page faults. Generally, hard page fault of 10% to 15% should be tolerated …Before, we totally blame memory let’s check the Disk performance as well...

Subtract Memory:Page reads/sec from logical disk:disk reads/sec .If the output is small (+ or -) then the memory is the bottleneck because this actually means that all the disk reading is done to find the faulting pages .Also if disk output/sec is high then you can check by finding the difference of disk output/sec and Disk write bytes/sec ÷ 4096 ├čintel and fine its percentage in total Disk write bytes/sec

Similarly, you can then correlate it to SQL Server as well …..Once you are sure that memory is the bottleneck then we can proceed with the troubleshooting steps...
Next time we will try to dig into memory issues including memory leaks and also perfmon counters specific to SQL Server …