esxtop Memory View

The memory view of esxtop offers many useful counters.

Several important counters appear near the top of the screen:

PMEM /MB – memory for the host

VMKMEM /MB – memory for the VMkernel

PSHARE /MB – ESXi page sharing statistics

SWAP /MB – ESXi swap usage statistics

ZIP /MB – ESXi compression statistics

MEMCTL /MB – ESXi balloon statistics

Below the host information are the virtual machines. Several of the counters listed there are useful when troubleshooting an individual VM or a group of VMs:

MEMSZ – amount of configured guest physical memory

GRANT – amount of guest physical memory granted

SZTGT – amount of memory to be allocated to a machine

TCHD – amount of guest physical memory recently used by the VM

TCHD_W – write working set estimate for a resource pool

SWCUR – current swap usage

SWTGT – expected swap usage

SWR/s – swap in from disk rate

SWW/s – swap out to disk rate

LLSWR/s – memory read from host cache rate

LLSWW/s – memory write to host cache rate

OVHDUW – overhead memory reserved for the vmx user world of a VM group

OVHD – amount of overhead currently consumed by a VM

OVHDMAX – amount of reserved overhead memory for a VM

Ideally, you’ll look at esxtop and never see any balloon, compression, or swap activity. If you do see this activity, the ESXi host is overcommitted and is in contention. Either more memory needs to be added to the ESXi host or the cluster, or some of the VMs need to be moved to an ESXi host with memory resources available.
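As a rough illustration of the check described above, here is a minimal Python sketch that flags VMs showing swap or host-cache activity. It assumes you have copied the per-VM counters by hand into a dictionary; the VM name Test02-B, the sample values, and the helper name memory_pressure are all made up for this example.

```python
# Per-VM counters from the memory view that point to swapping or host-cache use.
# The counter names come from the list above; everything else is hypothetical.
PRESSURE_COUNTERS = ("SWCUR", "SWTGT", "SWR/s", "SWW/s", "LLSWR/s", "LLSWW/s")


def memory_pressure(vm_counters):
    """Return the pressure-related counters that are non-zero for one VM."""
    return {
        name: vm_counters[name]
        for name in PRESSURE_COUNTERS
        if vm_counters.get(name, 0) > 0
    }


# Made-up readings, typed in from the esxtop memory view.
vms = {
    "Test01-A": {"MEMSZ": 4096, "GRANT": 4096, "SWCUR": 0, "SWR/s": 0, "SWW/s": 0},
    "Test02-B": {"MEMSZ": 8192, "GRANT": 6100, "SWCUR": 512, "SWR/s": 1.5, "SWW/s": 0},
}

for vm_name, counters in vms.items():
    signs = memory_pressure(counters)
    status = f"contention signs: {signs}" if signs else "no swap activity"
    print(f"{vm_name}: {status}")
```

Any VM that comes back with a non-empty result is a candidate for a closer look at the host-level SWAP, ZIP, and MEMCTL lines.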

esxtop CPU View

The default view of esxtop is CPU. There are several useful counters in this view:

GID – group ID

NAME – virtual machine name

NWLD – number of worlds

%USED – percentage of physical CPU time accounted to this world

%RUN – percentage of total scheduled time for the world to run

%SYS – percentage of time spent by system services for that world

%WAIT – percentage of time spent by the world in a wait state

%VMWAIT – derivative of %WAIT except it doesn’t include %IDLE

%RDY – percentage of time the world was ready to run

%IDLE – percentage of time the vCPU world is in idle loop

%OVRLP – percentage of time spent by system services on behalf of other worlds

%CSTP – percentage of time the world spent in the ready, co-deschedule state (only relevant to SMP VMs)

%MLMTD – percentage of time the world was ready to run but was not scheduled because that would violate “CPU limit” settings

%SWPWT – percentage of time the world is waiting for the VMkernel to swap memory

High CPU ready time (%RDY) is a major indicator of CPU performance issues. You may have excessive use of vSMP or a limit set (check %MLMTD for that). Another metric to check is %CSTP. It helps you determine whether you can decrease the number of vCPUs for some of the virtual machines, which improves scheduling opportunities.

High %SYS is usually caused by a virtual machine with heavy I/O. High %SWPWT is usually caused by memory overcommitment.
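The reasoning above can be sketched as a small triage function in Python. The threshold values and the function name cpu_hints are assumptions made for illustration; they are not taken from this post or from VMware documentation, so tune them for your own environment.

```python
# Illustrative thresholds only (assumptions, not official guidance).
RDY_LIMIT = 10.0   # percent ready time considered "high" in this sketch
CSTP_LIMIT = 3.0   # percent co-stop time considered "high" in this sketch


def cpu_hints(counters):
    """Map CPU-view counter readings for one world to troubleshooting hints."""
    hints = []
    if counters.get("%RDY", 0) > RDY_LIMIT:
        if counters.get("%MLMTD", 0) > 0:
            hints.append("high %RDY with %MLMTD > 0: check for a CPU limit")
        else:
            hints.append("high %RDY: check vSMP usage and host CPU load")
    if counters.get("%CSTP", 0) > CSTP_LIMIT:
        hints.append("high %CSTP: consider fewer vCPUs")
    if counters.get("%SWPWT", 0) > 0:
        hints.append("%SWPWT > 0: check memory overcommitment")
    return hints


# Made-up readings for a single world.
sample = {"%RDY": 25.0, "%MLMTD": 12.0, "%CSTP": 0.5, "%SWPWT": 0.0}
for hint in cpu_hints(sample):
    print(hint)
```

The order of the checks mirrors the text: ready time first, then whether a limit explains it, then co-stop and swap wait.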

esxtop Network View

The last post discussed navigating esxtop. Now let’s get into each view a little bit more.

Several network counters are shown by default when you go to the networking view. Here’s a brief overview of each:

PKTTX/s – number of packets transmitted per second

MbTX/s – megabits transmitted per second

PKTRX/s – number of packets received per second

MbRX/s – megabits received per second

%DRPTX – percentage of transmit packets dropped

%DRPRX – percentage of receive packets dropped

A major indicator of potential network performance issues is dropped packets. This can be indicative of a physical device failing, queue congestion, bandwidth issues, etc.

Something else to check when having network issues is high CPU usage. The CPU Ready Time counter (%RDY) can be beneficial when diagnosing CPU issues.

If you are having these issues in your environment, consider using jumbo frames and taking advantage of hardware features provided by the NIC, like TSO (TCP Segmentation Offload) and TCO (TCP Checksum Offload).

Also, make sure to check physical network trunks, interswitch links, etc. for overloaded pipes.

Consider moving the VM with high network demand to another switch, adding more uplinks to the virtual switch, and checking which vNIC driver is being used.
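The dropped-packet check described earlier can be sketched in a few lines of Python. This assumes the %DRPTX and %DRPRX readings have been copied by hand into a dictionary keyed by port name; the port names, sample values, and the helper name ports_dropping are hypothetical.

```python
def ports_dropping(ports):
    """Return the names of ports whose %DRPTX or %DRPRX is above zero."""
    return [
        name
        for name, counters in ports.items()
        if counters.get("%DRPTX", 0) > 0 or counters.get("%DRPRX", 0) > 0
    ]


# Made-up readings from the network view.
ports = {
    "vmnic0": {"PKTTX/s": 1200.0, "PKTRX/s": 900.0, "%DRPTX": 0.0, "%DRPRX": 0.0},
    "Test01-A": {"PKTTX/s": 300.0, "PKTRX/s": 4500.0, "%DRPTX": 0.0, "%DRPRX": 2.4},
}

print(ports_dropping(ports))
```

Any port in the result is a starting point for the follow-up checks listed above.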

Navigating ESXTOP

A tool that is very useful is “esxtop.” This command-line tool allows monitoring and collecting of data for the core four resources: CPU, memory, network, and disk.

After enabling SSH on an ESXi host, open up PuTTY and connect to that ESXi host using your root account and password.

Start running esxtop by typing the command on a single line:

esxtop

Once the tool is running, you need to know how to work with it. It runs from the command line and is managed via keystrokes.

By default, the tool begins running in the CPU view. I can change views by simply typing “n” for the network view, “d” for the disk view, and “m” for the memory view.

In any view I can type “f” to open up the field screen. From here I can modify which counters are shown in the particular view I am in. I can customize the counters in all of my views. To select/deselect any counter, simply type the letter associated with it. To exit this view, press the space bar.

From any view, I can type a “V” (shift + v) to filter the list and view only virtual machine information.

To get even more information about a virtual machine, type “e” and enter the GID (Group ID) of your virtual machine and press enter. In the screenshot, I entered the GID of Test01-A so that I could view all the VM’s associated worlds.

A world is basically a process: a scheduled component of the VM, like a process on a typical OS. Worlds are scheduled by the VMkernel just as processes are scheduled by an operating system. The VM is represented as a group, which gets a single group ID (GID). Within the group there are worlds for the vCPU, VMM, and MKS (Mouse/Keyboard/Screen).
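To make the group and world relationship concrete, here is a small Python model of it. This is only a sketch: the class names, the ID numbers, and the simplified world labels are invented for illustration and are not the exact names esxtop displays.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    world_id: int  # hypothetical ID
    role: str      # simplified label such as "vcpu", "vmm", or "mks"


@dataclass
class Group:
    gid: int       # the GID you type after pressing "e"
    name: str
    worlds: list = field(default_factory=list)

    @property
    def nwld(self):
        """Number of worlds in the group, as in the NWLD column."""
        return len(self.worlds)


# Made-up IDs for the VM used in this post.
vm = Group(gid=1000, name="Test01-A")
vm.worlds.append(World(world_id=1001, role="vcpu"))
vm.worlds.append(World(world_id=1002, role="vmm"))
vm.worlds.append(World(world_id=1003, role="mks"))

print(f"{vm.name} (GID {vm.gid}) has {vm.nwld} worlds")
```

Expanding a group in esxtop is like listing vm.worlds here: one row for the group becomes one row per world.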

I will be posting more on esxtop and its counters as I go through my studies. This post is just a quick guide to navigating.