6.2 Performance Monitoring

VERSION 13

Created on: Oct 27, 2010 1:31 PM by Zenoss API - Last Modified: Oct 27, 2010 1:49 PM by Zenoss API

2. Performance Monitoring

Read this chapter to learn about performance monitoring and monitoring templates.

2.1. About Performance Monitoring

Zenoss uses several methods to monitor performance metrics of devices and device components. These are:

ZenPerfSNMP - Collects data through SNMP from any device correctly configured for SNMP monitoring.
ZenWinPerf (Enterprise only) - ZenPack that allows performance monitoring of Windows servers.
ZenCommand - Logs in to devices (by using telnet or ssh) and runs scripts to collect performance data.
Other ZenPacks - Collect additional performance data. Examples include the ZenJMX ZenPack, which collects data from enterprise Java applications, and the HttpMonitor ZenPack, which checks the availability and responsiveness of Web pages.

Regardless of the monitoring method used, the system stores performance monitoring configuration information in monitoring templates.

2.2. About Monitoring Templates

Monitoring templates determine how the system collects performance data for devices and device components. You can define monitoring templates for device classes and individual devices.

Templates comprise three types of objects:

Data Sources - Specify the exact data points to collect, and the method to use to collect them.
Thresholds - Define expected bounds for collected data, and specify events to be created if the data does not match those bounds.
Graph Definitions - Describe how to graph the collected data on the device or device components.

Before the system can collect performance data for a device or component, it must determine which monitoring templates apply. This process is called template binding.

2.2.1. Viewing Monitoring Templates

To view monitoring templates, select Advanced from the navigation bar, and then select Monitoring Templates.

Figure 6.4. Monitoring Template for Load Average Graph

2.3. Template Binding

Before the system can collect performance data for a device or component, it must determine which templates apply. This process is called template binding.

First, the system determines the list of template names that apply to a device or component. For device components, this usually is the meta type of the component (for example, FileSystem, CPU, or HardDisk). For devices, this list is defined by the zDeviceTemplates configuration property.

After defining the list, the system locates templates that match the names on the list. For each name, it searches the device and then searches the device class hierarchy. Zenoss uses the lowest template in the hierarchy that it can locate with the correct name, ignoring others of the same name that might exist further up the device class hierarchy.
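
The lookup order described above can be sketched in a few lines of Python. This is only an illustration: the Node class, the helper functions, and the sample device names are hypothetical stand-ins and are not part of the Zenoss object model.

```python
# Hypothetical stand-in classes; these are not the Zenoss object model.
class Node:
    """A device or device class that may define templates locally."""

    def __init__(self, name, parent=None, templates=None):
        self.name = name
        self.parent = parent              # next level up the hierarchy
        self.templates = templates or {}  # template name -> template object


def find_template(node, template_name):
    """Return the closest definition of template_name, or None."""
    while node is not None:
        if template_name in node.templates:
            return node.templates[template_name]
        node = node.parent                # keep searching further up
    return None


def bind_templates(device, template_names):
    """Resolve each name, skipping names that have no matching template."""
    found = (find_template(device, name) for name in template_names)
    return [template for template in found if template is not None]


devices = Node("/Devices", templates={"Device": "Device@/Devices"})
server = Node("/Devices/Server", devices)
linux = Node("/Devices/Server/Linux", server,
             {"Device": "Device@/Devices/Server/Linux"})
host = Node("web01", linux)

print(bind_templates(host, ["Device"]))
```

In this sketch the definition on /Devices/Server/Linux is returned for web01, and the definition of the same name on /Devices is ignored because it sits further up the hierarchy.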

2.3.1. Binding Templates

To edit the templates bound to a device:

From the navigation bar, select Infrastructure.
The device list appears.
Select a device in the device list.
Select Bind Templates from the Action menu.
The Bind Templates dialog appears.

Figure 6.5. Bind Templates
Select a template from the Available list and move it to the Selected list to bind it to the selected device.
Click Save.

2.4. Data Sources

Data sources specify which data points to collect and how to collect them. Each monitoring template comprises one or more data sources. The system provides two built-in data source types: SNMP and COMMAND. (Other data source types are provided through ZenPacks.)

SNMP - Defines data to be collected via SNMP by the ZenPerfSNMP daemon. SNMP data sources contain one additional field that specifies which SNMP OID to collect. (Many OIDs must end in .0 to work correctly.) Because an SNMP data source specifies only one performance metric, it contains a single data point.
COMMAND - Specifies data to be collected by a shell command that is executed on the Zenoss server or on a monitored device. The ZenCommand daemon processes COMMAND data sources. A COMMAND data source may return one or more performance metrics, and usually has one data point for each metric.
Shell commands used with COMMAND data sources must return data that conforms to the Nagios plug-in output specification. For more information, see the section titled Monitoring Using ZenCommand.
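
As a rough illustration of that output format, the following Python sketch splits one line of plug-in style output into its summary text and its performance values. The function name and the sample line are made up for this example, and the parser is deliberately simplified: it assumes labels contain no spaces and it discards units and the optional warn, crit, min, and max fields. It is not the parser that ZenCommand uses.

```python
import re

# Leading numeric part of a performance value such as "0.42", "85%" or "12MB".
_NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def parse_plugin_output(line):
    """Split one line of plug-in output into (summary, {label: value})."""
    summary, _, perfdata = line.partition("|")
    metrics = {}
    for token in perfdata.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value pair
        # Keep only the value; warn/crit/min/max follow after semicolons.
        match = _NUMBER.match(rest.split(";")[0])
        if match:
            metrics[label.strip("'")] = float(match.group())
    return summary.strip(), metrics


summary, metrics = parse_plugin_output(
    "OK - load average: 0.42, 0.36, 0.30 | load1=0.42;5;10 load5=0.36;4;8 load15=0.30"
)
print(summary)
print(metrics)
```

Each label in the performance section (load1, load5, and load15 in this sample) would correspond to one data point on the COMMAND data source.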

2.4.1. Adding a Data Source

To add a data source to a monitoring template:

Select Advanced from the navigation bar, and then select Monitoring Templates.
In the tree view, select the monitoring template to which you want to add a data source.
In the Data Sources area, select (Add Data Source) from the Action menu.
The Add Data Source dialog appears.
Enter a name for the data source and select the type, and then click Submit.
The data source is added to the list in the Data Sources area.
Double-click the data source in the list.
The Edit Data Source dialog appears.
Enter or select values to define the data source.

2.5. Data Points

Data sources can return data for one or more performance metrics. Each metric retrieved by a data source is represented by a data point.

Defining Data Points

You can add data points to data sources of any type except SNMP and VMware. Because each data source of these types relies on a single data point for its performance metric, no additional data point definition is needed.

To add a data point to a data source:

Select Advanced from the navigation bar, and then select Monitoring Templates.
In the Data Sources area, highlight the row containing a data source.
Select Add Data Point from the Action menu.
The Add Data Point dialog appears.
Enter a name for the data point, and then click Submit.
Note
For COMMAND data points, the name should be the same as that used by the shell command when returning data.
Double-click the newly added data point to edit it. Enter information or make selections to define the data point:
- Name - Displays the name you entered in the Add Data Point dialog.
- RRD Type - Specify the RRD data source type to use for storing data for this data point. (Zenoss uses RRDTool to store performance data.) Available options are:
  - COUNTER - Saves the rate of change of the value over a step period. This assumes that the value is always increasing (the difference between the current and the previous value is greater than 0). Traffic counters on a router are an ideal candidate for using COUNTER.
  - GAUGE - Does not save the rate of change, but saves the actual value. There are no divisions or calculations. To see memory consumption in a server, for example, you might want to select this value.
    Note
    Rather than COUNTER, you may want to define a data point using DERIVE with a minimum of zero. This creates the same conditions as COUNTER, with one exception. Because COUNTER is a "smart" data type, it assumes that the counter has wrapped around whenever a new value is lower than the previous one. If reporting is interrupted (for example, when a device restarts and its counters reset), COUNTER can wrongly conclude that the value wrapped. This produces an artificial spike in the stored data and skews statistics.
  - DERIVE - Same as COUNTER, but additionally allows negative values. If you want to see the rate of change in free disk space on your server, for example, then you might want to select this value.
  - ABSOLUTE - Saves the rate of change, but assumes that the previous value is set to 0. The difference between the current and the previous value is always equal to the current value. Thus, ABSOLUTE stores the current value, divided by the step interval.
- Create Command - Enter an RRD expression used to create the database for this data point. If you do not enter a value, then the system uses a default applicable to most situations.
  For details about the rrdcreate command, go to:
  http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
- RRD Minimum - Enter a value. Any value received that is less than this number is ignored.
- RRD Maximum - Enter a value. Any value received that is greater than this number is ignored.
Click Save to save the defined data point.
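
To make the differences between the RRD types concrete, here is a simplified Python model of what is stored for one step. It is a sketch under stated assumptions, not RRDtool's implementation: the function name and parameters are invented for this example, updates are assumed to arrive exactly one step apart, and the wrap handling follows the commonly documented 32-bit then 64-bit assumption.

```python
def stored_value(rrd_type, current, previous, step, rrd_min=None, rrd_max=None):
    """Return the value saved for one step, or None for 'unknown'.

    Simplified model: irregular update times and heartbeats are ignored.
    """
    if rrd_type == "GAUGE":
        value = current                      # the reading itself
    elif rrd_type == "ABSOLUTE":
        value = current / step               # previous value treated as 0
    elif rrd_type in ("COUNTER", "DERIVE"):
        if previous is None:
            return None                      # no earlier reading to compare
        delta = current - previous
        if rrd_type == "COUNTER" and delta < 0:
            # COUNTER assumes a decrease means the counter wrapped around.
            delta += 2 ** 32
            if delta < 0:
                delta += 2 ** 64 - 2 ** 32
        value = delta / step
    else:
        raise ValueError("unknown RRD type: %s" % rrd_type)

    # Values outside the RRD Minimum / RRD Maximum bounds are discarded.
    if rrd_min is not None and value < rrd_min:
        return None
    if rrd_max is not None and value > rrd_max:
        return None
    return value


# A device restarts and its counter drops from 4,000,000,000 to 1,000.
print(stored_value("COUNTER", 1000, 4000000000, 300))            # large spike
print(stored_value("DERIVE", 1000, 4000000000, 300, rrd_min=0))  # None
```

The last two calls show the situation described in the note above: COUNTER interprets the reset as a wrap and stores a large positive rate, while DERIVE with a minimum of zero discards the negative rate instead.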

2.6. Data Point Aliases

Performance reports pull information from various data points that represent a metric. The report itself knows which data points it requires, and which modifications are needed, if any, to put the data in its proper units and format.

The addition of a data point requires changing the report.

Figure 6.6. CPU Utilization Report

To allow for more flexibility in changes, some reports use data point aliases. Data point aliases group data points so they can be more easily used for reporting. In addition, if the data points return data in different units, then the alias formula can normalize that data into a common unit.

An alias-based report looks up the data points that share a common alias string, and then uses them. This approach allows you to add data points without changing the report.

Figure 6.7. Alias-Based CPU Utilization Report

In the simplest cases, data from the target data points are returned in units expected by a report. For cases in which data are not returned in the same units, an alias can use an associated formula at the data point. For example, if a data point returns data in kilobytes, but the report expects data in bytes, then the formula multiplies the value by 1024.
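
The idea of an alias-based lookup can be sketched as follows. The DataPoint tuple, the data point names, and the mem_used_bytes alias are hypothetical and exist only for this illustration; they do not describe how Zenoss stores aliases.

```python
from collections import namedtuple

# Hypothetical record: a formula of None means the data point already
# reports in the units that the report expects.
DataPoint = namedtuple("DataPoint", "name alias formula")


def points_for_alias(data_points, alias):
    """Return the data points a report would use for one alias."""
    return [dp for dp in data_points if dp.alias == alias]


data_points = [
    DataPoint("linux_memUsed", "mem_used_bytes", None),         # already bytes
    DataPoint("router_memUsedKB", "mem_used_bytes", "1024,*"),  # kilobytes
    DataPoint("linux_load5", "loadAverage5min", None),
]

selected = points_for_alias(data_points, "mem_used_bytes")
print([dp.name for dp in selected])
```

Adding another data point with the same alias would make it available to the report without any change to the report itself.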

2.6.1. Alias Formula Evaluation

The system evaluates the alias formula in three passes.

2.6.1.1. Reverse Polish Notation

When complete, the alias formula must resolve to a Reverse Polish Notation (RPN) formula that can be used by RRDtool. For the simple conversion of kilobytes into bytes, the formula is:

1024,*

For more information on RRDtool and RPN formulas, browse to this site:

http://oss.oetiker.ch/rrdtool/doc/rrdgraph_rpn.en.html
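
For readers unfamiliar with RPN, this minimal Python sketch shows how a formula such as 1024,* acts on a data point value: the value is pushed on a stack first, then each comma-separated token is either pushed (a number) or applied to the top two entries (an operator). The apply_rpn function is written for this illustration only and supports just the four basic arithmetic operators; RRDtool's RPN language is much richer.

```python
import operator

_OPERATORS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def apply_rpn(value, formula):
    """Apply a comma-separated RPN formula to a data point value."""
    stack = [float(value)]          # the data point value is pushed first
    for token in formula.split(","):
        if token in _OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(_OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack[-1]


print(apply_rpn(3, "1024,*"))   # 3 kilobytes expressed in bytes
```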

2.6.1.2. Using TALES Expressions in Alias Formulas

For cases in which contextual information is needed, the alias formula can contain a TALES expression that has access to the device as context (labeled as "here"). The result of the TALES evaluation should be an RRD formula.

For example, if the desired value is the data point value divided by total memory, the formula is:

${here/hw/totalMemory},/

For more information on TALES, refer to the appendix in this guide titled "TALES Expressions," or to the TALES Specification 1.3, at:

http://wiki.zope.org/ZPT/TALESSpecification13

2.6.1.3. Using Python in Alias Formulas

You also can embed full Python code in an alias formula for execution. The code must construct a string that results in a valid RRD formula. To signal the system to evaluate the formula correctly, it must begin with:

__EVAL:

Using the same example as in the previous section (division by total memory), the formula is:

__EVAL:str(here.hw.totalMemory) + ",/"
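
The expression after the __EVAL: prefix is ordinary Python that must produce a string. The sketch below imitates the evaluation outside of Zenoss, using a hypothetical stand-in object for "here" and an arbitrary memory size. It assumes that totalMemory is held as a number, in which case it has to be converted with str() before it can be joined to the rest of the formula.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the device context that Zenoss exposes as "here".
here = SimpleNamespace(hw=SimpleNamespace(totalMemory=8589934592))

# The expression that follows the __EVAL: prefix must build a string.
formula = str(here.hw.totalMemory) + ",/"
print(formula)
```

The resulting string is an RPN formula that divides the data point value by the total memory of the device.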

2.6.2. Adding a Data Point Alias

To add an alias to a data point:

Navigate to a data source on a monitoring template.
Double-click a data point in the list to edit it.
The Edit Data Point dialog appears.
Enter the alias name and the formula.
Note
If the data point returns values in the desired units, then leave the Formula value blank.

Figure 6.8. Add Data Point Alias
Click Save.

2.6.3. Reports That Use Aliases

For information about reports that use aliases, refer to the chapter titled "Reporting."

The following table shows performance reports that use aliases, and the aliases used. To add data points to a report, add the alias, and then ensure the values return in the expected units.

CPU Utilization

Alias	Expected Units
loadAverage5min	Processes
cpu_pct	Percent

2.7. Thresholds

Thresholds define expected bounds for data points. When the value returned by a data point violates a threshold, the system creates an event.

2.7.1. MinMax Threshold

The system provides one built-in threshold type: the MinMax threshold. (Other threshold types are provided through ZenPacks.)

MinMax thresholds inspect incoming data to determine whether it exceeds a given maximum or falls below a given minimum. You can use a MinMax threshold to check for these scenarios:

The current value is less than a minimum value. To do this, you should set only a minimum value for the threshold. Any value less than this number results in creation of a threshold event.
The current value is greater than a maximum value. To do this, you should set only a maximum value for the threshold. Any value greater than this number results in creation of a threshold event.
The current value is not a single, pre-defined number. To do this, you should set the minimum and maximum values for the threshold to the same value. This will be the only "good" number. If the returned value is not this number, then a threshold event is created.
The current value falls outside a pre-defined range. To do this, you should set the minimum value to the lowest value within the good range, and the maximum value to the highest value within the good range. If the returned value is less than the minimum, or greater than the maximum, then a threshold event is created.
The current value falls within a pre-defined range. To do this, you should set the minimum value to the highest value within the bad range, and the maximum value to the lowest value within the bad range (that is, the minimum is set higher than the maximum). If the returned value is greater than the maximum and less than the minimum, then a threshold event is created.
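
The five scenarios above reduce to a small amount of logic, sketched here in Python. The violates function is an illustration written for this section under the rules as stated in the list; it is not the code that Zenoss runs.

```python
def violates(value, minimum=None, maximum=None):
    """Return True if value should raise a MinMax threshold event."""
    if minimum is not None and maximum is not None and minimum > maximum:
        # Inverted bounds: the "bad" range lies between maximum and minimum.
        return maximum < value < minimum
    if minimum is not None and value < minimum:
        return True
    if maximum is not None and value > maximum:
        return True
    return False


print(violates(5, minimum=10))                # below a minimum
print(violates(15, maximum=10))               # above a maximum
print(violates(3, minimum=4, maximum=4))      # not the single good number
print(violates(95, minimum=10, maximum=90))   # outside the good range
print(violates(50, minimum=90, maximum=10))   # inside the bad range
```

Each of the five calls corresponds to one scenario in the list, in the same order, and each returns True.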

2.7.2. Adding Thresholds

Follow these steps to define a MinMax threshold for a data point:

Select Advanced from the navigation bar, and then select Monitoring Templates.
In the Thresholds area, click (Add Threshold).
The Add Threshold dialog appears.
Select the threshold type and enter a name, and then click Add.
Double-click the newly added threshold in the list to edit it.
The Edit Threshold dialog appears.

Figure 6.9. Edit Threshold
Enter or select values to define the threshold:
- Name - Displays the value for the ID you entered in the Add Threshold dialog.
- Data Points - Select one or more data points to which this threshold will apply.
- Severity - Select the severity level of the first event triggered when this threshold is breached.
- Enabled - Select True to enable the threshold, or False to disable it.
- Minimum Value - If this field contains a value, then each time one of the selected data points falls below this value an event is triggered. This field may contain a number or a Python expression.
  When using a Python expression, the variable here references the device or component for which data is being collected. For example, an 85% threshold on an interface might be specified as:
```
here.speed * .85/8
```
  The division by 8 is needed because interface speed frequently is reported in bits/second, whereas the performance data is in bytes/second.
- Maximum Value - If this field contains a value, then each time one of the selected data points goes above this value an event is triggered. This field may contain a number or a Python expression.
- Event Class - Select the event class of the event that will be triggered when this threshold is breached.
- Escalate Count - Enter the number of consecutive times this threshold can be broken before the event severity is escalated by one step.
Click Save to save the newly defined threshold.
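
As a worked example of the interface threshold expression, the following Python sketch evaluates it for a 1 Gbit/s interface. The "here" object is a hypothetical stand-in for the interface component, and the speed value is chosen only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the interface component bound to "here".
here = SimpleNamespace(speed=1000000000)   # 1 Gbit/s, in bits per second

threshold = here.speed * .85 / 8           # 85% of capacity, in bytes per second
print(threshold)
```

For this interface the threshold works out to roughly 106 million bytes per second, so an event would be raised when measured throughput exceeds that figure.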

2.8. Performance Graphs

You can include any of the data points or thresholds from a monitoring template in a performance graph.

To define a graph:

Select Advanced from the navigation bar, and then select Monitoring Templates.
In the Graph Definitions area, click (Add Graph).
The Add Graph Definition dialog appears.
Enter a name for the graph, and then click Submit.
Double-click the graph in the list to edit it. Enter information or select values to define the graph:
- Name - Optionally edit the name of the graph you entered in the Add Graph Definition dialog. This name appears as the title of the graph.
- Height - Enter the height of the graph, in pixels.
- Width - Enter the width of the graph, in pixels.
- Units - Enter a label for the graph's vertical axis.
- Logarithmic Scale - Select True to specify that the scale of the vertical axis is logarithmic. Select False (the default) to set the scale to linear. You might want to set the value to True, for example, if the data being graphed grows exponentially. Only positive data can be graphed logarithmically.
- Base 1024 - Select True if the data you are graphing is measured in multiples of 1024. By default, this value is False.
- Min Y - Enter the bottom value for the graph's vertical axis.
- Max Y - Enter the top value for the graph's vertical axis.
- Has Summary - Select True to display a summary of the data's current, average, and maximum values at the bottom of the graph.
  
  Figure 6.10. Graph Definition

Click Submit to save the graph.

2.8.1. Graph Points

Graph points represent each data point or threshold that is part of a graph. You can add any number of graph points to a graph definition by adding data points or thresholds.

From the Graph Definitions area of the Monitoring Templates page:

Select Manage Graph Points from the Action menu.
The Manage Graph Points dialog appears.
From the Add menu, add a data point, threshold, or custom graph point.
Select values, and then click Submit.
The new graph point appears in the Graph Points list.
Note
Thresholds are always drawn before other graph points.

2.8.1.1. Re-sequencing Graph Points

To re-sequence graph points, drag a graph point row in the Manage Graph Points dialog. (Click and drag from an "empty" part of the row.)

2.8.1.2. DataPoint Graph Points

DataPoint graph points draw the value of data points from the template on a graph.

2.8.1.2.1. Adding DataPoint Graph Points

To define a DataPoint graph point:

From the Add menu on the Manage Graph Points dialog, select Data Point.
The Add Data Point dialog appears.
Select one or more data points defined in this template. One data point graph point is created for each data point you select from the list.
Optionally select the Include Related Thresholds option. If selected, then graph points are also created for any thresholds that have been applied to the selected data points.
Click Submit.

2.8.1.2.2. Editing DataPoint Graph Points

Double-click the name of the graph point to go to its edit page. Enter information or select values to edit the graph point:

Name - This is the name that appears on the Graph Definition page. By default, it appears in the graph legend.
Line Type - Select Line to graph the data as a line. Select Area to fill the area between the line and the horizontal axis with the line color. Select None if you want to use this data point in custom RRD commands and do not want it to be drawn explicitly.
Line Width - Enter the pixel width of the line.
Stacked - If True, then the line or area is drawn above the previously drawn data. At any point in time on the graph, the value plotted for this data is the sum of the previously drawn data and the current value of this data point. You might set this value, for example, to assess total packets if measuring packets in and packets out.
Format - Specify the RRD format to use when displaying values in the graph summary. For more information on RRDTool formatting strings, go to:
http://oss.oetiker.ch/rrdtool/doc/rrdgraph_graph.en.html
RPN - Optionally enter an RPN expression that alters the value of the data being graphed for the data point. For example, if the data is stored as bits, but you want to graph it as bytes, enter an RPN value of "8,/" to divide by 8. For more information about RRDTool RPN notation, go to:
http://oss.oetiker.ch/rrdtool/tut/rpntutorial.en.html
Limit - Optionally specify a maximum value for the data being graphed.
Consolidation - Specify the RRD function used to graph the data point's data to the size of the graph. Most of the time, the default value of AVERAGE is appropriate.
Color - Optionally specify a color for the line or area. Enter a six-digit hexadecimal color value with an optional two-digit hex value to specify an alpha channel. An alpha channel value is only used if 'stacked' is True.
Legend - Name to use for the data in the graph legend. By default, this is a TALES expression that specifies the graph point name. The variables available in this TALES expression are here (the device or component being graphed) and graphPoint (the graph point itself).
Available RRD Variables - Lists the RRD variables defined in this graph definition. These values can be used in the RPN field.

2.8.1.3. Editing Threshold Graph Points

Threshold graph points graph the value of thresholds from the template.

To edit a threshold graph point, double-click it in the list.

You can edit values for Name, Color, and Legend for a threshold graph point. Refer to the definitions in the section titled Editing DataPoint Graph Points for more information.

2.8.1.4. Editing Custom Graph Points

Custom graph points allow you to insert specific RRD graph commands into the graph definition.

For details on DEF, CDEF, and VDEF commands, go to:

http://oss.oetiker.ch/rrdtool/doc/rrdgraph_data.en.html

For details on other RRD commands, go to:

http://oss.oetiker.ch/rrdtool/doc/rrdgraph_graph.en.html

2.8.2. Custom Graph Definition

Custom graph definitions allow you to specify your own set of RRD commands to draw a graph. The graph points specified are used to define data that is available to the commands you specify here; however, the graph points are not drawn unless you explicitly draw them with the commands you specify. The Available RRD Variables area lists the values defined by the graph points that are available for use.

To access custom graph definitions, select Custom Graph Definition from the Action menu in the Graph Definitions area.

2.8.3. Graph Commands

Graph Commands show an approximate representation of the RRD commands that will be used to draw a graph. This representation provides helpful debugging information when using custom graph points or custom graph definitions.

To view graph commands, select Graph Commands from the Action menu in the Graph Definitions area.
