Applies To:
BIG-IQ Centralized Management
- 6.0.0
Managing a Scaling Group in an AWS Environment
Evaluating the performance of a service scaling group's devices
When you are monitoring several service scaling groups (SSGs), you can evaluate the BIG-IP VE devices within each SSG to ensure that they are performing as expected. Good health for your SSGs means that there is a low chance of a scale-out event, and that the devices in your SSG are able to provide services to applications as expected.
The health status of an SSG reflects the most severe alert status triggered in one or more of its devices. A Good health status indicates that all devices within the SSG are within the acceptable range of the configured SSG health alert rules. You can view the health of all your SSGs from the SSGs screen. In addition, you can use the alert history for a single SSG to identify whether a health or resource alert has been cleared.
Verify the health of all your service scaling groups
Verify that all alerts to a service scaling group are cleared
Monitor resource usage in service scaling group devices
Monitor throughput in service scaling group devices
Detecting device health issues in a service scaling group
Using the Analytics services for your service scaling groups, you can detect changes in device resource usage (for example, CPU or memory) and identify the impact on the F5® BIG-IP® VE devices and their connected applications.
Each service scaling group's health status indicates the current resource usage for all the BIG-IP VE devices within that service scaling group. When one or more devices cross a configured resource usage threshold, the health status of the entire service scaling group is affected. You can mitigate these health issues to prevent a performance impact on the traffic processing services for your connected applications.
Isolate a service scaling group with health issues
Isolate a service scaling group device with health issues
Detecting device performance issues in a service scaling group
The F5® BIG-IP® VE devices within a service scaling group (SSG) can individually or collectively experience performance issues. This can occur for a number of reasons, and it affects the performance of the application services that those devices provide. To prevent or mitigate application performance issues, you can isolate specific devices by using alerts and system data for a selected service scaling group. In addition, you can monitor the applications that are managed by each service scaling group.
Isolate a service scaling group with performance issues
Identify a service scaling group performance issue
Isolate a service scaling group device with performance issues
Device resource and performance charts
The following table describes the charts found in the Analytics area of the single service scaling group screen. These charts display the trends of a service scaling group's BIG-IP VE devices. Each chart displays an aspect of the devices as a function of the selected time period.
Chart Menu Title | Chart Title | Description |
---|---|---|
CPU Usage | CPU Usage | The average percent CPU usage for all cores and BIG-IP devices, by activity category. Metric Unit: Percent. Legend: User: The average percentage of CPU usage for all the BIG-IP user space programs over a given time period. System: The average percentage of CPU usage for all the running BIG-IP systems over a given time period. I/O Wait: The percentage of time (during the selected time period) that a given CPU is idle while waiting for an I/O operation. This occurs when at least one outstanding disk I/O operation is requested by a task scheduled on the system CPU. Stolen: The percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual machine. |
Top Cores | Top 6 CPU Cores | The six most active CPU cores for all monitored BIG-IP devices. This isolates the cores that are consuming the most CPU resources of all the device CPUs. Metric Unit: Percent. Legend: CPU core |
Memory | Memory Usage | The percent RAM used by system processes of the monitored BIG-IP devices. Metric Unit: Percent. Legend: TMM: The average percentage of RAM used by device TMM processes. Total: The average percentage of RAM used by all devices. Other: The average percentage of RAM used by non-TMM processes. |
Throughput | Throughput Bytes | The average rate of traffic (in bytes) processed by the BIG-IP device interfaces. Metric Unit: Average/s. Legend: In: The average rate of incoming traffic to the BIG-IP devices. Out: The average rate of outgoing traffic from the BIG-IP devices. |
Connections | Concurrent Connections | The average number of connections that are open at the same time, on either the client side or the server side. Metric Unit: Count. Legend: Client Side: The average number of concurrent connections at the client side. Server Side: The average number of concurrent connections at the server side. |
HTTP | HTTP Transactions | A transaction includes all HTTP request and response messages passed between the client, BIG-IP system, and server. Metric Unit: Average/s. Legend: Transactions: The average number of HTTP transactions per second that were processed by the BIG-IP devices. |
Dropped | Throughput Drops | The average rate of packets per second (pps) that were dropped by the BIG-IP device interfaces or discarded by the TMM over the course of the transaction. Metric Unit: Average/s. Legend: In: The average rate of packets per second that were dropped by the BIG-IP interface. Out: The average rate of packets per second that were accepted by the BIG-IP interface, but discarded by the TMM. |
Errors | Throughput Errors | The average rate of packets per second (pps) that were corrupted or arrived incomplete over the course of the transaction across the network. Metric Unit: Average/s. Legend: In: The average packets per second received as throughput errors. Out: The average packets per second transmitted as throughput errors. |
Managing device monitoring settings for a service scaling group
The health of your service scaling group (SSG) is determined by the health of its F5® BIG-IP® VE devices.
Each BIG-IP VE device in an SSG is monitored by a set of configurable device health alert rules that include performance metrics and their corresponding thresholds. You can adjust the alert rules for an SSG to define the health rules for its devices.
The SSG health score reflects the device metric that crossed the most severe threshold. This means that if a device metric violates a warning or critical threshold, the SSG health status becomes moderate or critical, respectively. You receive a device alert and an SSG alert when a device alert rule violation is sustained for more than five minutes.
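The "most severe threshold wins" rule described above can be illustrated with a short sketch. This is not BIG-IQ code or its API; the `Health` enum and the `ssg_health` function are hypothetical names, and the sketch assumes that each device has already been assigned a status of good, moderate (warning threshold violated), or critical.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical status levels, ordered from least to most severe."""
    GOOD = 0
    MODERATE = 1   # a warning threshold was violated
    CRITICAL = 2   # a critical threshold was violated


def ssg_health(device_statuses):
    """Return the most severe status among the devices of one SSG.

    An SSG with no reported device statuses is treated as GOOD here,
    which is an assumption made for this illustration only.
    """
    return max(device_statuses, default=Health.GOOD)


# One device in a moderate state is enough to make the whole SSG moderate.
print(ssg_health([Health.GOOD, Health.MODERATE, Health.GOOD]).name)
```

Ordering the levels as integers keeps the aggregation to a single `max` call, so adding a further severity level would not change the function.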
About device health alert rules
Device health alert rules include the metrics and corresponding thresholds that define the health status of your BIG-IP devices. You can select which metrics are included, and adjust the warning and critical threshold values.
A metric threshold violation must be sustained for five minutes to trigger an alert. A subsequent alert is triggered once another threshold is crossed (an increase in severity, a decrease in severity, or a cleared alert). To ensure that conditions are improving, an alert for declining severity (critical to warning), or an alert that has been cleared, is triggered only when the value is sustained for five minutes at ten percent below the threshold value. For example, if a threshold value is configured for greater than 60 percent, a declining severity must be sustained at 54 percent or less to trigger an alert.
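The arithmetic behind the recovery rule can be sketched as follows. This is an illustration only, not the product's implementation: the function names are hypothetical, the ten percent margin is interpreted relative to the threshold (as in the 60 to 54 example above), and the sketch assumes one metric reading per minute so that five consecutive readings stand in for the five-minute window.

```python
SUSTAIN_MINUTES = 5   # sustain period described in the text
MARGIN = 0.10         # ten percent below the threshold value


def recovery_level(threshold):
    """Value the metric must stay at or below before a lower-severity
    or cleared alert is triggered."""
    return threshold * (1 - MARGIN)


def recovery_sustained(samples, threshold, sustain=SUSTAIN_MINUTES):
    """Return True if the most recent `sustain` readings are all at or
    below the recovery level.

    `samples` is assumed to hold one reading per minute, oldest first.
    """
    if len(samples) < sustain:
        return False
    level = recovery_level(threshold)
    return all(value <= level for value in samples[-sustain:])


# For a 60 percent threshold, the recovery level works out to 54 percent.
print(recovery_level(60))
print(recovery_sustained([70, 54, 53, 50, 52, 54], 60))
```

The gap between the trigger value and the recovery value acts as hysteresis: a metric that hovers around the threshold does not produce a stream of alternating alerts.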
Modify service scaling group device resources alerts
Service scaling group health alerts
The service scaling group (SSG) health alerts notify you of the performance status of the SSG BIG-IP® devices. The following table describes the service scaling group health alert.
Alert | Description | Indication | Default Thresholds | Action (if applicable) |
---|---|---|---|---|
SSG Health | There has been a change in the health status of one or more BIG-IP devices in your SSG. | One or more BIG-IP VE devices in your SSG have a sustained change in health status, which is based upon performance of device resources and/or throughput. | For SSG devices: Customized per service scaling group. | A critical health status of your SSG can lead to a scale-out recommendation. You can monitor the health of affected devices using device health alerts. |
Device health alerts
The device health alerts notify you of changes in device resource and throughput metric thresholds for your BIG-IP® devices.
Alert | Description | Indication | Default Thresholds | Action (if applicable) |
---|---|---|---|---|
Device Health | There has been a change in one or more of the BIG-IP device health rule metrics. | One or more of the device resources and/or throughput measurements crossed a defined threshold, which may impact your BIG-IP VE device's performance. | For SSG devices: Customized per service scaling group. | For SSG devices: A critical health status of your BIG-IP VE device affects the health of the SSG. Investigate the active alerts for device metrics. |
Device alerts
The device alerts notify you of changes in BIG-IP® device resource and performance metrics. These alerts are found in the single service scaling group screen, or in the Alert History and Active Alerts screens.
Alert | Description | Default Thresholds | Action (if applicable) |
---|---|---|---|
Device CPU | The average CPU utilization for a BIG-IP device. | Critical > 80%; Warning > 60%; Cleared < 60% | Investigate affected BIG-IP device resources. |
Device Memory | The average memory (RAM) utilization for a BIG-IP device. | Critical > 80%; Warning > 60%; Cleared < 60% | Investigate affected BIG-IP device resources. |
Device Throughput In | The average throughput (Mbps) of incoming traffic to a BIG-IP device. | Critical > 8 Mbps; Warning > 6 Mbps; Cleared < 6 Mbps | Investigate affected BIG-IP device throughput. |
Device Throughput Out | The average throughput (Mbps) of outgoing traffic from a BIG-IP device. | Critical > 8 Mbps; Warning > 6 Mbps; Cleared < 6 Mbps | Investigate affected BIG-IP device throughput. |
ASM Memory | The average device memory (RAM) used for Web Application Security services. | Critical > 80%; Warning > 60%; Cleared < 60% | Investigate affected BIG-IP device's configuration for ASM memory. |
ASM Bypass Ratio | The average rate of transactions that bypassed Web Application Security services. | Critical > 0.05%; Warning > 0.01%; Cleared < 0.01% | Investigate affected BIG-IP device's system resource configuration for ASM processes. |
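As a rough illustration of how the default thresholds in the table above map a reading to a severity, consider the sketch below. It is not a BIG-IQ API: the `DEFAULT_THRESHOLDS` dictionary and the `severity` function are hypothetical, the sketch ignores the five-minute sustain and recovery rules, and it assumes that a reading exactly at the warning threshold counts as cleared, because the table does not define that case.

```python
# (warning, critical) default thresholds copied from the table above.
DEFAULT_THRESHOLDS = {
    "Device CPU": (60.0, 80.0),             # percent
    "Device Memory": (60.0, 80.0),          # percent
    "Device Throughput In": (6.0, 8.0),     # Mbps
    "Device Throughput Out": (6.0, 8.0),    # Mbps
    "ASM Memory": (60.0, 80.0),             # percent
    "ASM Bypass Ratio": (0.01, 0.05),       # percent
}


def severity(alert, value):
    """Classify a single averaged reading against the default thresholds."""
    warning, critical = DEFAULT_THRESHOLDS[alert]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "cleared"   # assumption: a value equal to the warning threshold is cleared


print(severity("Device CPU", 85))
print(severity("Device Throughput In", 7))
print(severity("ASM Bypass Ratio", 0.005))
```

Because the thresholds for SSG devices can be customized per service scaling group, a real check would read the configured values rather than these defaults.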