Manual Chapter : Managing a Scaling Group in an AWS Environment

Applies To:

BIG-IQ Centralized Management

  • 6.0.0

Evaluating the performance of a service scaling group's devices

When you are monitoring several service scaling groups (SSGs), you can evaluate the BIG-IP VE devices within each SSG to ensure that they are performing as expected. Good health for your SSGs means that there is a low chance of a scale-out event, and that the devices in your SSG are able to provide services to applications as expected.

The health status of an SSG reflects the most severe alert status triggered in one or more devices. A Good health status indicates that all devices within the SSG are within the acceptable range of the configured SSG health alert rules. You can view the health of all your SSGs from the SSGs screen (click Applications > ENVIRONMENTS > Service Scaling Groups). In addition, you can use the alert history for a single SSG to identify whether a health or resource alert has been cleared.

Verify the health of all your service scaling groups

You can verify that the devices in your service scaling groups (SSGs) are performing as expected by evaluating their health status.
  1. At the top of the screen, click Applications.
  2. On the left, click ENVIRONMENTS > Service Scaling Groups.
  3. Locate the HEALTH area at the top left of the summary bar and verify that all SSGs have a good health status.
  4. In the summary bar, locate the usage and throughput areas.
    These areas list the SSGs with the highest average values of all the SSGs.
  5. To sort the screen's list by the selected metric, click the AVERAGE CPU USAGE, AVERAGE MEMORY USAGE, THROUGHPUT IN, or THROUGHPUT OUT area.
  6. Click the name of the SSG to monitor CPU, memory usage, and throughput data of the devices within that SSG.

Verify that all alerts to a service scaling group are cleared

To ensure that your service scaling group (SSG) is performing as expected and has no issues that require further attention, verify that all active alerts used to monitor the performance of its devices have been cleared.
  1. Open an SSG properties screen (Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name>).
  2. In the ALERT HISTORY area click See All.
    This displays a chronological list of all alerts to the SSG, including cleared alerts. (Cleared alerts indicate that a metric threshold violation is now within the defined threshold.)
  3. In the Level column, verify that the most recent alerts have a Cleared status.
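
The check in the last step can be sketched in code. This is a minimal illustration only: the `Alert` record and its field names are hypothetical and do not reflect any BIG-IQ export format or API. It assumes the alert history is available as a list of records and reports the metrics whose most recent alert is not Cleared.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical record; field names are assumptions for illustration.
    timestamp: int  # seconds since epoch
    metric: str     # for example "Device CPU"
    level: str      # "Critical", "Warning", or "Cleared"


def uncleared_metrics(alerts):
    """Return the metrics whose most recent alert is not Cleared."""
    latest = {}
    # Walk the history oldest to newest so the last write per metric wins.
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        latest[alert.metric] = alert.level
    return sorted(m for m, level in latest.items() if level != "Cleared")


history = [
    Alert(100, "Device CPU", "Warning"),
    Alert(200, "Device CPU", "Cleared"),
    Alert(150, "Device Memory", "Critical"),
]
print(uncleared_metrics(history))
```

An empty result corresponds to the state described above, where every metric's most recent alert has a Cleared status.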

Monitor resource usage in service scaling group devices

You can monitor the CPU and memory usage data of the devices in a service scaling group (SSG) to verify that your devices are performing as expected.
  1. From the ANALYTICS menu of a service scaling group properties screen, select CPU Usage, Top Cores, or Memory ( Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name> ).
    Charts display the average data for all BIG-IP VE devices in the SSG.
  2. Using the tabs to the right of the chart, expand the Dimension pane and the BIG-IP Host Names dimension to display the SSG devices in the object list.
    You can filter charts and data in the BIG-IP CPU Cores dimension by selecting one or more devices from the object list.
  3. Click CONFIGURATION, then click Devices from the menu to the left.
  4. Select one or more device rows, and click View Health Statistics to view additional resource usage data that applies only to the selected devices.

Monitor throughput in service scaling group devices

You can monitor the throughput, including HTTP traffic data, of the devices in a service scaling group (SSG) to verify that your devices are performing as expected.
  1. Open the service scaling group properties screen for the group you want to monitor ( Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name> ).
  2. To view SSG traffic data, from the ANALYTICS menu on the left:
    1. Click Throughput to view the average throughput data for all SSG devices.
    2. Click HTTP to view HTTP transaction data for all SSG devices.
  3. Using the tabs to the right of the chart, expand the Dimension pane and expand the BIG-IP Host Names dimension to display the SSG devices in the object list.
    You can filter chart data by selecting one or more devices from the object list.
  4. From the ANALYTICS menu on the left, select Dropped or Errors to ensure that the average rate of throughput errors and drops are as expected.
  5. Click CONFIGURATION, then click Devices from the menu to the left.
  6. Select one or more device rows, and click View Traffic Statistics to view additional device throughput data that applies only to the selected devices.

Detecting device health issues in a service scaling group

With the Analytics services for your service scaling groups, you can detect changes in device resource usage (for example, CPU or memory) and further identify the impact on the F5® BIG-IP® VE devices and their connected applications.

Each service scaling group's health status indicates the current resource usage for all the BIG-IP VE devices within your service scaling group. When one or more devices cross a configured resource usage threshold, the entire service scaling group's health status is affected. These health issues can be mitigated to prevent performance impact on the traffic processing services to any of your connected applications.

Isolate a service scaling group with health issues

You can isolate service scaling groups (SSGs) that are experiencing health issues in order to further isolate the BIG-IP VE devices with changes in resource usage.
  1. At the top of the screen, click Applications.
  2. On the left, click ENVIRONMENTS > Service Scaling Groups.
  3. Locate the HEALTH area at the top left of the screen, in the summary bar, and click a health status to filter the SSG list on the screen by that selection.
    The Health area displays the number of service scaling groups that are currently at each health status. Use this summary to identify which service scaling groups require additional analysis due to changes in performance thresholds. You can select a health status to filter the service scaling group list.
  4. At the top left of the screen, in the summary bar, click the SSGs WITH ACTIVE ALERTS area to filter the screen's list to display only service scaling groups that currently have active alerts.
    The SSGs WITH ACTIVE ALERTS area provides information about the SSGs with BIG-IP VE devices that have crossed one of the pre-configured SSG health thresholds. You can modify your alert thresholds regarding the SSG devices' resource usage.
  5. To sort the screen's list by SSG CPU and memory usage:
    1. Locate the AVERAGE CPU USAGE and AVERAGE MEMORY USAGE areas at the top right of the screen, in the summary bar.
    2. Click one of these areas to sort the screen's list by that metric.
    These areas list the SSGs with the highest average resource usage values. You can use this area to quickly evaluate the resources of your SSGs.
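
The filter-then-sort workflow in the steps above can be sketched as follows. This is only an illustration of the logic, assuming the SSG summary data is available as a list of records. The dictionary keys (`name`, `health`, `avg_cpu`, `avg_memory`) and the SSG names are hypothetical and are not BIG-IQ identifiers.

```python
# Hypothetical SSG summaries; field names and values are illustrative only.
ssgs = [
    {"name": "ssg-east", "health": "Good", "avg_cpu": 35.0, "avg_memory": 48.0},
    {"name": "ssg-west", "health": "Critical", "avg_cpu": 91.0, "avg_memory": 62.0},
    {"name": "ssg-eu", "health": "Moderate", "avg_cpu": 66.0, "avg_memory": 71.0},
]


def filter_by_health(groups, health):
    """Keep only the groups at the selected health status."""
    return [g for g in groups if g["health"] == health]


def sort_by_metric(groups, metric):
    """Order groups from highest to lowest value of the selected metric."""
    return sorted(groups, key=lambda g: g[metric], reverse=True)


print([g["name"] for g in filter_by_health(ssgs, "Critical")])
print([g["name"] for g in sort_by_metric(ssgs, "avg_cpu")])
```

Sorting in descending order mirrors the summary bar, which lists the SSGs with the highest average values first.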

Isolate a service scaling group device with health issues

Once you have isolated a service scaling group's health issues, you can isolate which of its BIG-IP device(s) are experiencing health issues due to resource usage.
  1. Open the single service scaling group screen by selecting the name of the SSG from the Service Scaling Groups screen (Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name>).
  2. To view the list of service scaling group devices by health, click CONFIGURATION at the left, below the summary bar.
  3. In the CONFIGURATION area, select Devices from the menu to the left.
    This opens a chart that lists all the devices that are providing BIG-IP services to the service scaling group.
  4. Use the Health column to isolate devices with a Critical or Moderate health status. Use the Hostname and Device Address columns to isolate the affected devices.
    Tip: You can select one or more check boxes to the left of the device row, and click View Health Statistics to view the selected device's CPU, Memory, Disk Space, Disk Usage, and Interface Health statistics data. This will filter the displayed data by the selected device(s).
  5. You can adjust the BIG-IP device settings directly, by clicking the Device Address from the Devices list.
    This action opens the BIG-IP's environment.
  6. To identify the applications that are receiving the BIG-IP services, select Applications from the menu to the left.
The health statistics data displayed allows you to monitor the status of a device with high resource usage.

Detecting device performance issues in a service scaling group

The F5® BIG-IP® VE devices within a service scaling group can individually, or collectively, experience performance issues. This can occur for a number of reasons, and impacts the performance of the application services provided by the BIG-IP VE devices within a service scaling group (SSG). In order to prevent or mitigate application performance issues, you can isolate specific devices by using alerts and system data for a selected service scaling group. In addition, you can monitor the applications that are managed by each service scaling group.

Isolate a service scaling group with performance issues

You can isolate service scaling groups (SSGs) that are experiencing performance issues in order to further isolate the BIG-IP VE devices that may impact delivery of application services.
  1. At the top of the screen, click Applications.
  2. On the left, click ENVIRONMENTS > Service Scaling Groups.
  3. Sort the screen's list by SSG throughput in and throughput out.
    1. Locate the THROUGHPUT IN and THROUGHPUT OUT areas at the top right of the screen, in the summary bar.
    2. Click one of these areas to sort the screen's list by the selected metric.

    These areas list the SSGs with the highest throughput values (Bps). You can use this area to quickly evaluate the traffic processing capacity of your SSGs.

The service scaling group's screen helps you to evaluate the traffic data for the BIG-IP VE devices. You can use this screen to further identify trends in device traffic, and to establish whether the devices require prevention or mitigation measures.

Identify a service scaling group performance issue

You can identify the device resource usage trends of a service scaling group (SSG) with traffic management issues in order to troubleshoot the BIG-IP device's impact on an application's performance.
  1. Open the single service scaling group screen by selecting the name of the SSG from the Service Scaling Groups screen (Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name>).
  2. In the ANALYTICS area at the center of the screen you can monitor the health and traffic data of all devices in the service scaling group.
  3. To view SSG traffic management data, select from the menu to the left in the ANALYTICS area at the center of the screen:
    1. Click Throughput to view the average traffic throughput rate for all the SSG devices.
    2. Click Connections to view the average number of open connections for all the SSG devices.
    3. Click HTTP to view the average number of HTTP transactions for all the SSG devices.
    4. Click Dropped to view the average number of dropped packets for all the SSG devices.
    5. Click Errors to view the average number of packets with throughput errors for all the SSG devices.
    Tip: Expand the chart view by collapsing the summary bar and/or application configuration map using the arrows to the right of these areas.
  4. To monitor the service scaling group's traffic over a period of time, adjust the time setting above the chart.
    To analyze device data according to isolated alerts and events, ensure that the Events button is set to ON.
    Events and alerts appear in the chart as numbered icons that are color-coded according to Category.
  5. View the chart data to evaluate whether the triggered alert was due to a trend over time, or a sudden change in behavior.
  6. To filter alerts and events according to type, click the Category buttons below the chart to enable or disable displayed categories.
  7. To filter alerts and events according to severity, click the Log Level buttons below the chart to enable or disable by severity.
  8. Click the numbered icon in the chart time line to display a table with information about the events and alerts triggered at that time.
  9. Click the rows in the table to display details about each event or alert.
  10. Expand each dimension widget to view the dimension objects, and view detailed information about their metric data.
  11. To filter chart data by one or more BIG-IP devices in the service scaling group, select objects in the dimension BIG-IP Host Names.
  12. To isolate applications that might be affected by device traffic management issues, click CONFIGURATION at the center of the screen and click Applications from the menu at the left.
    This area displays the applications that receive services from the SSG.

Isolate a service scaling group device with performance issues

Before you start, isolate a service scaling group that is experiencing health issues by using the service scaling groups list (Applications > ENVIRONMENTS > Service Scaling Group : Analytics).
Once you have isolated a service scaling group's traffic management performance issue, you can isolate which of its BIG-IP device(s) are responsible for the issue, based on the device health.
Note: In order to receive traffic performance alerts, ensure that traffic throughput alerts are enabled for your service scaling group's alert rules.
  1. Open the single service scaling group screen by selecting the name of the SSG from the Service Scaling Groups screen (Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name>).
  2. In the service scaling group screen click CONFIGURATION at the center of the screen.
  3. In the CONFIGURATION area, select Devices from the menu to the left.
    This opens a chart that lists all the devices that are providing BIG-IP services to the service scaling group.
  4. Use the Health column to isolate devices with a Critical or Moderate health status. Use the Hostname and Device Address columns to isolate the affected devices.
    Tip: You can select one or more check boxes to the left of the device row, and click View Traffic Statistics to view Device Traffic and Interface Traffic data, based on your selection.
  5. To identify the applications that are receiving the BIG-IP services, select Applications from the menu to the left.

Device resource and performance charts

The following describes the charts found in the single service scaling group screen ( Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name> ), in the Analytics area. These charts display the trends of a service scaling group's BIG-IP VE devices. Each chart displays an aspect of the devices as a function of the selected time period.

Chart Menu Title: CPU Usage
Chart Title: CPU Usage
Description: The average percent CPU usage for all cores and BIG-IP devices, by activity category.
Metric Unit: Percent
Legend:
  • User: The average percentage of CPU usage for all the BIG-IP user space programs over a given time period.
  • System: The average percentage of CPU usage for all the running BIG-IP systems over a given time period.
  • I/O Wait: The percentage of time (during the selected time period) that a given CPU is idle for an I/O wait operation. This occurs when at least one outstanding I/O disk operation is requested by a task scheduled on the system CPU.
  • Stolen: The percentage of time a virtual CPU waits for a real CPU when the hypervisor is servicing another virtual machine.

Chart Menu Title: Top Cores
Chart Title: Top 6 CPU Cores
Description: The six most active CPU cores for all monitored BIG-IP devices. This isolates the cores that are consuming the most CPU resources of all the device CPUs.
Metric Unit: Percent
Legend:
  • CPU core

Chart Menu Title: Memory
Chart Title: Memory Usage
Description: The percentage of RAM used by system processes of the monitored BIG-IP devices.
Metric Unit: Percent
Legend:
  • TMM: The average percentage of RAM used by device TMM processes.
  • Total: The average percentage of RAM used by all devices.
  • Other: The average percentage of RAM used by non-TMM processes.

Chart Menu Title: Throughput
Chart Title: Throughput Bytes
Description: The average rate of traffic (in bytes) processed by the BIG-IP device interfaces.
Metric Unit: Average/s
Legend:
  • In: The average rate of incoming traffic to the BIG-IP devices.
  • Out: The average rate of outgoing traffic from the BIG-IP devices.

Chart Menu Title: Connections
Chart Title: Concurrent Connections
Description: The average number of connections that are open at the same time, on either the client side or the server side.
Metric Unit: Count
Legend:
  • Client Side: The average number of concurrent connections at the client side.
  • Server Side: The average number of concurrent connections at the server side.

Chart Menu Title: HTTP
Chart Title: HTTP Transactions
Description: The transaction includes all HTTP request and response messages passed between the client, BIG-IP system, and server.
Metric Unit: Average/s
Legend:
  • Transactions: Average number of HTTP transactions per second that were processed by the BIG-IP devices.

Chart Menu Title: Dropped
Chart Title: Throughput Drops
Description: The average rate of packets per second (pps) that were dropped by the BIG-IP device interfaces or discarded by the TMM over the course of the transaction.
Metric Unit: Average/s
Legend:
  • In: The average rate of packets per second that were dropped by the BIG-IP interface.
  • Out: The average rate of packets per second that were accepted by the BIG-IP interface, but discarded by the TMM.

Chart Menu Title: Errors
Chart Title: Throughput Errors
Description: The average rate of packets per second (pps) that were corrupted or arrived incomplete over the course of the transaction across the network.
Metric Unit: Average/s
Legend:
  • In: The average packets per second received with throughput errors.
  • Out: The average packets per second transmitted with throughput errors.
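
As an illustration of what the Top 6 CPU Cores chart selects, the following sketch picks the six busiest cores across all monitored devices. It is not how BIG-IQ computes the chart. The host names, the `(host, core)` keying, and the usage values are assumptions made for the example.

```python
import heapq


def top_cores(core_usage, n=6):
    """Return the n busiest (key, percent) pairs, highest usage first."""
    return heapq.nlargest(n, core_usage.items(), key=lambda item: item[1])


# Hypothetical per-core CPU usage (percent), keyed by (host name, core index).
usage = {("bigip-1", core): pct for core, pct in enumerate([12.0, 88.5, 40.0, 95.0])}
usage.update({("bigip-2", core): pct for core, pct in enumerate([73.0, 5.0, 61.0, 33.0])})

for (host, core), pct in top_cores(usage):
    print(f"{host} core {core}: {pct:.1f}%")
```

Because the selection runs over cores from every device at once, a single overloaded device can occupy most of the six slots.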

Managing device monitoring settings for a service scaling group

The health of your service scaling group (SSG) is determined by the health of its F5® BIG-IP® VE devices.

Each BIG-IP VE device in an SSG is monitored by a set of configurable device health alert rules that include performance metrics and their corresponding thresholds. You can adjust the alert rules for an SSG to define the health rules for its devices.

The SSG health score reflects the device metric that crossed the most severe threshold. This means that if a device metric violated a warning or critical threshold, the SSG health status becomes moderate or critical, respectively. You receive a device and SSG alert when a device alert rule violation is sustained for more than five minutes.
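
The "most severe threshold wins" rule described above can be sketched as a small function. The level names (`none`, `warning`, `critical`) and the device names are labels chosen for this example, not BIG-IQ identifiers. Treating an SSG with no reported devices as Good is also an assumption.

```python
# Severity ranking of the worst threshold each device has violated.
LEVEL_RANK = {"none": 0, "warning": 1, "critical": 2}

# A warning violation maps to Moderate health, a critical one to Critical.
LEVEL_TO_HEALTH = {"none": "Good", "warning": "Moderate", "critical": "Critical"}


def ssg_health(device_levels):
    """Return the SSG health status given each device's worst violation."""
    worst = max(device_levels.values(), key=LEVEL_RANK.__getitem__, default="none")
    return LEVEL_TO_HEALTH[worst]


print(ssg_health({"bigip-1": "none", "bigip-2": "warning"}))
print(ssg_health({"bigip-1": "critical", "bigip-2": "warning"}))
```

The sketch ignores the five-minute sustain requirement and only shows how one device's status determines the status of the whole group.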

About device health alert rules

Device health alert rules include the metrics and corresponding thresholds that define the health status of your BIG-IP devices. You can select which metrics are included, and adjust the warning and critical threshold values.

A metric threshold violation must be sustained for 5 minutes to trigger an alert. A subsequent alert is triggered once another threshold is crossed (either an increase or decrease in severity, or cleared). To ensure that conditions are improving, an alert for declining severity (critical to warning), or an alert that has been cleared, is triggered only when the value is sustained for five minutes at ten percent below the threshold value. For example, if a threshold value is configured for greater than 60 percent, a declining severity must be sustained at 54 percent or less to trigger an alert.
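
The arithmetic in the example above can be sketched as follows. This is an illustration of the rule as stated in this section, not BIG-IQ's implementation. The function names and the sample format (a list of `(timestamp_seconds, value)` pairs, oldest first) are assumptions.

```python
SUSTAIN_SECONDS = 5 * 60    # a condition must hold for five minutes
CLEAR_MARGIN_PERCENT = 10   # recovery must sit ten percent below the threshold


def clear_level(threshold):
    """Value a metric must stay at or below before a lower-severity alert fires."""
    return threshold * (100 - CLEAR_MARGIN_PERCENT) / 100


def sustained(samples, predicate, window=SUSTAIN_SECONDS):
    """True if predicate holds for every sample in the final `window` seconds.

    Returns False when the samples cover less than `window` seconds.
    """
    if not samples:
        return False
    end = samples[-1][0]
    if end - samples[0][0] < window:
        return False
    return all(predicate(value) for ts, value in samples if ts >= end - window)


level = clear_level(60)  # 54.0 for a 60 percent threshold
recovering = [(t, 53.0) for t in range(0, 301, 60)]
print(level, sustained(recovering, lambda v: v <= level))
```

With a 60 percent threshold, a CPU value that hovers at 57 percent would not produce a declining-severity alert, because it never stays at or below 54 percent for the full five minutes.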

Modify service scaling group device resources alerts

Before you start, you must have created a service scaling group to manage your applications, and configured the metrics at which you want BIG-IQ Centralized Management to trigger BIG-IP VE device health alerts.
You can adjust the metrics and corresponding thresholds that define a service scaling group's BIG-IP VE device health status. The device health status triggers alerts that indicate changes in device resource usage, which can affect the device's performance.
Note: Modifications to alert rules will clear any active alerts that correspond to the changes, and so automatically trigger an Alert Rule Change alert.
  1. At the top of the screen, click Applications.
  2. On the left, click ENVIRONMENTS > Service Scaling Groups.
  3. Select the name of the service scaling group.
    This opens the service scaling group's screen.
  4. At the center of the screen, click CONFIGURATION, or click the health icon located at the far left of the summary bar.
    You are now in the Service Scaling Group Properties area.
  5. Scroll down to the Health Status Rules area.
  6. Use the Critical field to adjust the previously configured metrics and defined unit thresholds that trigger a critical health alert notification.
    You can choose from six different metrics with defined thresholds that trigger a critical health notification. Your service scaling group's health is critical when there are one or more critical threshold violations.
  7. In the Moderate field, you can adjust the previously configured metrics and defined unit thresholds that trigger a moderate health alert notification.
    You can choose from six different metrics with defined thresholds that trigger a moderate health notification. Your service scaling group's health is moderate when there are one or more warning threshold violations, but no critical threshold violations.
  8. To save your changes, click Save at the bottom of the screen.
    You can click Save & Close to return to the Service Scaling Groups screen.
The health status of the service scaling group's devices is now defined by the new alert rules you configured. Once a device metric crosses a defined threshold, you receive a health alert for that device, and for the service scaling group. A critical or moderate health status for one or more devices impacts the health status of the entire service scaling group.

Service scaling group health alerts

The service scaling group (SSG) health alerts notify you of the performance status of the SSG BIG-IP® devices. This table describes the service scaling group health alert.

Alert: SSG Health
Description: There has been a change in the health status of one or more BIG-IP devices in your SSG.
Indication: One or more BIG-IP VE devices in your SSG has a sustained change in health status, which is based upon performance of device resources and/or throughput.
Default Thresholds: For SSG devices: Customized per service scaling group.
Action (if applicable): A critical health status of your SSG can lead to a scale out recommendation. You can monitor the health of affected devices using device health alerts.

Device health alerts

The device health alerts notify you of changes in device resource and throughput metric thresholds for your BIG-IP® devices.

Alert: Device Health
Description: There has been a change in one or more of the BIG-IP device health rule metrics.
Indication: One or more of the device resources and/or throughput measurements crossed a defined threshold, which may impact your BIG-IP VE device's performance.
Default Thresholds: For SSG devices: Customized per service scaling group.
Action (if applicable): For SSG devices: A critical health status of your BIG-IP VE device affects the health of the SSG. Investigate the active alerts for device metrics.

Device alerts

The device alerts notify you of changes in BIG-IP® device resource and performance metrics. These alerts are found in the single service scaling group screen (Applications > ENVIRONMENTS > Service Scaling Groups > <Service Scaling Group Name>), or in the Alert History and Active Alerts screens (Applications > ALERT MANAGEMENT).

Alert: Device CPU
Description: The average CPU utilization for a BIG-IP device.
Default Thresholds: Critical > 80%, Warning > 60%, Cleared < 60%
Action (if applicable): Investigate affected BIG-IP device resources.

Alert: Device Memory
Description: The average memory (RAM) utilization for a BIG-IP device.
Default Thresholds: Critical > 80%, Warning > 60%, Cleared < 60%
Action (if applicable): Investigate affected BIG-IP device resources.

Alert: Device Throughput In
Description: The average throughput (Mbps) of incoming traffic to a BIG-IP device.
Default Thresholds: Critical > 8 Mbps, Warning > 6 Mbps, Cleared < 6 Mbps
Action (if applicable): Investigate affected BIG-IP device throughput.

Alert: Device Throughput Out
Description: The average throughput (Mbps) of outgoing traffic from a BIG-IP device.
Default Thresholds: Critical > 8 Mbps, Warning > 6 Mbps, Cleared < 6 Mbps
Action (if applicable): Investigate affected BIG-IP device throughput.

Alert: ASM Memory
Description: The average device memory (RAM) used for Web Application Security services.
Default Thresholds: Critical > 80%, Warning > 60%, Cleared < 60%
Action (if applicable): Investigate affected BIG-IP device's configuration for ASM memory.

Alert: ASM Bypass Ratio
Description: The average rate of transactions that bypassed Web Application Security services.
Default Thresholds: Critical > 0.05%, Warning > 0.01%, Cleared < 0.01%
Action (if applicable): Investigate affected BIG-IP device's system resource configuration for ASM processes.
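
The default thresholds listed above can be expressed as a lookup table with a simple classifier. This is a minimal sketch, not BIG-IQ's alert engine. It ignores the five-minute sustain requirement and the ten percent recovery margin, and it treats a value exactly equal to the warning threshold as Cleared, which is an assumption because this section does not specify that boundary. The function name and return labels are chosen for the example.

```python
# (warning, critical) default thresholds per alert, from the list above.
# Units: percent, except the throughput alerts, which are in Mbps.
DEFAULT_THRESHOLDS = {
    "Device CPU": (60.0, 80.0),
    "Device Memory": (60.0, 80.0),
    "Device Throughput In": (6.0, 8.0),
    "Device Throughput Out": (6.0, 8.0),
    "ASM Memory": (60.0, 80.0),
    "ASM Bypass Ratio": (0.01, 0.05),
}


def classify(alert_name, value):
    """Map a metric value to Critical, Warning, or Cleared."""
    warning, critical = DEFAULT_THRESHOLDS[alert_name]
    if value > critical:
        return "Critical"
    if value > warning:
        return "Warning"
    # Assumption: a value at or below the warning threshold counts as Cleared.
    return "Cleared"


print(classify("Device CPU", 85.0))
print(classify("Device Throughput In", 7.2))
print(classify("ASM Bypass Ratio", 0.005))
```

Because the thresholds are customized per service scaling group, a real check would substitute the values configured in the Health Status Rules area for these defaults.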