Manual Chapter : Monitor application performance using alerts

Applies To:

BIG-IQ Centralized Management

  • 8.0.0

Monitor application performance using alerts

About application alerts

Alerts provide the status of the application services contained in an application, and can indicate current or potential performance issues that require mitigation. The following sections outline the types of alerts provided for your application services.

Health Alerts and Status

Application health is a reflection of the status of one or more of its application services. Application health displays the status of the application service with the most critical health. Once you select an application, you can view which application services are affected.
Application service health is displayed based on the most critical status detected. There are two factors used to indicate changes in health:
  • The status of the virtual server connected to the application service. The health status of a virtual server is reflected by its communication with the server pool to your managed application. For more information about these alerts, see Virtual server status events and Pool member status events.
  • A performance metric threshold is crossed and sustained over time. These are based on the metric conditions configured to your application service's alert rules.
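The roll-up from application service health to application health can be pictured with a short sketch. This is an illustration only, not BIG-IQ code: the `good` status, the severity ordering, and the function names are assumptions made for the example.

```python
# Hypothetical severity ranking, lowest to highest (names are illustrative).
SEVERITY_RANK = {"good": 0, "moderate": 1, "critical": 2}


def application_health(service_health):
    """Return the most critical health found among an application's services.

    service_health maps an application service name to its health status.
    """
    if not service_health:
        return "good"
    return max(service_health.values(), key=SEVERITY_RANK.__getitem__)


def affected_services(service_health):
    """List the services whose health is worse than 'good'."""
    return sorted(
        name for name, status in service_health.items()
        if SEVERITY_RANK[status] > 0
    )
```

For example, an application with one critical and one moderate service would be reported as critical, and both services would be listed as affected.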

Traffic Performance Alerts

Performance alerts are triggered by threshold violations, reported by application services. Active alerts reflect an ongoing application service performance issue. You can edit these thresholds, based on your system monitoring requirements.
An active critical or warning alert indicates that a threshold violation has been sustained for over five minutes. A subsequent alert is triggered once another threshold is crossed (an increase or decrease in severity, or cleared). To ensure that conditions are improving, an alert for declining severity (critical to warning) or a cleared alert is triggered only when the value is sustained for five minutes at ten percent below the threshold value. For example, if a threshold value is configured for greater than 60 percent, a declining severity must be sustained at 54 percent or less to trigger an alert.
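The ten percent rule can be worked through in a few lines. The sketch below is a simplified model of the behavior described above, not the product's implementation; the function names and the `(timestamp, value)` sample format are assumptions.

```python
def clear_level(threshold, margin_percent=10):
    """Value a 'greater than' metric must stay at or below before a
    lower-severity or cleared alert is triggered."""
    return threshold * (100 - margin_percent) / 100


def improvement_sustained(samples, threshold, window_seconds=300):
    """Check whether the metric has stayed at or below the clear level
    for the whole window.

    samples is a list of (timestamp_seconds, value) pairs, oldest first.
    """
    if not samples:
        return False
    window_start = samples[-1][0] - window_seconds
    # The history must reach back at least one full window.
    if samples[0][0] > window_start:
        return False
    limit = clear_level(threshold)
    return all(value <= limit for ts, value in samples if ts >= window_start)
```

With a threshold of 60 percent, `clear_level(60)` gives 54.0, which matches the example in the text. A single sample above 54 inside the five-minute window is enough to keep the current alert active in this model.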

Security Alerts

Security alerts are provided to applications and virtual servers with Web Application Security protection. These alerts vary based on the protection mode (blocking vs transparent). Security alerts differ from other alerts, as they are based on set thresholds, and they do not directly impact application health.

Identify application services with health issues

You can use the application health status settings to identify a specific application service that has surpassed a performance threshold.
  1. Open the application summary screen (Applications > APPLICATIONS).
  2. Locate the HEALTH area at the top left of the summary bar.
  3. Click Critical and Moderate to filter the application list by applications with the selected health statuses.
    In the applications list, the Active Alerts field indicates the current number of alerts for an application.
  4. Click the application's name.
    The application services are displayed.
  5. Identify the application services with moderate or critical health, and click the name to view the summary dashboard for the selected application service.
You can use the ANALYTICS portion of the application service dashboard to evaluate charts that contain traffic data specific to the selected application service.

Identify application service performance issues

You can use the application service alerts to identify when a performance issue began, and its current status.
  1. Open the application properties screen by selecting the application's name from the Applications screen (click Applications > APPLICATIONS > <Application Name> > <Application Service>).
  2. To view the application service's most severe, active alerts, view the Active Alerts area at the far right of the summary bar.
  3. Click See All to view the application's Active Alerts screen.
    This displays a log of all alerts that have crossed a defined threshold.
  4. Click the row for the most recent health alert, and view the alert details on the lower part of the screen.
    • The Description field displays the affected performance indicator.
    • The Value field displays the value when the alert was triggered.
    • The Log Level field indicates the alert's severity.
  5. Return to the single application screen by clicking the back arrow next to the Active Alerts screen title.
From the application service's screen, you can use the alert information to analyze and isolate if the issue is related to one or more pool members.

Virtual server status events

Virtual server events and alerts indicate the status of the virtual server and its pool. You can see these events and alerts in the charts of the application properties screen (Applications > APPLICATIONS > <Application Name> > <Application HTTP Service>) and in the Local Traffic dashboards (Monitoring > DASHBOARDS > Local Traffic). You can also view alerts in the Active Alerts and Alert History screens (Applications > ALERT MANAGEMENT).
  • Virtual Server is Offline
    • Description: The virtual server is offline as a result of status or configuration changes. The system then updates the virtual server status with one of the following messages:
      • Online: Virtual server is online.
      • Disabled: Virtual server was disabled.
      • Monitor disabled: The virtual server monitor was disabled. The virtual server monitor is configured on the BIG-IP system.
      • Virtual server deleted: Virtual server was deleted.
    • Default Thresholds: Critical: Offline
    • Impact: Prolonged issues that impact application pool member performance require either virtual server mitigation or pool member configuration mitigation.
  • Virtual server health
    • Description: The pool response to the virtual server. Pool status is based on the pool member response.
    • Default Thresholds:
      • Critical: All pool members in a pool are unresponsive
      • Moderate: At least one, but not all, members in a pool are unresponsive
      • Cleared: All pool members are back online, or the virtual server was deleted.
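The virtual server health thresholds above map pool member responsiveness to a severity. A minimal sketch of that mapping follows; the function name and the list-of-booleans input are assumptions, and the "virtual server was deleted" case is not modeled.

```python
def pool_health(member_responsive):
    """Classify pool health from per-member responsiveness.

    member_responsive is a list of booleans, one per pool member,
    where True means the member is responding.
    """
    if not member_responsive:
        raise ValueError("pool has no members")
    unresponsive = member_responsive.count(False)
    if unresponsive == len(member_responsive):
        return "Critical"   # all pool members are unresponsive
    if unresponsive > 0:
        return "Moderate"   # at least one, but not all, are unresponsive
    return "Cleared"        # all pool members are online
```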

Wide IP status events

Wide IP events and alerts indicate the status of your DNS pools of virtual servers, and the overall health of your DNS application services. You can see these events and alerts in the charts of the application properties screen (Applications > APPLICATIONS > <Application Name> > <DNS Application Service>) and in the DNS dashboards (Monitoring > DASHBOARDS > DNS > GSLB). You can also view alerts in the Active Alerts and Alert History screens (Applications > ALERT MANAGEMENT).
  • Wide IP health
    • Description: The wide IP is either offline, or all pool members (virtual servers) assigned to the wide IP's pool are disabled.
    • Default Thresholds:
      • Critical: All pool members in a pool are unresponsive
      • Cleared: At least one pool member is back online, or the wide IP was deleted.
    • Impact: Prolonged issues may impact DNS load balancing, and require evaluation of the wide IP's configuration.

Pool member status events

Pool member events and alerts indicate the status of a pool member. You can see these events and alerts in the charts of the application properties screen (Applications > APPLICATIONS > <Application Name> > <Application HTTP Service>) and in the Local Traffic dashboards (Monitoring > DASHBOARDS > Local Traffic). You can also view alerts in the Active Alerts and Alert History screens (Applications > ALERT MANAGEMENT).
  • Pool Member Offline
    • Description: The pool member (server) is offline as a result of status or configuration changes. The system then updates the pool member status with one of the following messages:
      • Online: The pool member is back online.
      • Disabled: The pool member is disabled.
      • Pool monitor disabled: The pool member monitor, which is configured on the BIG-IP system, is disabled.
      • Pool member deleted: The pool member has been deleted from the pool's configuration.
    • Indication: Pool member issues can lead to increases in application response time, server-side round trip time (RTT), incomplete transactions, and server errors.
    • Default Thresholds: Critical: Offline
    • Impact: Prolonged impact on application performance might require the addition of a new pool member.
  • Pool health
    • Description: The pool member response to the server.
    • Default Thresholds:
      • Critical: All pool members in a pool are unresponsive
      • Moderate: At least one, but not all, members in a pool are unresponsive
      • Cleared: All pool members are back online, or the virtual server was deleted.

HTTP Application service alerts

HTTP application service alerts notify you of changes in metrics that can affect the overall performance of traffic or security management for your managed application. When one or more of these thresholds are crossed, the health of your HTTP application service will change. You can view alerts from the single application service's screen (Applications > APPLICATIONS > <Application Name> > <HTTP application service>), or the alerts screens (Applications > ALERT MANAGEMENT > Active Alerts or Alert History).
The following chart outlines the metric conditions for monitoring HTTP application services.
*Indicates that this data was collected from TCP traffic information, and reflects network latency and transmission times. Mitigation may require changes to your TCP profile.
**The Additional Data column refers to the ANALYTICS portion of the single application service's screen. The charts cited are located when either APPLICATION SERVICE or Traffic Management services are selected, unless stated otherwise (see image for reference).
Metric Conditions
  • Application Response Time
    • Description: The average time from when the server receives the request from the BIG-IP system until the server sends the response. This metric is a reflection of the server's activities, as it deducts network latency and transmission time.
    • Impact: Increased server response latency can negatively impact the user's experience in accessing the application's contents.
    • Default Thresholds: No default
    • Additional Data**: Select the Application Response Time chart from the menu to the bottom left. Use the dimensions to the right of the chart to identify if the issue is found on specific virtual servers or pool members.
  • Server Side RTT*
    • Description: The average round trip time (RTT) for network communication between the BIG-IP system and the application server.
    • Impact: Increased latency over time can indicate a variety of issues including: server defects, bandwidth outage, or BIG-IP device issues.
    • Default Thresholds: Critical > 50ms, Warning > 20ms, Cleared < 20ms
    • Additional Data**: Select the Server Side RTT or Client Side RTT chart from the menu to the bottom left. Use the dimensions to the right of the chart to identify if the issue is found on specific BIG-IP devices or virtual servers.
  • Client Side RTT*
    • Description: The average round trip time (RTT) for network communication between the BIG-IP system and the client application request.
    • Default Thresholds: No default
  • Incomplete Transactions
    • Description: The percent of transactions, out of all transactions, that did not complete the request and response exchange.
    • Impact: A higher percentage of unresolved transactions can indicate a number of issues that negatively impact a user's connection. Increased incomplete transactions can result from a query timeout, or an unknown cancellation.
    • Default Thresholds: Critical > 5%, Warning > 1%, Cleared < 1%
    • Additional Data**: Select the Transactions chart from the menu to the bottom left. Use the dimensions to the right of the chart to identify if the issue is found on specific BIG-IP devices or virtual servers.
  • Request Errors
    • Description: The average rate of transactions that returned a request error response code (4XX) out of all transactions.
    • Impact: Increased 4XX errors indicate issues with client-side access, with broken links as the most common error.
    • Default Thresholds: No default
    • Additional Data**: Select the Response Codes chart from the menu to the bottom left. Use the dimensions to the right of the chart to filter specific response codes and URLs.
  • Server Errors
    • Description: The average rate of transactions that returned a server error response code (5XX) out of all transactions.
    • Impact: Increased 5XX errors indicate issues with the application server.
    • Default Thresholds: Critical > 0.05%, Warning > 0.01%, Cleared < 0.01%
    • Additional Data**: Select the Response Codes chart from the menu to the bottom left. Use the dimensions to the right of the chart to filter specific response codes and pool members.
  • High TPS
    • Description: The number of server transactions per second (TPS) is higher than the expected average.
    • Impact: The rate of application activity is higher than expected and may limit the application server's resources. This may also indicate an attack.
    • Default Thresholds: No default
    • Additional Data**: Select SERVER services to view top pool member charts. Select the TPS chart and use the dimensions to the right of the chart to filter specific virtual servers and pool members.
  • Low TPS
    • Description: The number of server transactions per second (TPS) is lower than the expected average.
    • Impact: The rate of application activity is lower than expected. This may indicate that your application server's resources are limited.
    • Default Thresholds: No default
  • Client Side Throughput In*
    • Description: The average volume (in Mbps) of traffic sent from BIG-IP to the client.
    • Impact: Sudden increases in traffic volume can lead to a variety of issues that can affect the application's performance. When throughput exceeds a certain value you can inspect for: server defects, bandwidth outage, DoS attack signatures, or BIG-IP device resource limitations.
    • Default Thresholds: No default
    • Additional Data**: Select CLIENT services to view client side transaction charts. Select the Client Side Throughput chart and use the dimensions to the right of the chart to filter specific BIG-IP devices or virtual servers.
  • Client Side Throughput Out*
    • Description: The average volume (in Mbps) of traffic sent from the client to BIG-IP.
    • Default Thresholds: No default
  • Server Side Throughput In*
    • Description: The average volume (in Mbps) of traffic sent from BIG-IP to the server.
    • Default Thresholds: No default
    • Additional Data**: Select SERVER services to view top pool member charts. Select the Server Side Throughput chart and use the dimensions to the right of the chart to filter specific virtual servers and pool members.
  • Server Side Throughput Out*
    • Description: The average volume (in Mbps) of traffic sent from the server to BIG-IP.
    • Default Thresholds: No default
  • Client Side Goodput Received*
    • Description: The volume (in Mbps) of useful, uncorrupted packets received by the client from BIG-IP is lower than expected.
    • Impact: Lowered goodput indicates suboptimal flow control and congestion avoidance over the transport layer. A lower ratio of goodput to maximum throughput can indicate a number of issues with the network, including an increase in TCP slow start or congestion control, packet loss, and network interference.
    • Default Thresholds: No default
    • Additional Data**: Select CLIENT services to view client side transaction charts. Select the Client Side Goodput chart, and use the dimensions to the right of the chart to filter specific virtual servers and BIG-IP devices.
  • Client Side Goodput Sent*
    • Description: The volume (in Mbps) of useful, uncorrupted packets sent from the client to BIG-IP is lower than expected.
    • Default Thresholds: No default
  • Server Side Goodput Received*
    • Description: The volume (in Mbps) of useful, uncorrupted packets received by the server from BIG-IP is lower than expected.
    • Default Thresholds: No default
    • Additional Data**: Select SERVER services to view top pool member charts. Select the Server Side Goodput chart, and use the dimensions to the right of the chart to filter specific virtual servers and pool members.
  • Server Side Goodput Sent*
    • Description: The volume (in Mbps) of useful, uncorrupted packets sent from the server to BIG-IP is lower than expected.
    • Default Thresholds: No default
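For the three HTTP metrics that ship with default thresholds, the severity band of a single reading can be sketched as follows. This is only an illustration: the dictionary and function names are assumptions, and the comparison is instantaneous, so it ignores the five-minute sustain period and the ten percent margin that apply to real alerts.

```python
# Default thresholds taken from the metric conditions above:
# metric name -> (critical, warning). Units are noted per entry.
DEFAULT_THRESHOLDS = {
    "Server Side RTT": (50.0, 20.0),         # milliseconds
    "Incomplete Transactions": (5.0, 1.0),   # percent
    "Server Errors": (0.05, 0.01),           # percent
}


def severity_band(metric, value):
    """Return 'Critical', 'Warning', or None for a single metric reading."""
    critical, warning = DEFAULT_THRESHOLDS[metric]
    if value > critical:
        return "Critical"
    if value > warning:
        return "Warning"
    return None  # at or below the warning threshold
```

Metrics listed with "No default" are left out of the dictionary on purpose, since they produce alerts only after you configure thresholds for them.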

TCP application service alerts

TCP application service alerts notify you when there are changes in metrics that can affect the overall performance of traffic over the network. There are no default alert thresholds for TCP application services, so you must configure your threshold values to receive alerts. If configured, you can view alerts from the application service's screen (Applications > APPLICATIONS > <Application Name> > <TCP application service>), or the general alerts screens (Applications > ALERT MANAGEMENT > Active Alerts or Alert History).
**The Additional Data column refers to the ANALYTICS portion of the single application service's screen. The charts cited are located when either APPLICATION SERVICE or Traffic Management services are selected, unless stated otherwise (see image for reference).
Metric Conditions
  • Server Side RTT
    • Description: The communication time (in ms) from a SYN to an ACK message between the server and BIG-IP.
    • Impact: Increased latency over time can indicate a variety of issues including: server defects, bandwidth outage, or BIG-IP device issues.
    • Additional Data**: Select the Server Side RTT or Client Side RTT chart in your application service's ANALYTICS area. Use the dimensions to the right of the chart to identify if the issue is found on specific BIG-IP devices or virtual servers.
  • Client Side RTT
    • Description: The communication time (in ms) from a SYN to an ACK message between the client and BIG-IP.
  • Client Side Throughput In
    • Description: The average volume (in Mbps) of traffic sent from BIG-IP to the client.
    • Impact: High throughput can be due to increased application usage, or a DoS attack on the application server. Based on your network resources, higher throughput can lead to increased throughput latency.
    • Additional Data**: Select the Throughput Bytes (average/sec) chart to view when the throughput increase occurred, and if the increase affected a specific part of the transaction. Use the dimensions to the right of the chart to identify if the issue is found on specific BIG-IP devices or virtual servers.
  • Client Side Throughput Out
    • Description: The average volume (in Mbps) of traffic sent from the client to BIG-IP.
  • Server Side Throughput In
    • Description: The average volume (in Mbps) of traffic sent from BIG-IP to the server.
  • Server Side Throughput Out
    • Description: The average volume (in Mbps) of traffic sent from the server to BIG-IP.
  • Client Side Goodput Received*
    • Description: The volume (in Mbps) of useful, uncorrupted packets received by the client from BIG-IP is lower than expected.
    • Impact: A lower ratio of goodput to maximum throughput can indicate a number of issues with the network, including an increase in incomplete transactions, packet loss, and network interference.
    • Additional Data**: Select CLIENT services to view client side transaction charts. Select the Client Side Goodput chart, and use the dimensions to the right of the chart to filter specific virtual servers and BIG-IP devices.
  • Client Side Goodput Sent*
    • Description: The volume (in Mbps) of useful, uncorrupted packets sent from the client to BIG-IP is lower than expected.
  • Server Side Goodput Received*
    • Description: The volume (in Mbps) of useful, uncorrupted packets received by the server from BIG-IP is lower than expected.
    • Additional Data**: Select SERVER services to view top pool member charts. Select the Server Side Goodput chart, and use the dimensions to the right of the chart to filter specific virtual servers and pool members.
  • Server Side Goodput Sent*
    • Description: The volume (in Mbps) of useful, uncorrupted packets sent from the server to BIG-IP is lower than expected.
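Because TCP application services have no default thresholds, a goodput check depends on a value you choose. The sketch below shows one way to reason about the goodput-to-throughput ratio mentioned above; the function names and the `minimum_ratio` parameter are hypothetical and do not correspond to a BIG-IQ setting.

```python
def goodput_ratio(goodput_mbps, throughput_mbps):
    """Fraction of the measured throughput that is useful, uncorrupted traffic."""
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    return goodput_mbps / throughput_mbps


def is_goodput_low(goodput_mbps, throughput_mbps, minimum_ratio):
    """True when the ratio falls below a user-chosen minimum (an assumption)."""
    return goodput_ratio(goodput_mbps, throughput_mbps) < minimum_ratio
```

For instance, 40 Mbps of goodput on 80 Mbps of throughput gives a ratio of 0.5, which would be flagged as low against a chosen minimum of 0.8.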

Web Application Security Alerts

Security alerts in the TRENDS AND IMPACTS area of the L7 Security dashboard (Monitoring > DASHBOARDS > L7 Security) notify you of the number of objects reporting Web Application Security policy (Web Exploits) or DoS profile (L7 DDoS Attacks) events over the past day (trend charts report the past week). These alerts indicate that a protected object (application or virtual server) recently experienced an increased rate in performance issues. To view data that corresponds with these traffic events, go to Monitoring > DASHBOARDS > DDoS > HTTP Analysis. To view the status of your deployed applications, go to Applications > APPLICATIONS.
  • BAD TRAFFIC TRENDS
    • Description: The number of objects with a significant increase in traffic with any violation rating.
    • Impact: Increase in transactions with any violation rating.
    • Default Thresholds:
      • Web Exploits: The average number of transactions with a violation rating exceeded 10% in the past 24 hours and increased by a ratio of 0.1% out of all traffic over the past week.
      • L7 DDoS Attacks: The average volume of active, simultaneous attacks increased in the past 24 hours.
    • Action (if applicable): Investigate transactions and fine tune your security policy/profile for new threats.
  • POTENTIALLY HARMFUL ATTACKS
    • Description: The number of objects with a transparent protection mode (Monitoring), that have an increase in bad traffic.
    • Impact: Increase in transactions with high violation rating.
    • Default Thresholds:
      • Web Exploits: The rate of transactions with violation rating of 4 or 5 exceeded 0.1% in the past 24 hours.
      • L7 DDoS Attacks: The volume of simultaneous active attacks increased in the past 24 hours.
    • Action (if applicable): Change security policy or profile to Blocking mode.
  • FALSE POSITIVE ATTACKS
    • Description: The number of objects with a blocking protection mode that have an increase in blocked traffic with a low violation rating.
    • Impact: Increase in blocked transactions.
    • Default Thresholds:
      • Web Exploits: The rate of blocked transactions with a violation rating of 1 or 2 exceeded 0.01% over the past 24 hours.
    • Action (if applicable): Investigate blocked transactions and fine-tune your Web Application Security policy to allow valid transactions.
  • BLOCKED ATTACKS
    • Description: The number of objects with a blocking protection mode that blocked any bad traffic over the past 24 hours.
    • Impact: N/A
    • Default Thresholds: N/A
    • Action (if applicable): N/A
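The Web Exploits threshold for POTENTIALLY HARMFUL ATTACKS is a rate over a 24-hour window. A rough sketch of that calculation is shown below. It assumes each transaction in the window is represented by its violation rating, with 0 standing for "no violation"; that encoding and the function names are assumptions, not a BIG-IQ data format.

```python
def rate_percent(ratings, selected):
    """Percent of transactions whose violation rating is in `selected`."""
    if not ratings:
        return 0.0
    hits = sum(1 for rating in ratings if rating in selected)
    return 100.0 * hits / len(ratings)


def potentially_harmful(ratings, limit_percent=0.1):
    """True when transactions rated 4 or 5 exceed the limit (0.1% by default)."""
    return rate_percent(ratings, {4, 5}) > limit_percent
```

The same `rate_percent` helper could be applied to ratings 1 and 2 with a 0.01 percent limit to approximate the FALSE POSITIVE ATTACKS condition, provided the input contains only blocked transactions.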

DNS application service alerts

DNS application service alerts notify you when there are changes in metrics that can affect the overall performance of traffic over the network. When one or more of these thresholds are crossed, the health of your DNS application service will change. You can view alerts from the application service screen (Applications > APPLICATIONS > <Application Name> > <DNS application service>), or the general alerts screens (Applications > ALERT MANAGEMENT > Active Alerts or Alert History).
Metric Conditions
  • RPS Dropped/Requests
    • Description: The ratio of dropped requests, out of the total number of requests.
    • Indication: The request packet to a server pool member (virtual server) is dropped. This may be a configured, alternate load balancing method, or the pool member is unavailable.
    • Default Thresholds: Critical > 50%, Warning > 35%, Cleared < 35%
    • Additional Data: Select the DNS RPS chart in your application service's ANALYTICS area to view the dropped requests over time. Use the dimensions to the right of the chart to identify if the issue is found on specific BIG-IP devices or DNS sync groups.
  • LBS Alternate/Requests
    • Description: The ratio of alternate load balancing decisions applied to requests, out of the total number of requests.
    • Indication: Request packets are sent to an alternate IP address (for the requested host) due to a client side timeout.
    • Default Thresholds: Critical > 65%, Warning > 90%, Cleared < 65%
    • Additional Data: Select the DNS Load Balancing Decisions chart in your application service's ANALYTICS area to view trends in load balancing methods over time. Use the dimensions to the right of the chart to identify if the issue is found on specific BIG-IP devices or DNS sync groups.
  • LBS Fallback/Requests
    • Description: The ratio of fallback load balancing decisions applied to requests, out of the total number of requests.
    • Indication: One or more servers are experiencing an outage, and request packets are directed to a failover server.
    • Default Thresholds: Critical > 50%, Warning > 25%, Cleared < 35%
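The DNS metrics are ratios of a request count to the total number of requests. As an illustration, the sketch below computes the RPS Dropped/Requests ratio and compares it with the default thresholds listed for that alert (Critical > 50%, Warning > 35%). The function names are assumptions, and the check is a point-in-time comparison rather than the sustained evaluation the system performs.

```python
def dropped_ratio_percent(dropped, total):
    """Dropped requests as a percentage of all requests."""
    if total == 0:
        return 0.0
    return 100.0 * dropped / total


def dropped_severity(dropped, total):
    """Return 'Critical', 'Warning', or None for RPS Dropped/Requests."""
    ratio = dropped_ratio_percent(dropped, total)
    if ratio > 50:
        return "Critical"
    if ratio > 35:
        return "Warning"
    return None  # at or below the warning threshold
```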