Manual Chapter : Configure Data Collection Devices

Applies To:

BIG-IQ Centralized Management

  • 8.2.0, 8.1.0

Data Collection Device configuration overview

You configure the Data Collection Devices (DCDs) for your BIG-IQ solution using BIG-IQ Centralized Management (CM). BIG-IQ CM controls all DCD settings except the zone assignment and the master key settings.

Statistics retention policy overview

The statistics retention policy determines how much of the data reported by your managed BIG-IP devices is kept. When you choose how much data to retain, consider how much disk space you have available: to use the data retention controls well, you need to know how much space you have to store data and which data needs additional storage.
The fields on the Statistics Retention Policy screen set the size of the indices that BIG-IQ uses to store raw data, and they all work in a similar way. One way to understand these indices is to think of your data storage space as a set of containers. The values you specify on this screen determine how much storage space each container (index) consumes. Because data is saved for the time periods you specify, the longer the time period, the more space you consume. The disk storage consumed depends on the following factors:
  • The number of BIG-IP devices for which you are collecting data
  • The number of objects those BIG-IP devices have (for example, virtual servers, pools, pool members, and iRules)
  • The frequency of data collection
  • The data retention policy
  • The data replication policy
  • Additional data storage for prioritized service groups
Note that the system has default global retention settings, so fine-tuning the statistics retention policy is not required. Depending on how long you retain certain data after its initial collection, changing the retention settings can improve the efficiency of disk usage or the quality of the statistics data retained.
The following are key concepts to understand about how the retention policy works.
How long is data in each container retained?
Data is retained in each container for the time period you specify. When the specified level is reached, the oldest chunk of data is deleted. For example, if you specify a raw data value of 48 hours of retained collected data, when 48 hours of raw data has accumulated, the next hour of incoming raw data triggers the BIG-IQ to delete the oldest hour of collected data.
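The following Python sketch models the deletion rule for the 48-hour raw-data example. It assumes, for illustration only, that each hour of raw data is one entry in a fixed-size queue; this is not how BIG-IQ stores data internally.

```python
from collections import deque

# Hypothetical model: one entry per hour of raw data, 48-hour retention.
RAW_RETENTION_HOURS = 48
raw_container = deque(maxlen=RAW_RETENTION_HOURS)

# Hours 0 through 48 arrive in order. When hour 48 arrives, the container
# already holds 48 hours, so the deque drops the oldest entry (hour 0).
for hour in range(49):
    raw_container.append(f"hour-{hour}")

print(len(raw_container))   # 48
print(raw_container[0])     # hour-1
```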
When does data from one container transfer to the next?
BIG-IQ transfers data from one container to the next in increments that are the size of the next (larger) container. That is, every 60 minutes, BIG-IQ aggregates the last 60 minutes of collected raw data into a data set and passes it to the Hour(s) container. Every 24 hours, BIG-IQ aggregates the last 24 hours of hourly data into a data set and passes it to the Day(s) container, and so on for the Month(s) container.
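As a rough illustration of this roll-up, the Python sketch below averages complete windows of samples into one data point each (60 per-minute samples per hourly point, 24 hourly points per daily point). It assumes simple arithmetic averaging, as suggested by the procedure's references to averaging data into the next container; the function name and sample data are hypothetical, and BIG-IQ's actual aggregation may differ per metric.

```python
from statistics import mean


def roll_up(samples, window):
    """Average each complete window of samples into one data point.

    `window` is the number of samples per aggregate (60 for minutes to an
    hour, 24 for hours to a day). An incomplete trailing window is left
    for the next aggregation cycle.
    """
    complete = len(samples) // window
    return [mean(samples[i * window:(i + 1) * window]) for i in range(complete)]


# 2.5 hours of synthetic per-minute raw data.
raw_minutes = [float(m % 60) for m in range(150)]
hourly = roll_up(raw_minutes, 60)
print(hourly)   # [29.5, 29.5]; the last 30 minutes wait for a complete hour
```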
What about limits?
Limit max storage to specifies the percentage of total disk space that you want data to consume on the DCDs in your cluster.
If more disk space is consumed than the percentage you specified, BIG-IQ takes two actions to prevent the data corruption that can occur when storage is completely exhausted:
  1. The DCDs do not collect new data until the available disk space complies with the Limit max storage to setting.
  2. Statistical data not required to calculate the next higher time layer is removed (for example, you need 60 minutes of raw data to aggregate to the Hours level). Data is removed starting with the raw data container, then the hourly data container, then the daily data container. This process stops when storage consumption is below the Limit max storage to setting.
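The cleanup order above can be sketched in Python as follows. This is a hypothetical model, not BIG-IQ code: the caller supplies only the chunks that are not needed for the next time layer (oldest first, sizes in GB), plus the size of everything else that must be kept.

```python
def enforce_storage_limit(removable, protected_gb, total_disk_gb, limit_percent):
    """Remove the oldest removable chunks until usage is within the limit.

    `removable` maps 'raw', 'hourly', and 'daily' to lists of chunk sizes in
    GB, oldest first, and is modified in place. Returns the removed
    (container, size) pairs and whether usage now complies with the limit,
    which is when the DCDs may resume collecting new data.
    """
    limit_gb = total_disk_gb * limit_percent / 100
    used = protected_gb + sum(sum(chunks) for chunks in removable.values())
    removed = []
    for name in ("raw", "hourly", "daily"):      # documented removal order
        chunks = removable.get(name, [])
        while chunks and used > limit_gb:
            size = chunks.pop(0)                 # oldest chunk first
            used -= size
            removed.append((name, size))
    return removed, used <= limit_gb


# Hypothetical cluster: 100 GB of disk, limit set to 80%, 87 GB in use.
removable = {"raw": [2, 1], "hourly": [5, 5], "daily": [2]}
removed, can_collect = enforce_storage_limit(removable, protected_gb=72,
                                             total_disk_gb=100, limit_percent=80)
print(removed)       # [('raw', 2), ('raw', 1), ('hourly', 5)]
print(can_collect)   # True
```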
Global vs. Group properties
Global properties apply to all statistics data retention. You can also adjust retention settings for specific service groups that are activated on your DCDs:
Global Properties
Global properties provide the statistics retention settings for all statistics collection for your BIG-IQ. These are the default settings for all service modules activated on your DCD.
Group Properties
Service groups can have statistics retention settings that differ from the default global properties. You can use these settings to fine-tune retention so that more, or less, data is retained for an activated service, which can improve the efficiency of how data is retained in your system. A service group uses only the settings you specify for it, and uses the global values for any settings left blank (for example, if Keep monthly data up to is left blank).
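The fallback rule for group properties behaves like a dictionary merge in which blank values are ignored. In this Python sketch, the setting names and numbers are hypothetical placeholders, not BIG-IQ defaults.

```python
# Hypothetical global values, for illustration only.
GLOBAL_PROPERTIES = {
    "raw_hours": 48,
    "hourly_hours": 168,
    "daily_days": 31,
    "monthly_months": 12,
}


def effective_retention(global_props: dict, group_props: dict) -> dict:
    """Return the settings a service group actually uses.

    A group value overrides the global value; a blank (None or missing)
    group value falls back to the global value.
    """
    return {
        key: group_props[key] if group_props.get(key) is not None else value
        for key, value in global_props.items()
    }


group = {"raw_hours": 72, "monthly_months": None}   # monthly left blank
print(effective_retention(GLOBAL_PROPERTIES, group))
# {'raw_hours': 72, 'hourly_hours': 168, 'daily_days': 31, 'monthly_months': 12}
```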

Manage the retention policy for your statistics data

Before you can set the statistics retention policy, you must have added a data collection device (DCD). If you are adding statistics retention for a specific service group, ensure that the service is activated on the DCD.
You can manage the default settings that determine how your statistics data is retained, based on quality. The highest-quality data is real-time (raw) data (data that has not been averaged), but it consumes a lot of disk space, so consider your needs when you choose your data retention settings.
  1. From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Cluster.
    The BIG-IQ Data Collection Cluster screen opens. On this screen, you can view summary status for the Data Collection Device (DCD) cluster and access the screens that you can use to configure the DCD cluster.
    • Under SUMMARY, you can access screens detailing how much data is stored, as well as how the data is stored.
    • Under CONFIGURATION, you can access the screens that control DCD cluster performance.
  2. Under the screen name, click CONFIGURATION > Statistics Data Collection.
    The Statistics Collection Status screen opens.
  3. Click the Configure Retention button.
    The Statistics Retention Policy screen opens.
  4. Go to Global Properties to edit the default retention settings:
    1. In the Keep real-time (raw) data up to field, type the number of hours of raw data to retain.
      You must specify a minimum of 1 hour, so that there is sufficient data to average and create a data point for the Keep hourly data up to container.
    2. In the Keep hourly data up to field, type the number of hourly data points to retain.
      You must specify a minimum of 24 hours, so that there is sufficient data to average and create a data point for the Keep daily data up to container.
    3. In the Keep daily data up to field, type the number of daily data points to retain.
      You must specify a minimum of 31 days, so that there is sufficient data to average and create a data point for the Keep monthly data up to container.
    4. In the Keep monthly data up to field, type the number of monthly data points to retain.
      Once the specified number of months passes, the oldest monthly data set is deleted.
    5. In the Limit max storage to field, type the percentage of total disk space that you want data to consume on the DCDs in your cluster.
    6. In the Keep events up to field, type the number of days that you want to keep events before the oldest events data set is deleted.
    7. In the Keep traffic capturing up to field, type the number of days that you want to keep captured traffic before the oldest traffic data set is deleted.
    Global Properties provide the default retention settings for any service group, or individual service group value, that is not populated in Group Properties.
  5. Go to Group Properties to add custom time retention settings for specific service groups.
    Any setting that you do not specify in Group Properties uses the Global Properties value. Ensure that you have enough disk space to accommodate service group statistics retention.
  6. Expand Advanced Settings to configure data scaling and resilience using Elasticsearch. For more information, see General Elasticsearch FAQ.
    1. Select the Replicas check box to enable high availability for the stored data on your DCD cluster.
      Replicas are copies of data sets that remain available to the DCD cluster when one or more devices within that cluster become unavailable. By default, data replication for statistics is enabled. Disabling replication reduces the amount of disk space required for data retention, but it provides no protection from the data corruption that can occur when you remove a DCD. You should enable replicas to provide this protection.
    2. Select the Auto expand replicas check box to let the system automatically increase the number of replicas for a data set.
      This allows the DCD cluster to dynamically host up to 2 separate replicas for a given data set, based on the number of DCDs available. This provides redundancy that protects against data loss even when more than one DCD becomes unavailable.
      This option is only available when replicas are enabled. In addition, your system must include at least 3 DCDs (one for the primary data and one for each of the two replicas) with sufficient disk space.
  7. When you are satisfied with the values specified for data retention, click Save & Close.
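Before you save, you can check planned values against the documented minimums. This Python sketch uses hypothetical field names; the 0-to-100 range for the storage limit is an assumption based on the field being a percentage.

```python
# Documented minimums from the procedure above; the field names are hypothetical.
MINIMUMS = {
    "raw_hours": 1,      # enough raw data to average into one hourly point
    "hourly_hours": 24,  # enough hourly points to average into one daily point
    "daily_days": 31,    # enough daily points to average into one monthly point
}


def validate_retention(settings: dict) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    errors = []
    for field, minimum in MINIMUMS.items():
        value = settings.get(field)
        if value is not None and value < minimum:
            errors.append(f"{field} must be at least {minimum} (got {value})")
    limit = settings.get("limit_max_storage_percent")
    # Assumption: the limit is a percentage greater than 0 and at most 100.
    if limit is not None and not 0 < limit <= 100:
        errors.append(f"limit_max_storage_percent must be in (0, 100] (got {limit})")
    return errors


print(validate_retention({"raw_hours": 48, "hourly_hours": 12, "daily_days": 31}))
# ['hourly_hours must be at least 24 (got 12)']
```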

Log index rotation policy overview

The optimum settings used to configure your Data Collection Device (DCD) indices depend on a number of key factors.
  • The system can dynamically create new indices based on either a specified interval or a specified size. When you make these decisions, the primary goal is to keep DCD data within a maximum disk allocation while maintaining capacity for new incoming data.
  • Secondary considerations include search optimization and the ability to optimize old indices to reduce their size.
  • Generally, the best policy is one that does not create unnecessary indices. The more indices you have, the lower the overall performance, because your searches have to deal with more shards. For example, if you know that a service has a low indexing volume (thousands of documents per day), it makes the most sense to aggregate a large amount of data per rotation (5 days or 30 days). For services with high indexing volumes, such as Web Application Security, it makes more sense to rotate every 8 hours (which reduces the number of retained indices).
  • Index rotation also lets you apply new shard and replica counts by changing the template for a given index type. New indices created from that template contain the new shard and replica count properties.
This table shows the default configuration values for each index running on BIG-IQ Centralized Management. These values are based on anticipated data ingestion rates and typical usage patterns.
Service | Index name | Minimum number of DCDs | Rotation policy | Retained index count | Approximate time window | Size of /var file system
--- | --- | --- | --- | --- | --- | ---
Access | access-event-logs | 2 | Time/5 days | 19 | 95 days | 500 GB
Access | access-stats | 2 | Time/5 days | 19 | 95 days | 500 GB
Web Application Security | asmindex | 2 | Size/100000 MB | 5 | N/A | 500 GB
FPS | websafe | 2 | Time/30 days | 100 | 8 years | 10 GB
If multiple services are running on a given DCD, or you have higher inbound data rates, you might have to adjust these values to keep the /var file system from filling up. (A default alert warns you when the file system becomes 80% full.)
The simplest resolution is to revise the retained index count; lowering this value reduces the disk space requirements, but it will also reduce the amount of data available for queries. For details about changing this setting, refer to the modifying indices topic for the service you are configuring.
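The time windows and sizes in the table follow from simple multiplication, which also shows how lowering the retained index count shrinks both. This Python sketch copies the table's defaults and, like the size-based example later in this chapter, treats 1000 MB as 1 GB; it ignores the partly filled current index, so the results are approximate.

```python
# Defaults copied from the table above.
TIME_BASED = {                      # index: (rotation period in days, retained index count)
    "access-event-logs": (5, 19),
    "access-stats": (5, 19),
    "websafe": (30, 100),
}
SIZE_BASED = {                      # index: (max index size in MB, retained index count)
    "asmindex": (100_000, 5),
}


def time_window_days(period_days: float, retained_count: int) -> float:
    """Approximate history kept by a time-based index: period x retained count."""
    return period_days * retained_count


def max_disk_gb(max_index_size_mb: float, retained_count: int) -> float:
    """Approximate disk used by a size-based index, with 1000 MB = 1 GB."""
    return max_index_size_mb * retained_count / 1000


for name, (days, count) in TIME_BASED.items():
    window = time_window_days(days, count)
    print(f"{name}: {window} days (about {window / 365:.1f} years)")

for name, (size_mb, count) in SIZE_BASED.items():
    print(f"{name}: about {max_disk_gb(size_mb, count):.0f} GB")

# Lowering the retained count from 19 to 10 shortens the Access window.
print(time_window_days(5, 10))   # 50
```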

How does the DCD aggregate raw data?

The DCD stores raw data coming from the BIG-IP devices in data indices. As data is received, it accumulates in the current index. When the accumulated data reaches the rotation threshold that you set, four things happen.
  • A new current index is created.
  • BIG-IP data begins accumulating in the new index.
  • The former current index becomes one of the retained indices.
  • If the total number of indexes is now larger than the retained index count, the oldest one is deleted.
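The four rotation steps can be modeled with a short Python class. The class and index names are hypothetical. Because the text says "total number of indexes", this sketch counts the current index toward the retained index count; that interpretation is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class IndexSet:
    """Hypothetical model of one service's indices, ordered oldest to newest.

    The last entry in `indices` is the current index.
    """
    retained_count: int
    indices: list = field(default_factory=lambda: ["index-0"])
    next_id: int = 1

    def rotate(self) -> None:
        # Create a new current index; the former current index is now retained.
        self.indices.append(f"index-{self.next_id}")
        self.next_id += 1
        # Delete the oldest index if the total now exceeds the retained count.
        if len(self.indices) > self.retained_count:
            self.indices.pop(0)


s = IndexSet(retained_count=3)
for _ in range(4):   # four rotation thresholds are reached
    s.rotate()
print(s.indices)     # ['index-2', 'index-3', 'index-4']
```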
When you set up index rotation, you determine what triggers the rotation threshold. The Indices settings specify how the Data Collection Device manages your data.
The ideal configuration for data indices depends on the amount, frequency, and type of data your devices send to the DCD. The default settings are designed to satisfy most user scenarios, but you might want to explore the settings for the data types that you plan to send to the DCD, to make sure that those settings meet your needs.

Modify log indices

Before you can configure the indices for a data collection device, you must activate data collection for the services that you want to collect data for.
BIG-IQ stores incoming BIG-IP device data in indices on the Data Collection Device (DCD) cluster. Each service that sends data uses its own indices. You control how BIG-IQ manages your data by adjusting the Indices settings for each service.
  1. From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Cluster.
    The BIG-IQ Data Collection Cluster screen opens. On this screen, you can view summary status for the Data Collection Device (DCD) cluster and access the screens that you can use to configure the DCD cluster.
    • Under SUMMARY, you can access screens detailing how much data is stored, as well as how the data is stored.
    • Under CONFIGURATION, you can access the screens that control DCD cluster performance.
  2. Under the screen name, click CONFIGURATION > Logging Data Collection.
    The Settings screen opens.
  3. Click the Configure button for the service that you want to set up.
    BIG-IQ displays the indices settings for the selected service.
  4. Perform the next two steps for each index.
    If you are configuring the Access service, use the same index values for access-event-logs and access-stats to avoid a mismatch in the reports generated from your logging data.
  5. Specify the Rotation Type.
    • To chunk your data based on the amount of data:
      1. Select Size Based.
      2. For the Max Index Size, type the size of the indices you want to create.
      For example, if you type 1000, when the index size reaches 1 GB, it becomes a retained index and new data from your BIG-IP begins accumulating in a new current index. If your Retained Index Count is set to 10, the maximum disk space used by these indices will be approximately 10 GB.
    • To chunk your data based on increments of time:
      1. Select Time Based.
      2. For the Rotation Period, specify a time unit, and type how many of those units each index should span.
      For example, if you type .5 and select Hours, a new index is created every half hour. If your Retained Index Count is set to 10, the retained indices together contain approximately 5 hours of data.
  6. For the Retained Index Count, type the total number of indices you want to store on the DCD.
    This setting determines the maximum amount of data stored on the DCD. When this limit is reached, the oldest data is truncated or discarded. For example, if you set the number of indices to 10 and each index is 1 GB, you must have 10 GB of storage available on your DCD.
  7. Click Save & Close to save the indices configuration settings.
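If you script or review index settings outside the UI, a small record type makes the Access-service rule easy to check. In this Python sketch the class, its field names, and the minimum of 1 for the retained count are hypothetical; only the requirement that access-event-logs and access-stats match comes from the procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexSettings:
    """Hypothetical record of the Indices settings for one index."""
    rotation_type: str                             # "size" (Size Based) or "time" (Time Based)
    retained_index_count: int
    max_index_size_mb: Optional[float] = None      # required for Size Based rotation
    rotation_period_hours: Optional[float] = None  # required for Time Based rotation

    def __post_init__(self):
        if self.rotation_type not in ("size", "time"):
            raise ValueError("rotation_type must be 'size' or 'time'")
        if self.rotation_type == "size" and not self.max_index_size_mb:
            raise ValueError("Size Based rotation needs a Max Index Size")
        if self.rotation_type == "time" and not self.rotation_period_hours:
            raise ValueError("Time Based rotation needs a Rotation Period")
        if self.retained_index_count < 1:   # assumed lower bound
            raise ValueError("Retained Index Count must be at least 1")


def access_indices_match(event_logs: IndexSettings, stats: IndexSettings) -> bool:
    """True when access-event-logs and access-stats use identical values."""
    return event_logs == stats   # dataclass equality compares every field


event_logs = IndexSettings("time", 19, rotation_period_hours=120)   # 5 days
stats = IndexSettings("time", 19, rotation_period_hours=120)
print(access_indices_match(event_logs, stats))   # True
print(access_indices_match(event_logs, IndexSettings("time", 10, rotation_period_hours=120)))   # False
```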

Change the minimum number of master eligible devices

You can manage the minimum number of devices that must be available for the cluster to be considered operational. If the number of available devices is less than the value specified for the Minimum Master Eligible Devices, the cluster is deemed unhealthy.
  1. From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Cluster.
    The BIG-IQ Data Collection Cluster screen opens. On this screen, you can view summary status for the Data Collection Device (DCD) cluster and access the screens that you can use to configure the DCD cluster.
    • Under SUMMARY, you can access screens detailing how much data is stored, as well as how the data is stored.
    • Under CONFIGURATION, you can access the screens that control DCD cluster performance.
  2. Under the screen name, click CONFIGURATION > Cluster Settings.
    The Cluster Settings screen opens.
  3. To change this setting, click Override.
    The button text changes to Update.
  4. In the Minimum Master Eligible Devices field, type or select the new minimum number of healthy devices for this DCD cluster, and click Update.
    The system updates the setting.
  5. When you are satisfied with the minimum number of devices setting, click Cancel to close the screen.
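The health rule for this setting is a single comparison, sketched below in Python. The strict_majority helper is included only as a reference point: (N // 2) + 1 is the quorum value that Elasticsearch 6.x documentation recommended for discovery.zen.minimum_master_nodes, and it is not stated to be a BIG-IQ default.

```python
def cluster_is_healthy(available_devices: int, minimum_master_eligible: int) -> bool:
    """Rule from the text: the cluster is unhealthy when fewer devices are
    available than the Minimum Master Eligible Devices setting."""
    return available_devices >= minimum_master_eligible


def strict_majority(master_eligible_devices: int) -> int:
    """(N // 2) + 1, a common quorum choice; shown for comparison only."""
    return master_eligible_devices // 2 + 1


print(cluster_is_healthy(available_devices=2, minimum_master_eligible=3))   # False
print(strict_majority(5))                                                    # 3
```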

How do Data Collection Device zones work?

There are two ways to use Data Collection Device (DCD) zones to control how data is stored for your managed BIG-IP devices.
  • You can use zones to optimize statistics traffic routing. By assigning DCDs to a zone and then assigning managed BIG-IP devices to that zone, you control which DCDs collect statistics traffic for each device.
  • DCD zone awareness factors into how the DCD cluster performs during disaster recovery scenarios. The role that zones play in these scenarios is discussed in the Disaster Recovery Best Practices article on support.f5.com.
To specify which DCDs collect statistics traffic for a BIG-IP device, you perform two tasks:
  • Log in to each DCD that should collect data for this BIG-IP device and assign it to the correct zone.
  • Log in to the BIG-IQ CM and assign the BIG-IP to the zone to which those DCDs are assigned.
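The relationship between the two tasks can be modeled in Python: DCDs are assigned to zones, a BIG-IP device is assigned to a zone that already has DCDs, and the DCDs in that zone collect its statistics. The class and all device names are hypothetical.

```python
class ZoneMap:
    """Hypothetical model of DCD and BIG-IP zone assignments."""

    def __init__(self):
        self.dcd_zone = {}      # DCD name -> zone
        self.bigip_zone = {}    # BIG-IP name -> zone

    def assign_dcd(self, dcd: str, zone: str) -> None:
        self.dcd_zone[dcd] = zone

    def assign_bigip(self, bigip: str, zone: str) -> None:
        # The zone must already exist on at least one DCD.
        if zone not in self.dcd_zone.values():
            raise ValueError(f"no DCD is assigned to zone {zone!r}")
        self.bigip_zone[bigip] = zone

    def collectors_for(self, bigip: str) -> list:
        """DCDs that collect statistics for this BIG-IP device."""
        zone = self.bigip_zone[bigip]
        return sorted(d for d, z in self.dcd_zone.items() if z == zone)


zm = ZoneMap()
zm.assign_dcd("dcd-east-1", "east")
zm.assign_dcd("dcd-east-2", "east")
zm.assign_dcd("dcd-west-1", "west")
zm.assign_bigip("bigip-nyc", "east")
print(zm.collectors_for("bigip-nyc"))   # ['dcd-east-1', 'dcd-east-2']
```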

Change the zone for a Data Collection Device

Normally, you assign a Data Collection Device (DCD) to a zone as part of the initial setup for that device. But you can change the zone to which a DCD is assigned as needed.
  1. From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Devices.
    The BIG-IQ Data Collection Devices screen opens, listing the DCDs in the cluster. The Services column lists the BIG-IP services monitored by each DCD. If no services are enabled for a DCD, this column displays Add Services instead.
  2. Under Device Name, select the DCD that you want to revise.
  3. On the DCD properties page, click Edit to display the Edit Zone popup.
    • To use an existing Zone, select the zone you want to assign to this DCD and click Continue.
    • To use a new Zone, select Create New, type the name of the zone that you want to create and assign to this DCD, and click Continue.
  4. Click Save & Close to close the DCD properties screen.
  5. Use SSH to log in to the DCD as root.
  6. Type bigstart restart elasticsearch and press Enter.
  7. Repeat the last three steps for each DCD that you want to move to this zone.
    As you run this command on each DCD, it momentarily stops processing DCD data, so the data routes to another node in the cluster and no data is lost.
You can now assign managed BIG-IP devices to this zone for data collection.
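If you restart Elasticsearch on several DCDs, doing it one device at a time matches the note that each DCD briefly stops processing while others take over. This Python sketch wraps the documented bigstart restart elasticsearch command in ssh calls. The host names are placeholders, key-based SSH login as root is assumed, and the fixed pause is a stand-in for a proper cluster-health check between restarts.

```python
import subprocess
import time


def restart_elasticsearch(dcds, run=subprocess.run, pause_seconds=60.0):
    """Run the documented restart command on each DCD, one at a time."""
    for i, host in enumerate(dcds):
        run(["ssh", f"root@{host}", "bigstart restart elasticsearch"], check=True)
        if i < len(dcds) - 1:
            time.sleep(pause_seconds)   # let the cluster settle before the next node


def print_command(cmd, check):
    """Dry-run stand-in for subprocess.run that only prints the command."""
    print(" ".join(cmd))


if __name__ == "__main__":
    # Hypothetical host names; the dry run prints instead of connecting.
    restart_elasticsearch(["dcd-1.example.com", "dcd-2.example.com"],
                          run=print_command, pause_seconds=0)
```

Replace run=print_command with the default subprocess.run only after you confirm the host list.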

Change a zone for a BIG-IP device

Before you can change a BIG-IP device's zone, you must have created the zone on the Data Collection Device (DCD).
Changing the zone assignment for a BIG-IP determines which DCDs collect statistics data for that device. Normally, you assign a BIG-IP to a zone as part of the initial setup for that device, but you can change the zone to which a BIG-IP device is assigned as needed.
  1. At the top of the screen, click Devices.
  2. Click the name of the device for which you want to change the zone.
    The properties screen displays for that device.
  3. On the left, click STATISTICS COLLECTION.
  4. For Collect Statistics Data, select Enabled to collect statistics from this device.
  5. For Zone, select the zone to which you want to assign this BIG-IP device.
  6. Click Save & Close to close the device properties screen.
DCDs assigned to the zone you selected start collecting the statistics data for this device.

General Elasticsearch FAQ

Scaling incoming data on BIG-IQ

BIG-IQ uses Elasticsearch (ES) to automatically distribute data across all available data collection devices (DCDs) in your system.
On BIG-IQ, data coming in from managed BIG-IP devices is distributed into indices. These indices are logical groupings of physical shards, where each shard is a self-contained index (each index usually includes 5 shards). ES automatically balances new data across the shards in an index using an internal hashing algorithm. Each shard grows as the amount of data increases. Because the statistics and event data provided by managed BIG-IP devices is fairly consistent, the shards each grow at approximately the same rate. When you use zones (see Overview of DCD Zones), this might not be the case.
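The hashing step resembles the routing formula in the Elasticsearch documentation, shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The Python sketch below shows that idea. Elasticsearch hashes with Murmur3, so zlib.crc32 here is only a self-contained stand-in and the shard numbers will not match a real cluster; newer Elasticsearch versions also refine the formula.

```python
import zlib
from collections import Counter


def shard_for(routing_value: str, primary_shards: int) -> int:
    """Simplified routing: hash(routing) % number_of_primary_shards."""
    return zlib.crc32(routing_value.encode("utf-8")) % primary_shards


# Route 10,000 hypothetical document IDs into an index with 5 primary shards.
counts = Counter(shard_for(f"stat-{n}", 5) for n in range(10_000))
print(sorted(counts.items()))   # (shard number, document count) pairs
```

Because the same routing value always hashes to the same shard, ES can find a document again without searching every shard.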
To protect against potential data loss, there are two kinds of shards: primary and replica. BIG-IQ allows each primary shard to have up to two replicas (three copies of the shard in total). By default, there is one replica shard for each of the five primary shards. ES manages these shards so that each copy of a shard (primary or replica) is on a separate DCD. All data is written first to the primary shard and then copied to each of its replicas. Replica shards provide high availability and improve read performance. Once an index is created, the number of primary shards cannot be changed, but the number of replicas can be adjusted. When a new DCD is added to the cluster, ES redistributes both primary and replica shards to take advantage of the new node.
The following is an example of how a single index might be balanced across three DCDs when there are four primary shards and two replicas per primary (4 primaries + (2 replicas * 4 primaries) = 12 total shards).
DCD 1 | DCD 2 | DCD 3
--- | --- | ---
Primary-0 | Replica-0 | Replica-0
Primary-1 | Replica-1 | Replica-1
Primary-2 | Replica-2 | Replica-2
Primary-3 | Replica-3 | Replica-3
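The Python sketch below reproduces the shard arithmetic of this example together with one possible layout that keeps every copy of a shard on a different DCD. The replica_count helper follows the Advanced Settings description earlier in this chapter (one replica by default, up to two with Auto expand replicas); min(2, DCDs - 1) is an assumption about how "based on the number of DCDs available" works. The round-robin rule is also an assumption: Elasticsearch's allocator weighs other factors, such as disk usage, so its layout (like the table above, which keeps every primary on DCD 1) can differ.

```python
def replica_count(dcd_count: int, replicas_enabled: bool = True,
                  auto_expand: bool = False) -> int:
    """Replicas per primary shard under the assumptions described above."""
    if not replicas_enabled:
        return 0
    if auto_expand:
        return max(0, min(2, dcd_count - 1))
    return 1


def place_shards(primaries: int, replicas: int, dcds: list) -> dict:
    """Round-robin placement: copy c of shard s goes to DCD (s + c) mod n."""
    if replicas + 1 > len(dcds):
        raise ValueError("each copy of a shard needs its own DCD")
    layout = {d: [] for d in dcds}
    for shard in range(primaries):
        for copy in range(replicas + 1):
            dcd = dcds[(shard + copy) % len(dcds)]
            kind = "Primary" if copy == 0 else "Replica"
            layout[dcd].append(f"{kind}-{shard}")
    return layout


dcds = ["DCD 1", "DCD 2", "DCD 3"]
layout = place_shards(primaries=4, replicas=replica_count(3, auto_expand=True), dcds=dcds)
print(sum(len(v) for v in layout.values()))   # 12
print(layout["DCD 1"])   # ['Primary-0', 'Replica-1', 'Replica-2', 'Primary-3']
```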

Data scaling in DCD zones

Elasticsearch (ES) and BIG-IQ provide an additional option to group data using zones. Zoning allows you to place BIG-IP devices and data collection devices (DCDs) close to each other while the BIG-IQ management console is in another location. In an environment with more than one data center, it is helpful to ensure that the DCDs closest to the data source are used to store that data. The main reason for doing this is to avoid latency issues between the BIG-IP devices and the DCDs.
Upon initial setup, BIG-IQ uses one zone (named 'default') for all data, and you can create additional zones as needed. When you assign a specific zone to a host device, ES ensures that the primary data for that host is on a DCD in that zone. If more data comes from the hosts assigned to one zone than from those assigned to another, some shards in an index can grow larger than others. ES prefers to allocate replica shards in a different zone from the primary, but allocates replicas in the same zone if space is not available in a different zone. This ensures that the loss of a single data center doesn't cause total data loss for that zone. Adding more replicas increases resiliency, because ES places each replica in a different zone where possible, which protects the data even if multiple data centers are lost.
When you add a DCD to the cluster, each node is assigned a specific zone for storing primary data. BIG-IQ then allows you to associate a host with that zone so that data from that host goes to that zone. This ensures that primary data within a data center does not traverse the WAN.
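The zone preference described above can be sketched in Python for a single shard: the primary goes to a DCD in the host's zone, and each replica prefers a DCD in a zone that this shard does not use yet, falling back to any unused DCD. The function and DCD names are hypothetical, and the sketch checks only DCD availability, whereas ES also considers free space and overall shard balance.

```python
def allocate_zone_aware(host_zone: str, replicas: int, dcd_zones: dict) -> list:
    """Pick DCDs for one shard's primary and replicas.

    `dcd_zones` maps DCD name -> zone. Returns DCD names, primary first.
    """
    in_zone = sorted(d for d, z in dcd_zones.items() if z == host_zone)
    if not in_zone:
        raise ValueError(f"no DCD in zone {host_zone!r}")
    chosen = [in_zone[0]]            # primary stays in the host's zone
    used_zones = {host_zone}
    for _ in range(replicas):
        free = sorted(d for d in dcd_zones if d not in chosen)
        if not free:
            break                    # not enough DCDs; the replica stays unassigned
        other_zone = [d for d in free if dcd_zones[d] not in used_zones]
        pick = (other_zone or free)[0]   # prefer a new zone, else fall back
        chosen.append(pick)
        used_zones.add(dcd_zones[pick])
    return chosen


dcd_zones = {"east-1": "east", "east-2": "east", "west-1": "west"}
print(allocate_zone_aware("east", replicas=1, dcd_zones=dcd_zones))   # ['east-1', 'west-1']
print(allocate_zone_aware("east", replicas=2, dcd_zones=dcd_zones))   # ['east-1', 'west-1', 'east-2']
```

With two replicas and only two zones, the second replica falls back to the primary's zone, which is the same-zone fallback described above.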
For more information about configuring zones, see Configure Data Collection Devices in Setting up and Configuring a BIG-IQ Centralized Management Solution on support.f5.com. For a sample BIG-IQ zone configuration, see BIG-IQ Zone Management.