Manual Chapter: Configure Data Collection Devices
Applies To: BIG-IQ Centralized Management 8.2.0, 8.1.0
Data Collection Device configuration overview
You configure the Data Collection Devices (DCDs) for your BIG-IQ
solution using the BIG-IQ CM. The BIG-IQ controls the settings for everything except the
zone assignment and master key settings.
Statistics retention policy overview
The statistics retention policy determines how much of the data reported by your managed BIG-IP devices is kept. When you choose how much data to retain, you need to consider how
much disk space you have available for storing data, and which data needs additional storage.
The fields on the Statistics Retention Policy screen set the size of the
indices that BIG-IQ uses to store raw data. These fields all work in a similar fashion. One
way to understand how these indices work is to think of your data storage space as a set of
containers.
The values you specify on this screen determine how much storage space each container
(index) consumes. Because data is saved for the time periods you specify, the longer the
time period that you specify, the more space you consume. The disk storage that is consumed
depends on several factors.
- The number of BIG-IP devices for which you are collecting data
- The number of objects those BIG-IP devices have (for example, virtual servers, pools, pool members, and iRules)
- The frequency of data collection
- The data retention policy
- The data replication policy
- Additional data storage for prioritized service groups
The following are key concepts to understand about how the retention policy works.
How long is data in each container retained? | Data is retained in each container for the time period you specify. When the specified level is reached, the oldest chunk of data is deleted. For example, if you specify a raw data value of 48 hours of retained collected data, when 48 hours of raw data has accumulated, the next hour of incoming raw data triggers the BIG-IQ to delete the oldest hour of collected data. |
When does data from one container transfer to the next? | The BIG-IQ transfers data from one container to the next in increments that are the size of the next (larger) container. That is, every 60 minutes, BIG-IQ aggregates the last 60 minutes of collected raw data into a data set and passes it to the Hour(s) container. Every 24 hours, BIG-IQ aggregates the last 24 hours of hourly data into a data set and passes it to the Day(s) container, and so on for the Month(s) container. |
What about limits? | Limit Max Storage to specifies the percentage of total disk space that you want data to consume on the DCDs in your cluster. If more disk space is consumed than the percentage you specified, BIG-IQ takes two actions to prevent the data corruption that can occur when storage is completely exhausted. |
Global vs. Group properties | Global properties are applied to all statistics data retention. You can adjust retention settings for specific service groups that are activated on your system DCDs. |
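The container behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how BIG-IQ implements retention: the class name `RetentionContainers`, its parameters, and the use of a simple mean for aggregation are all hypothetical.

```python
from collections import deque


class RetentionContainers:
    """Toy model of raw -> hourly -> daily containers (hypothetical names)."""

    def __init__(self, raw_hours, hourly_points, daily_points):
        if raw_hours < 1 or hourly_points < 24:
            raise ValueError("need at least 1 raw hour and 24 hourly points")
        # A deque with maxlen drops its oldest entry when a new one
        # arrives at full capacity, like "the oldest chunk is deleted".
        self.raw = deque(maxlen=raw_hours)
        self.hourly = deque(maxlen=hourly_points)
        self.daily = deque(maxlen=daily_points)
        self._hours_seen = 0

    def add_hour(self, samples):
        """Store one hour of raw samples and roll it up (assumed: mean)."""
        self.raw.append(list(samples))
        self.hourly.append(sum(samples) / len(samples))
        self._hours_seen += 1
        # Every 24 hours, aggregate the last 24 hourly points into one
        # daily point.
        if self._hours_seen % 24 == 0:
            last_day = list(self.hourly)[-24:]
            self.daily.append(sum(last_day) / len(last_day))


containers = RetentionContainers(raw_hours=48, hourly_points=24, daily_points=31)
for hour in range(49):
    containers.add_hour([hour, hour])
# The 49th hour pushed out the oldest hour of raw data.
print(len(containers.raw), containers.raw[0], len(containers.daily))
```

With 48 hours of raw retention, feeding in a 49th hour leaves 48 hours in the raw container and removes hour 0, which mirrors the 48-hour example in the table.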
Manage the retention policy for your statistics data
Before you can set the statistics retention policy, you must have added a data collection device (DCD). If you are adding statistics retention for a specific service group, ensure that the service is activated on the DCD.
You can manage the default settings that determine how your statistics data is retained, based on quality. The highest quality data is real-time (raw) data, that is, data that has not been averaged. Raw data consumes a lot of disk space, so consider your needs when you choose your data retention settings.
- From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Cluster. The BIG-IQ Data Collection Cluster screen opens. On this screen, you can view summary status for the Data Collection Device (DCD) cluster and access the screens that you can use to configure the DCD cluster.
- Under SUMMARY, you can access screens detailing how much data is stored, as well as how the data is stored.
- Under CONFIGURATION, you can access the screens that control DCD cluster performance.
- Under the screen name, click . The Statistics Collection Status screen opens.
- Click the Configure Retention button. The Statistics Retention Policy screen opens.
- Go to Global Properties to edit default retention:
- In the Keep real-time (raw) data up to field, type the number of hours of raw data to retain. You must specify a minimum of 1 hour, so that there is sufficient data to average and create a data point for the Keep hourly data up to container.
- In the Keep hourly data up to field, type the number of hourly data points to retain. You must specify a minimum of 24 hours, so that there is sufficient data to average and create a data point for the Keep daily data up to container.
- In the Keep daily data up to field, type the number of daily data points to retain. You must specify a minimum of 31 days, so that there is sufficient data to average and create a data point for the Keep monthly data up to container.
- In the Keep monthly data up to field, type the number of monthly data points to retain. Once the specified number of months passes, the oldest monthly data set is deleted.
- In the Limit max storage to field, type the percentage of total disk space that you want data to consume on the DCDs in your cluster.
- In the Keep events up to field, type the number of days that you want to keep events before the oldest events data set is deleted.
- In the Keep traffic capturing up to field, type the number of days that you want to keep captured traffic before the oldest traffic data set is deleted.
The Global Properties are the default retention settings for any service groups, or service group values, that are not populated in Group Properties.
- Go to Group Properties to add custom time retention settings for specific service groups. These settings will retain any Global Properties that are not specified in the Group Properties. Ensure that you have enough disk space to accommodate service group statistics retention.
- Expand Advanced Settings. The following settings configure data scaling and resilience using Elasticsearch. For more information, see General Elasticsearch FAQ.
- Select the Replicas check box to enable high availability for the stored data on your DCD cluster. Replicas are copies of a data set that are available to the DCD cluster when one or more devices within that cluster become unavailable. By default, data replication for statistics is enabled. Disabling replication reduces the amount of disk space required for data retention. However, this provides no protection from data corruption that can occur when you remove a DCD. You should enable replicas to provide this protection.
- Select the Auto expand replicas check box to enable automatic duplication of the number of replicas for a specific data set. This allows the DCD cluster to dynamically host up to 2 separate replicas for a given data set, based on the number of DCDs available. This provides redundancy that protects from data loss even when more than one DCD becomes unavailable. This option is only available when Enable Replicas is selected. In addition, your system must include at least 3 DCDs (one primary and two replicas) with sufficient disk space.
- When you are satisfied with the values specified for data retention, click Save & Close.
Log index rotation policy overview
The optimum settings used to configure your Data Collection Device (DCD) indices depend on a number of key factors.
- The system provides the ability to dynamically create new indices based on either a specified interval or a specified size. The primary goal to consider when you make these decisions is how to maintain a maximum disk allocation for the DCD data, while maintaining capacity for new data that flows in.
- Secondary considerations include search optimization, and the ability to optimize old indices to reduce their size.
- Generally, the best policy is one that does not create unnecessary indices. The more indices, the lower the overall performance, because your searches have to deal with more shards. For example, if you know a service has a low indexing volume (thousands/day) then it makes the most sense to have a large aggregation per rotation (5 days or 30 days). For services like Web Application Security that probably have high indexing volumes, it makes more sense to rotate every 8 hours (which reduces the number of retained indices).
- Index rotation also allows new sharding and replica counts by changing the template on a given index type. New indices created from that template will contain the new shard and replica count properties.
This table shows the default configuration values for each index running on BIG-IQ Centralized Management. These values are based on anticipated data ingestion rates and typical usage patterns.
Service | Index Name | Minimum Number of DCDs | Rotation Policy | Retained Index Count | Approximate time window | Size of /var file system |
---|---|---|---|---|---|---|
Access | access-event-logs | 2 | Time/5 days | 19 | 95 days | 500 GB |
Access | access-stats | 2 | Time/5 days | 19 | 95 days | 500 GB |
Web Application Security | asmindex | 2 | Size/100000 MB | 5 | N/A | 500 GB |
FPS | websafe | 2 | Time/30 days | 100 | 8 years | 10 GB |
If multiple services are running on a given DCD, or you have higher inbound data rates, you might have to adjust these values to keep the /var file system from filling up. (There is a default alert to warn of this when the file system becomes 80% full.) The simplest resolution is to revise the retained index count; lowering this value reduces the disk space requirements, but it will also reduce the amount of data available for queries. For details about changing this setting, refer to the modifying indices topic for the service you are configuring.
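If you want to check the same 80% level yourself, a minimal sketch might look like the following. It is not the BIG-IQ alert mechanism; the function names are hypothetical, and the figure computed from `shutil.disk_usage` can differ slightly from what other tools report.

```python
import shutil

ALERT_THRESHOLD = 0.80  # the default alert level mentioned above


def usage_fraction(used_bytes, total_bytes):
    """Return the fraction of the file system that is in use."""
    return used_bytes / total_bytes


def is_over_threshold(used_bytes, total_bytes, threshold=ALERT_THRESHOLD):
    """True when usage has reached the threshold (hypothetical helper)."""
    return usage_fraction(used_bytes, total_bytes) >= threshold


def check_path(path="/var"):
    """Check a mounted path; /var is the file system discussed above."""
    usage = shutil.disk_usage(path)
    return is_over_threshold(usage.used, usage.total)


# 400 GB used out of a 500 GB file system is exactly 80% full.
print(is_over_threshold(400, 500))
```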
How does the DCD aggregate raw data?
The DCD stores raw data coming from the BIG-IP devices
in data indices. As data is received, it accumulates in the current index. When the
accumulated data reaches the rotation threshold that you set, four things happen.
- A new current index is created.
- BIG-IP data begins accumulating in the new index.
- The former current index becomes one of the retained indices.
- If the total number of indices is now larger than the retained index count, the oldest one is deleted.
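The four rotation steps can be sketched as a small simulation. This is an illustration only: the class `RotatingIndex` is hypothetical, the threshold is counted in records rather than megabytes or time, and it assumes that the retained index count applies to retained indices, with the current index tracked separately.

```python
from collections import deque


class RotatingIndex:
    """Toy model of index rotation (hypothetical, not the DCD code)."""

    def __init__(self, rotation_threshold, retained_index_count):
        self.rotation_threshold = rotation_threshold
        self.retained_index_count = retained_index_count
        self.current = []        # the index that is accumulating data
        self.retained = deque()  # older indices, oldest first
        self.deleted = 0         # how many indices have been dropped

    def ingest(self, record):
        self.current.append(record)
        if len(self.current) >= self.rotation_threshold:
            self._rotate()

    def _rotate(self):
        # Steps 1 and 2: a new, empty current index starts receiving data.
        full_index, self.current = self.current, []
        # Step 3: the former current index becomes a retained index.
        self.retained.append(full_index)
        # Step 4: drop the oldest index when over the retained count.
        while len(self.retained) > self.retained_index_count:
            self.retained.popleft()
            self.deleted += 1


index = RotatingIndex(rotation_threshold=3, retained_index_count=2)
for record in range(10):
    index.ingest(record)
print(list(index.retained), index.current, index.deleted)
```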
The ideal configuration for data indices
depends on the amount, frequency, and type of data your devices send to the DCD. The default settings are
designed to satisfy most user scenarios, but you might want to explore the settings for
the data types that you plan to send to the DCD, to make sure that those settings meet
your needs.
Modify log indices
Before you can configure the indices for a data collection device, you must activate data collection for the services that you want to collect data for.
BIG-IQ stores incoming BIG-IP device data in indices on the Data Collection Devices (DCD)
cluster. Each service that sends data uses its own indices. You control how the BIG-IQ
manages your data by adjusting the settings for the Indices for each service.
- From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Cluster. The BIG-IQ Data Collection Cluster screen opens. On this screen, you can view summary status for the Data Collection Device (DCD) cluster and access the screens that you can use to configure the DCD cluster.
- Under SUMMARY, you can access screens detailing how much data is stored, as well as how the data is stored.
- Under CONFIGURATION, you can access the screens that control DCD cluster performance.
- Under the screen name, click . The Settings screen opens.
- Click the Configure button for the service that you want to set up. BIG-IQ displays the indices settings for the selected service.
- Perform the next two steps for each index. If you are configuring the Access service, use the same indices values for access-event-logs and access-stats to avoid a mismatch in the reports generated from your logging data.
- Specify the Rotation Type.
- To chunk your data based on the amount of data:
- Select Size Based.
- For the Max Index Size, type the size of the indices you want to create.
For example, if you type 1000, when the index size reaches 1 GB, it becomes a retained index and new data from your BIG-IP begins accumulating in a new current index. If your Retained Index Count is set to 10, then the maximum disk space used by these indices will be approximately 10 GB.
- To chunk your data based on increments of time:
- Select Time Based.
- For the Rotation Period, specify a time unit, and type how many of those units each index should span.
For example, if you type .5 and select Hours, a new index is created every half hour. If your Retained Index Count is set to 10, then the retained indices together will contain approximately 5 hours of data.
- For the Retained Index Count, type the total number of indices you want to store on the DCD. This setting determines the maximum amount of data stored on the DCD. When this limit is reached, the oldest data is truncated or discarded. For example, if you set the number of indices to 10 and each index is 1 GB, then you must have 10 GB of storage available on your DCD.
- Click Save & Close to save the indices configuration settings.
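The sizing arithmetic behind these settings is simple multiplication, sketched below. The function names are hypothetical, and the estimate ignores replicas and any other data that shares the disk.

```python
def max_disk_mb(max_index_size_mb, retained_index_count):
    """Approximate disk space used by a size-based rotation policy."""
    return max_index_size_mb * retained_index_count


def time_window_hours(rotation_period_hours, retained_index_count):
    """Approximate span of data kept by a time-based rotation policy."""
    return rotation_period_hours * retained_index_count


# Size based: 1000 MB indices, 10 retained, roughly 10 GB in total.
print(max_disk_mb(1000, 10))
# Time based: half-hour indices, 10 retained, roughly 5 hours in total.
print(time_window_hours(0.5, 10))
# Default Access policy: 5-day indices, 19 retained, a 95-day window.
print(time_window_hours(5 * 24, 19) / 24)
```

The last line reproduces the 95-day window listed for the Access indices in the default configuration table.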
Change the minimum number of master eligible devices
You can manage the minimum number of devices that must be available for the cluster to be considered operational. If the number of available devices is less than the value specified for the Minimum Master Eligible Devices, the cluster is deemed unhealthy.
- From BIG-IQ, at the top of the screen, click System, then, on the left, click BIG-IQ DATA COLLECTION and then select BIG-IQ Data Collection Cluster. The BIG-IQ Data Collection Cluster screen opens. On this screen, you can view summary status for the Data Collection Device (DCD) cluster and access the screens that you can use to configure the DCD cluster.
- Under SUMMARY, you can access screens detailing how much data is stored, as well as how the data is stored.
- Under CONFIGURATION, you can access the screens that control DCD cluster performance.
- Under the screen name, click CONFIGURATION > Cluster Settings. The Cluster Settings screen opens.
- To change this setting, click Override. The button text changes to Update.
- In the Minimum Master Eligible Devices field, type or select the new minimum number of healthy devices for this DCD cluster, and click Update. The system updates the setting.
- When you are satisfied with the minimum number of devices setting, click Cancel to close the screen.
How do Data Collection Device zones work?
There are two ways to use Data Collection Device (DCD) zones to control
how data is stored for your managed BIG-IP devices.
- You can use zones to optimize statistics traffic routing. By assigning DCDs to a zone and then assigning managed BIG-IP devices to that zone, you control which DCDs collect statistics traffic for each device.
- DCD zone awareness factors into how the DCD cluster performs during Disaster Recovery scenarios. The role zones play in these scenarios is discussed in the Disaster Recovery Best Practices article on support.f5.com.
To specify which DCDs collect statistics traffic for a BIG-IP device,
you perform two tasks:
- Log in to each DCD that should collect data for this BIG-IP device and assign it to the correct zone.
- Log in to the BIG-IQ CM and assign the BIG-IP to the zone to which those DCDs are assigned.
Change the zone for a Data Collection Device
Normally, you assign a Data Collection Device (DCD) to a zone as part of the initial setup for that device. But you can change the zone to which a DCD is assigned as needed.
- From BIG-IQ, at the top of the screen, click System, then, on the left, click . The BIG-IQ Data Collection Devices screen opens, listing the DCDs in the cluster. The Services column lists the BIG-IP services monitored by each DCD. If no services are enabled for a DCD, this column displays Add Services instead.
- Under Device Name, select the DCD that you want to revise.
- On the DCD properties page, click Edit to display the Edit Zone popup.
- To use an existing Zone, select the zone you want to assign to this DCD and click Continue.
- To use a new Zone, select Create New, then type the name of the zone you want to create and assign to this DCD and click Continue.
- Click Save & Close to close the DCD properties screen.
- Use SSH to log in to the DCD as root.
- Type bigstart restart elasticsearch and press Enter.
- Repeat the last three steps for each DCD that you want to move to this zone. As you run this command on each DCD, it momentarily stops processing DCD data, so the data routes to another node in the cluster and no data is lost.
You can now assign managed BIG-IP devices to this zone for data collection.
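If you move several DCDs, the SSH part of the procedure could be scripted along these lines. This is a sketch under stated assumptions, not an F5-provided tool: the host names are hypothetical, key-based root SSH access is assumed, and the zone change itself must still be made in the BIG-IQ user interface first. By default the function only returns the commands it would run.

```python
import subprocess

# The restart command from the procedure above.
RESTART_COMMAND = "bigstart restart elasticsearch"


def build_ssh_command(host, user="root"):
    """Build the ssh argument list for one DCD (host is hypothetical)."""
    return ["ssh", f"{user}@{host}", RESTART_COMMAND]


def restart_sequentially(hosts, dry_run=True):
    """Restart Elasticsearch on one DCD at a time, in the order given."""
    commands = [build_ssh_command(host) for host in hosts]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failure, so the remaining
            # DCDs keep processing data.
            subprocess.run(command, check=True)
    return commands


# Dry run with hypothetical host names: prints the commands only.
for command in restart_sequentially(["dcd1.example.com", "dcd2.example.com"]):
    print(" ".join(command))
```

Running the restarts one at a time keeps the other nodes available to take over data processing while each DCD restarts.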
Change a zone for a BIG-IP device
Before you can change a BIG-IP device's zone, you
must have created the zone on the Data Collection Device (DCD).
Changing the zone assignment for a BIG-IP determines which DCDs
collect statistics data for that device. Normally, you assign a BIG-IP to a zone as part
of the initial setup for that device, but you can change the zone to which a BIG-IP
device is assigned as needed.
- At the top of the screen, click Devices.
- Click the name of the device for which you want to change the zone. The properties screen displays for that device.
- On the left, click STATISTICS COLLECTION.
- For Collect Statistics Data, select Enabled to collect statistics from this device.
- For Zone, select the zone to which you want to assign this BIG-IP device.
- Click Save & Close to close the device properties screen.
DCDs assigned to the zone you selected start
collecting the statistics data for this device.
General Elasticsearch FAQ
Scaling incoming data on BIG-IQ
BIG-IQ uses Elasticsearch (ES) to automatically distribute data across all available data collection devices (DCDs) in your system setup.
On BIG-IQ, data coming in from managed BIG-IP devices is distributed into indices. Each index is a logical grouping of physical shards, where each shard is a self-contained index (each index usually includes 5 shards). ES automatically balances new data across the shards in an index using an internal hashing algorithm. Each shard grows as the amount of data increases. Since the statistics and event data provided by managed BIG-IPs is fairly consistent, the shards will each grow at approximately the same rate. With the use of zones (see Overview of DCD Zones), this may not be the case.
To protect against potential data loss, there are two different kinds of shards: primary and replica. BIG-IQ allows each primary shard to have up to two replicas (a total of three copies of each shard). By default, there is one replica shard for each of the five primary shards. ES manages these shards so that a primary shard and each of its replicas are on separate DCDs. All data is written originally to the primary shard and then copied to each of the replicas. Replica shards allow for high availability and improved read performance of the data. Once an index is created, the number of primary shards cannot be changed, but the number of replicas can be adjusted. When a new DCD is added to the cluster, ES will redistribute both the primary and replica shards to take advantage of the new node.
The following is an example of how a single index might get balanced across three DCDs when there are four primary shards and two replicas per primary (4 primary + (2 replicas * 4 primary) = 12 total shards).
DCD 1 | DCD 2 | DCD 3 |
---|---|---|
Primary-0 | Replica-0 | Replica-0 |
Primary-1 | Replica-1 | Replica-1 |
Primary-2 | Replica-2 | Replica-2 |
Primary-3 | Replica-3 | Replica-3 |
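The shard count and the one-copy-per-DCD constraint can be checked with a short sketch. The round-robin placement below is a hypothetical stand-in, not the Elasticsearch allocator, so it produces a different but equally valid layout from the table above; the DCD names are also only illustrative.

```python
def total_shards(primaries, replicas_per_primary):
    """Primaries plus all of their replicas."""
    return primaries * (1 + replicas_per_primary)


def place_shards(primaries, replicas_per_primary, dcds):
    """Spread copies so no DCD holds two copies of the same shard."""
    copies = 1 + replicas_per_primary
    if copies > len(dcds):
        raise ValueError("need at least one DCD per copy of a shard")
    layout = {dcd: [] for dcd in dcds}
    for shard in range(primaries):
        for copy in range(copies):
            # Each copy of a shard goes to the next DCD in the ring.
            dcd = dcds[(shard + copy) % len(dcds)]
            label = f"Primary-{shard}" if copy == 0 else f"Replica-{shard}"
            layout[dcd].append(label)
    return layout


print(total_shards(4, 2))
for dcd, shards in place_shards(4, 2, ["DCD 1", "DCD 2", "DCD 3"]).items():
    print(dcd, shards)
```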
Data scaling in DCD zones
Elasticsearch (ES) and BIG-IQ provide an additional option to group data using zones. Zoning allows you to put BIG-IP devices and data collection devices (DCDs) in close proximity to each other while the BIG-IQ management console is in another location. In an environment that has more than one data center, it is helpful to ensure that the DCDs closest to the data source are used for storing that data. The main reason for doing this is to avoid latency issues between the BIG-IP devices and the DCDs.
Upon initial setup, BIG-IQ uses one zone for all data (named 'default'), and you can create additional zones as needed. When you specify a zone for a host device, ES ensures that the primary data for that host is on a DCD in that zone. This can lead to some shards in an index being larger than others if the amount of data coming from the assigned hosts is greater in one zone than another. ES prefers to allocate replica shards in a different zone than the primary, but will allocate replicas in the same zone if space is not available in a different zone. This ensures that the loss of a single data center doesn't cause total data loss for that zone. Adding more replicas increases resiliency, because each replica is placed in a different zone, which protects the data even with the loss of multiple data centers.
When adding a DCD to the cluster, each node is assigned a specific zone to store primary data. BIG-IQ then allows you to associate a host to that zone to ensure data from that host goes to that zone. This allows you to ensure the primary data within a data center will not traverse the WAN.
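The replica placement preference described above (a different zone first, the same zone only as a fallback) can be sketched as follows. This is a simplified illustration, not the Elasticsearch allocation logic; the function, the DCD names, and the boolean `has_space` map are all hypothetical.

```python
def choose_replica_dcd(primary_dcd, dcd_zones, has_space):
    """Pick a DCD for a replica, preferring a zone other than the primary's.

    dcd_zones maps DCD name -> zone name; has_space maps DCD name -> bool.
    Returns None when no other DCD has space.
    """
    primary_zone = dcd_zones[primary_dcd]
    candidates = [
        dcd for dcd in dcd_zones
        if dcd != primary_dcd and has_space.get(dcd, False)
    ]
    other_zone = [dcd for dcd in candidates if dcd_zones[dcd] != primary_zone]
    if other_zone:
        return other_zone[0]  # preferred: replica in a different zone
    return candidates[0] if candidates else None  # fallback: same zone


zones = {"dcd-a1": "east", "dcd-a2": "east", "dcd-b1": "west"}
space = {"dcd-a1": True, "dcd-a2": True, "dcd-b1": True}
print(choose_replica_dcd("dcd-a1", zones, space))
space["dcd-b1"] = False
print(choose_replica_dcd("dcd-a1", zones, space))
```

In the first call the replica lands in the other zone; once that DCD has no space, the sketch falls back to a DCD in the primary's own zone.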
For more information about configuring zones, see Configure Data Collection Devices in Setting up and Configuring a BIG-IQ Centralized Management Solution on support.f5.com. For a sample BIG-IQ zone configuration, see BIG-IQ Zone Management.