Applies To:
BIG-IQ Centralized Management
- 5.3.0
What do I do if there is not enough disk space?
If there is not enough disk space to install the new software, you need to extend the /var partition. The default size of the /var file system on a newly installed node is 10 GB, which might be insufficient to store your data. For instructions on extending this file system to a larger size, refer to K16103: Extending disk space on BIG-IQ Virtual Edition at support.f5.com/csp/article/K16103. Because upgrading a node requires at least two volumes, you must ensure that the /var file system on both volumes can be extended to the same size, or upgrades might fail.
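The K16103 procedure resizes the /var directory through tmsh. The following is a hedged sketch of that approach, not a substitute for the article itself: the size value shown is only an example, and the exact steps (including whether a reboot is required) should be verified against K16103 for your version.

```shell
# Sketch of the K16103 approach (verify against the article before running).
# new-size is specified in 1 KB blocks; 31457280 blocks = 30 GB is an
# example value, not a recommendation.
tmsh modify /sys disk directory /var new-size 31457280

# Review the configured directory sizes, then reboot so the change takes effect.
tmsh list /sys disk directory
reboot
```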
Symptom
If the message UCS restore failure displays during software installation, it may be due to insufficient disk space.
- Log in to the BIG-IQ system or DCD on which the software is failing to install.
- Run the command: tmsh show sys software.
In the response, the system displays: failed (UCS application failed; unknown cause).
- Navigate to the liveinstall.log file in the /var/log/ folder. If the issue triggering the error message is insufficient disk space, the file contains the following error message:
info: capture: status 256 returned by command: F5_INSTALL_MODE=install F5_INSTALL_SESSION_TYPE=hotfix chroot /mnt/tm_install/9934.NdHXAL /usr/local/bin/im -force /var/local/ucs/config.ucs
info: >++++ result:
info: Extracting manifest: /var/local/ucs/config.ucs
info: /var: Not enough free space
info: 3404179456 bytes required
info: 2206740480 bytes available
info: /var/local/ucs/config.ucs: Not enough free disk space to install!
info: Operation aborted.
info: >----
info: Removing boot loader reference
Terminal error: UCS application failed; unknown cause.
*** Live install end at 2017/03/21 17:00:54: failed (return code 2) ***
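The byte counts in the log show exactly how much space is missing. The following is a hedged sketch for comparing the required figure against the free space on /var; the 3404179456 value is taken from the log excerpt above, and on an affected system you would substitute the figure from your own liveinstall.log.

```shell
# Compare the space the installer reported it needs against what /var has free.
# required_bytes comes from the liveinstall.log excerpt above.
required_bytes=3404179456

# df -kP reports free space in 1 KB blocks (POSIX format avoids line
# wrapping); convert to bytes before comparing.
avail_kb=$(df -kP /var | awk 'NR==2 {print $4}')
avail_bytes=$((avail_kb * 1024))

if [ "$avail_bytes" -ge "$required_bytes" ]; then
    echo "/var has enough free space"
else
    echo "/var is short by $(( (required_bytes - avail_bytes) / 1048576 )) MB"
fi
```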
Recommended Actions
- Switch back to the older (pre-upgrade) volume.
- Log in to the BIG-IQ system or DCD on which the software is failing to install.
- Change to the pre-upgrade volume by running the command switchboot -b <old-volume-name>.
- Restart the device by running the command reboot.
- Clean up the audit log entries and snapshot objects.
- Retry the installation.
Data collection device cluster status is yellow
After upgrading a data collection device (DCD), if the cluster status is yellow (unhealthy) instead of green (healthy), there are a number of potential causes and corresponding corrective actions you can attempt to resolve the issue.
Connectivity Issues
There could be unassigned replica shards in the cluster. Network connectivity issues can cause relocation of shards to a newly upgraded DCD to fail.
Symptoms
- Log in to the primary BIG-IQ for the DCD cluster.
- Navigate to the /var/log/elasticsearch/ directory.
- Examine the eslognode.log file to see if there is an add/remove/add cycle due to temporary network connectivity issues.
If there are unassigned replica shards, you will find a pattern similar to the following:
[2017-05-03 10:49:20,885][INFO ][cluster.service ] [f992a8aa-49c8-47ba-a59a-2be863f3a042] added {{7c49c2ef-ad2f-41f5-ab15-e01084b20364}{_xdtaTl5RGSn-pIC192ORA}{10.10.10.2}{10.10.10.2:9300}{zone=Seattle, master=true},}, reason: zen-disco-join(join from node[{7c49c2ef-ad2f-41f5-ab15-e01084b20364}{_xdtaTl5RGSn-pIC192ORA}{10.10.10.2}{10.10.10.2:9300}{zone=Seattle, master=true}])
[2017-05-03 10:49:32,172][INFO ][cluster.service ] [f992a8aa-49c8-47ba-a59a-2be863f3a042] removed {{7c49c2ef-ad2f-41f5-ab15-e01084b20364}{_xdtaTl5RGSn-pIC192ORA}{10.10.10.2}{10.10.10.2:9300}{zone=Seattle, master=true},}, reason: zen-disco-node_failed({7c49c2ef-ad2f-41f5-ab15-e01084b20364}{_xdtaTl5RGSn-pIC192ORA}{10.10.10.2}{10.10.10.2:9300}{zone=Seattle, master=true}), reason transport disconnected
[2017-05-03 10:49:36,210][INFO ][cluster.service ] [f992a8aa-49c8-47ba-a59a-2be863f3a042] added {{7c49c2ef-ad2f-41f5-ab15-e01084b20364}{_xdtaTl5RGSn-pIC192ORA}{10.10.10.2}{10.10.10.2:9300}{zone=Seattle, master=true},}, reason: zen-disco-join(join from node[{7c49c2ef-ad2f-41f5-ab15-e01084b20364}{_xdtaTl5RGSn-pIC192ORA}{10.10.10.2}{10.10.10.2:9300}{zone=Seattle, master=true}])
Recommended Actions
- Log in to the DCD that you just upgraded.
- Restart the Elasticsearch service by running the command: bigstart restart elasticsearch.
Shard assignment can now succeed, so the cluster status should change to healthy (green).
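You can confirm the change with the Elasticsearch cluster-health endpoint. The following is a hedged sketch: it parses a captured sample response so the extraction logic is self-contained, and on a live system you would instead assign health from the curl call shown in the comment.

```shell
# Extract the "status" field from an Elasticsearch cluster-health response.
# A captured sample response stands in for the live call:
#   health=$(curl -s localhost:9200/_cluster/health)
health='{"cluster_name":"dcd-cluster","status":"green","number_of_nodes":3,"unassigned_shards":0}'

status=$(echo "$health" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "cluster status: $status"
```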
Index created in higher version
When a DCD is upgraded, Elasticsearch begins creating indexes on that DCD. Because the upgraded DCD is running a different software version than the other DCDs in the cluster, shards cannot be replicated between the DCDs. This can result in unassigned replica shards in the cluster.
Symptoms
- Log in to the upgraded DCD.
- Check for unassigned replica shards by running the following command:
curl -s localhost:9200/_cat/shards?v | grep UNASSIGNED
- If there are no unassigned shards, the command returns no output. If there are unassigned shards, try the recommended actions.
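The grep pipeline above can be wrapped into a quick count. The following is a hedged sketch using a captured two-line sample in place of the live _cat/shards call; the index name is taken from the sample response later in this article, and node-a is a placeholder node ID.

```shell
# Count UNASSIGNED shards in _cat/shards output. The sample stands in for:
#   shards=$(curl -s localhost:9200/_cat/shards)
shards='websafe_2017-07-21t00-00-00-0700 2 p STARTED 0 159b 10.145.192.202 node-a
websafe_2017-07-21t00-00-00-0700 2 r UNASSIGNED'

# grep -c counts matching lines; zero means every shard is assigned.
count=$(echo "$shards" | grep -c UNASSIGNED)
echo "unassigned shards: $count"
```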
Recommended Actions
- For a single zone DCD cluster, continue the upgrade process by upgrading another DCD in the cluster. When another DCD in the cluster is running the same version, shards can begin replicating again and the cluster status should become healthy.
- For a multiple zone DCD cluster, continue the upgrade process but upgrade the next DCD in a different zone than the first DCD. When there are DCDs running the same version in different zones, shards can begin replicating again and the cluster status should become healthy.
Data collection device cluster status is red
After upgrading a data collection device (DCD), if the cluster status is red (unhealthy) instead of green (healthy), there are a number of potential causes and corresponding corrective actions you can attempt to resolve the issue.
Statistics replicas are not enabled
If statistics replicas were not enabled before you upgraded, the cluster will not create replicas of your data and the cluster health will be unhealthy.
Recommended Actions
- Log in to the primary BIG-IQ for the DCD cluster.
- Navigate to the Statistics Retention Policy screen and expand the Advanced Settings, then select Enable Replicas.
- Log in to the most recently upgraded DCD.
- Change to the pre-upgrade volume by running the command switchboot -b <old-volume-name>.
- Reboot the DCD by running the command reboot.
- Wait for the rebooted DCD to join the cluster.
- Wait for the cluster status to return to green (indicating that the cluster has successfully replicated your data shards).
- Repeat the upgrade for the DCDs in the cluster.
After the upgrade, the cluster status should change to healthy (green).
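One way to confirm that replicas are now configured is to check that the index settings report a nonzero number_of_replicas. The following is a hedged sketch that parses a captured sample of the settings response; the index name mirrors the sample response later in this article, and on a live system you would substitute the curl call shown in the comment.

```shell
# Check number_of_replicas in an Elasticsearch index-settings response.
# The sample stands in for: settings=$(curl -s localhost:9200/_settings)
settings='{"websafe_2017-07-21t00-00-00-0700":{"settings":{"index":{"number_of_replicas":"1","number_of_shards":"5"}}}}'

replicas=$(echo "$settings" | sed -n 's/.*"number_of_replicas":"\([0-9]*\)".*/\1/p')
echo "replicas per shard: $replicas"
```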
Generic Failure
Sometimes, for no discernible reason, the DCD cluster fails to assign the primary data shards.
Symptoms
There are no symptoms that confirm this cause other than the cluster status being unhealthy (red). However, if you have tried other corrective actions and the problem persists, you can try this remedy to see if it resolves the problem.
Recommended Actions
- Log in to the primary BIG-IQ for the DCD cluster.
- Change to the pre-upgrade volume by running the command switchboot -b <old-volume-name>.
- Wait for the cluster status to return to green (indicating that the cluster has successfully replicated your data shards).
- Check to see that replicas exist for each primary shard.
- Log in to the BIG-IQ system or DCD on which the software is failing to install.
- Check for replicas of each primary shard by running the command: curl -s localhost:9200/_cat/shards?v. A typical response looks like the following.
Note: In the following sample response, each primary shard (designated with a p in the prirep column) has a corresponding replica (designated with an r in the prirep column), and the replica exists on a node with an IP address that is different from the node that hosts the primary shard.
index shard prirep state docs store ip node
websafe_2017-07-21t00-00-00-0700 2 r STARTED 0 159b 10.145.193.11 f7b33853-da66-4587-bb70-5d2dbc254a05
websafe_2017-07-21t00-00-00-0700 2 p STARTED 0 159b 10.145.192.202 687a7b3b-7dc3-4074-9b22-1e26c6092877
websafe_2017-07-21t00-00-00-0700 1 r STARTED 0 159b 10.145.193.11 f7b33853-da66-4587-bb70-5d2dbc254a05
websafe_2017-07-21t00-00-00-0700 1 p STARTED 0 159b 10.145.192.202 687a7b3b-7dc3-4074-9b22-1e26c6092877
websafe_2017-07-21t00-00-00-0700 3 r STARTED 0 159b 10.145.193.11 f7b33853-da66-4587-bb70-5d2dbc254a05
websafe_2017-07-21t00-00-00-0700 3 p STARTED 0 159b 10.145.192.202 687a7b3b-7dc3-4074-9b22-1e26c6092877
websafe_2017-07-21t00-00-00-0700 4 r STARTED 0 159b 10.145.193.11 f7b33853-da66-4587-bb70-5d2dbc254a05
websafe_2017-07-21t00-00-00-0700 4 p STARTED 0 159b 10.145.192.202 687a7b3b-7dc3-4074-9b22-1e26c6092877
websafe_2017-07-21t00-00-00-0700 0 r STARTED 0 159b 10.145.193.11 f7b33853-da66-4587-bb70-5d2dbc254a05
websafe_2017-07-21t00-00-00-0700 0 p STARTED 0 159b 10.145.192.202 687a7b3b-7dc3-4074-9b22-1e26c6092877
- If there are replicas for each primary shard, repeat the upgrade for the DCDs in the cluster.
After you upgrade the DCDs in the cluster, the cluster status should change to healthy (green).
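The per-shard check described above can be automated. The following is a hedged sketch that reads _cat/shards rows and flags any shard whose primary and replica report the same IP address; no output means every replica is on a different node. A two-row here-document mirroring the sample response stands in for the live call.

```shell
# For each shard number, record the IP of its primary (p) and replica (r)
# rows from _cat/shards output, then flag pairs that share one node.
# On a live system, pipe `curl -s localhost:9200/_cat/shards` in instead
# of the here-document sample.
awk '
$3 == "p" || $3 == "r" { ip[$2 "," $3] = $7 }
END {
    for (key in ip) {
        split(key, parts, ",")
        if (parts[2] == "p" && ip[parts[1] ",r"] == ip[key])
            print "shard " parts[1] ": replica on same node as primary"
    }
}' <<'EOF'
websafe_2017-07-21t00-00-00-0700 2 r STARTED 0 159b 10.145.193.11 f7b33853-da66-4587-bb70-5d2dbc254a05
websafe_2017-07-21t00-00-00-0700 2 p STARTED 0 159b 10.145.192.202 687a7b3b-7dc3-4074-9b22-1e26c6092877
EOF
```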
Data collection device cluster is offline
After upgrading a data collection device (DCD), if the cluster status is completely offline, there is one primary potential cause and a corresponding corrective action you can attempt to resolve the issue.
Election of new master node failed
When the DCD cluster's master node is rebooted, a new master must be elected. Sometimes that election can fail, which causes the cluster to be offline.
Symptoms
There are a number of different symptoms that can indicate that the master election has failed. Master election failure can occur only when three conditions are met:
- You are upgrading from version 5.2.0 to version 5.3.0.
- The DCD that was upgraded and rebooted was the master node before the upgrade.
- Statistics replicas were enabled shortly before the upgrade and reboot.
- Use SSH to log in to the primary BIG-IQ for the DCD cluster.
- Check the cluster status by submitting the following API call: curl http://localhost:9200/_cat/nodes?v.
- If the response to the API call is similar to the following, the master election failed.
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
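A quick way to test for this failure mode is to look for the exception name in the response. The following is a hedged sketch using the captured response above in place of the live API call.

```shell
# Detect the master-election failure signature in the nodes API response.
# The sample stands in for: response=$(curl -s localhost:9200/_cat/nodes?v)
response='{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}'

if echo "$response" | grep -q master_not_discovered_exception; then
    echo "master election failed"
fi
```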
Another symptom of an election failure is error messages in the Elasticsearch log file.
- Use SSH to log in to the primary BIG-IQ for the DCD cluster.
- Navigate to the /var/log/elasticsearch/ folder and open the log file eslognode.log.
- Examine the log file. The following text snippets are examples of the two error messages that signal an election failure. If either of these messages is in the log file, the master election failed.
[2017-06-01 16:13:42,792][ERROR][discovery.zen ] [eb1d873a-ffdb-4ae8-ae22-946554970c54] unexpected failure during [zen-disco-join(elected_as_master, [4] joins received)] RoutingValidationException[[Index [statistics_tl0_device_2017-152-21]: Shard [2] routing table has wrong number of replicas, expected [0], got [1], Index [statistics_tl0_device_2017-152-21]: Shard [1] routing table has wrong number of replicas, expected [0], got [1], Index [statistics_tl0_device_2017-152-21]: Shard [4] routing table has wrong number of replicas, expected [0], got [1], Index [statistics_tl0_device_2017-152-21]: Shard [3] routing table has wrong number of replicas,expected [0], got [1], Index [statistics_tl0_device_2017-152-21]: Shard [0] routing table has wrong number of replicas, expected [0], got [1]]]
[2017-06-01 16:14:48,992][INFO ][discovery.zen ] [163e016c-3827-496c-b306-b2972d60c8df] failed to send join request to master [{eb1d873a-ffdb-4ae8-ae22-946554970c54}{IyzDAx59Swy6pNqg2YU0-Q}{10.145.192.147}{10.145.192.147:9300}{data=false, zone=L, master=true}], reason [RemoteTransportException[[eb1d873a-ffdb-4ae8-ae22-946554970c54][10.145.192.147:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{eb1d873a-ffdb-4ae8-ae22-946554970c54}{IyzDAx59Swy6pNqg2YU0-Q}{10.145.192.147}{10.145.192.147:9300}{data=false, zone=L, master=true}] not master for join request]; ]
[2017-06-01 16:14:50,124][WARN ][rest.suppressed ] path: /_bulk, params: {}ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]
Recommended Actions
Repeat these steps for each DCD in the cluster.
- Use SSH to log in to the first DCD in the cluster.
- Restart the Elasticsearch service by running the command: bigstart restart elasticsearch.
Restarting the service on each DCD in the cluster triggers a new master node election. After the service on the last DCD in the cluster is restarted, the cluster status should change to healthy (green).