Morning! Welcome to virtualcloudblog.com and thanks for checking it out. Today, I’ll write about vSAN and the Health Check service.
What is the vSAN Health service?
It’s an integrated vSAN service that verifies the configuration and operation of your vSAN cluster. The vSAN health service is turned on by default. You can turn periodical health checks off or on, and set the health-check interval.
How to enable/disable it?
Procedure
- Navigate to the vSAN cluster in the vSphere Web Client.
- Click the Configure tab.
- Under vSAN, select Health and Performance.
- Click the Health Services Edit settings button.
- To turn off periodical health checks, deselect Turn ON periodical health check.
You can also set the time interval between health checks.
To view the status of vSAN health checks and to verify the configuration and operation of your vSAN cluster, navigate to the vSAN cluster, then:
- Click the Monitor tab and click vSAN.
- Select Health to review the vSAN health check categories.
- If the Test Result column displays Warning or Failed, expand the category to review the results of individual health checks.
- Select an individual health check and check the detailed information at the bottom of the page.
- You can click the Ask VMware button to open a knowledge base article that describes the health check and provides information about how to resolve the issue.
In my case, there is a Warning related to “vSAN Disk Balance”. So let’s solve it!
Click the Warning, then Disk Balance, and look at Data to Move (this is the amount of vSAN data to rebalance across the different hosts within the vSAN cluster). To start the rebalance, just click the button and confirm the pop-up.
Please be aware that it will take some time (hours, I would bet, in my case). It varies depending on the vSAN free space, the amount of data, disk speed and network bandwidth.
Now vSAN starts the rebalancing process.
How to check rebalance from RVC?
One thing we need to understand is that a rebalance is not a vSAN resync.
Connect to RVC. Visit this post in case you don’t know how to connect to RVC.
Using vsan.proactive_rebalance_info <clusterID> from RVC, we see:
/localhost/arango-lab/computers> vsan.proactive_rebalance_info 1
2018-07-12 13:00:16 +0000: Retrieving proactive rebalance information from host host1.arango.es ...
2018-07-12 13:00:16 +0000: Retrieving proactive rebalance information from host host2.arango.es ...
2018-07-12 13:00:16 +0000: Retrieving proactive rebalance information from host host3.arango.es ...
2018-07-12 13:00:16 +0000: Retrieving proactive rebalance information from host host4.arango.es ...
2018-07-12 13:00:16 +0000: Retrieving proactive rebalance information from host host5.arango.es ...
2018-07-12 13:00:22 +0000: Fetching vSAN disk info from host1.arango.es (may take a moment) ...
2018-07-12 13:00:22 +0000: Fetching vSAN disk info from host2.arango.es (may take a moment) ...
2018-07-12 13:00:22 +0000: Fetching vSAN disk info from host3.arango.es (may take a moment) ...
2018-07-12 13:00:22 +0000: Fetching vSAN disk info from host4.arango.es (may take a moment) ...
2018-07-12 13:00:22 +0000: Fetching vSAN disk info from host5.arango.es (may take a moment) ...
2018-07-12 13:00:26 +0000: Done fetching vSAN disk infos

Proactive rebalance start: 2018-07-12 11:06:51 UTC
Proactive rebalance stop: 2018-07-13 11:06:51 UTC
Max usage difference triggering rebalancing: 30.00%
Average disk usage: 65.00%
Maximum disk usage: 80.00% (43.00% above minimum disk usage)
Imbalance index: 28.00%
Our cluster needs to be balanced (we knew that from the GUI), but we also see the disk usage and the imbalance index:
Max usage difference triggering rebalancing: 30.00%
Average disk usage: 65.00%
Maximum disk usage: 80.00% (43.00% above minimum disk usage)
Imbalance index: 28.00%
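The numbers in this output fit together, and a quick calculation makes that visible. The sketch below is my own reading of the output, not something taken from VMware documentation: it assumes that a disk qualifies for rebalancing once its usage exceeds the minimum disk usage plus the "Max usage difference triggering rebalancing" value. The function names are hypothetical helpers, and only the three input figures come from the output.

```python
# Figures copied from the vsan.proactive_rebalance_info output (in percent).
MAX_DISK_USAGE = 80.0       # "Maximum disk usage"
ABOVE_MINIMUM = 43.0        # "(43.00% above minimum disk usage)"
VARIANCE_THRESHOLD = 30.0   # "Max usage difference triggering rebalancing"


def rebalance_cutoff(max_usage, above_minimum, variance_threshold):
    """Return (minimum disk usage, usage level above which a disk qualifies).

    Assumption: the cutoff is the least-full disk's usage plus the threshold.
    """
    min_usage = max_usage - above_minimum
    return min_usage, min_usage + variance_threshold


def usage_above_threshold(disk_usage, cutoff):
    """Percentage points by which a disk exceeds the cutoff (0 if it does not)."""
    return max(0.0, disk_usage - cutoff)


min_usage, cutoff = rebalance_cutoff(MAX_DISK_USAGE, ABOVE_MINIMUM, VARIANCE_THRESHOLD)
fullest_excess = usage_above_threshold(MAX_DISK_USAGE, cutoff)

print(f"Minimum disk usage: {min_usage:.2f}%")
print(f"Disks fuller than {cutoff:.2f}% qualify for rebalancing")
print(f"The fullest disk is {fullest_excess:.2f}% above that cutoff")
```

Under that assumption the fullest disk sits 13% above the cutoff, which agrees with the largest value in the "Disk usage above threshold" column of the disk table in the same output.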
Disks to be rebalanced:
+----------------------+-----------------------------+----------------------------+--------------+
| DisplayName          | Host                        | Disk usage above threshold | Data to move |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host1.arango.es             | 12.00%                     | 201.2048 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 13.00%                     | 217.9719 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 11.00%                     | 184.4378 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host1.arango.es             | 10.00%                     | 167.6707 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 11.00%                     | 184.4378 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 2.00%                      | 33.5341 GB   |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host1.arango.es             | 11.00%                     | 184.4378 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 12.00%                     | 201.2048 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 13.00%                     | 217.9719 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 8.00%                      | 134.1366 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host1.arango.es             | 11.00%                     | 184.4378 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 4.00%                      | 67.0683 GB   |
| naa.5000c5009xxxxxxx | host1.arango.es             | 7.00%                      | 117.3695 GB  |
| naa.5000c5009xxxxxxx | host1.arango.es             | 9.00%                      | 150.9036 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host2.arango.es             | 4.00%                      | 67.0683 GB   |
| naa.5000c5009xxxxxxx | host2.arango.es             | 7.00%                      | 117.3695 GB  |
| naa.5000c5009xxxxxxx | host2.arango.es             | 5.00%                      | 83.8354 GB   |
| naa.5000c5009xxxxxxx | host2.arango.es             | 12.00%                     | 201.2048 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host2.arango.es             | 13.00%                     | 217.9719 GB  |
| naa.5000c5009xxxxxxx | host2.arango.es             | 13.00%                     | 217.9719 GB  |
| naa.5000c5009xxxxxxx | host2.arango.es             | 3.00%                      | 50.3012 GB   |
| naa.5000c5009xxxxxxx | host2.arango.es             | 6.00%                      | 100.6024 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host2.arango.es             | 2.00%                      | 33.5341 GB   |
| naa.5000c5009xxxxxxx | host2.arango.es             | 8.00%                      | 134.1366 GB  |
| naa.5000c5009xxxxxxx | host2.arango.es             | 8.00%                      | 134.1366 GB  |
| naa.5000c5009xxxxxxx | host2.arango.es             | 12.00%                     | 201.2048 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.5000c5009xxxxxxx | host2.arango.es             | 3.00%                      | 50.3012 GB   |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host3.arango.es             | 5.00%                      | 83.8354 GB   |
| naa.500003977xxxxxxx | host3.arango.es             | 5.00%                      | 83.8354 GB   |
| naa.500003977xxxxxxx | host3.arango.es             | 3.00%                      | 50.3012 GB   |
| naa.500003977xxxxxxx | host3.arango.es             | 4.00%                      | 67.0683 GB   |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host3.arango.es             | 1.00%                      | 16.7671 GB   |
| naa.500003977xxxxxxx | host3.arango.es             | 10.00%                     | 167.6707 GB  |
| naa.500003977xxxxxxx | host3.arango.es             | 11.00%                     | 184.4378 GB  |
| naa.500003977xxxxxxx | host3.arango.es             | 6.00%                      | 100.6024 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host3.arango.es             | 3.00%                      | 50.3012 GB   |
| naa.500003977xxxxxxx | host3.arango.es             | 8.00%                      | 134.1366 GB  |
| naa.500003977xxxxxxx | host3.arango.es             | 8.00%                      | 134.1366 GB  |
| naa.500003977xxxxxxx | host3.arango.es             | 6.00%                      | 100.6024 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host3.arango.es             | 4.00%                      | 67.0683 GB   |
| naa.500003977xxxxxxx | host3.arango.es             | 9.00%                      | 150.9036 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host4.arango.es             | 7.00%                      | 117.3695 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 7.00%                      | 117.3695 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 1.00%                      | 16.7671 GB   |
| naa.500003977xxxxxxx | host4.arango.es             | 3.00%                      | 50.3012 GB   |
| naa.500003977xxxxxxx | host4.arango.es             | 6.00%                      | 100.6024 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host4.arango.es             | 8.00%                      | 134.1366 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 6.00%                      | 100.6024 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 12.00%                     | 201.2048 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 6.00%                      | 100.6024 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 13.00%                     | 217.9719 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host4.arango.es             | 12.00%                     | 201.2048 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 6.00%                      | 100.6024 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 9.00%                      | 150.9036 GB  |
| naa.500003977xxxxxxx | host4.arango.es             | 6.00%                      | 100.6024 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003977xxxxxxx | host4.arango.es             | 11.00%                     | 184.4378 GB  |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003981xxxxxxx | host5.arango.es             | 10.00%                     | 167.6707 GB  |
| naa.500003981xxxxxxx | host5.arango.es             | 11.00%                     | 184.4378 GB  |
| naa.500003981xxxxxxx | host5.arango.es             | 4.00%                      | 67.0683 GB   |
+----------------------+-----------------------------+----------------------------+--------------+
| naa.500003981xxxxxxx | host5.arango.es             | 3.00%                      | 50.3012 GB   |
+----------------------+-----------------------------+----------------------------+--------------+
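With this many disks it is hard to see at a glance how much data each host has to move. Here is a minimal Python sketch that totals the "Data to move" column per host from pasted RVC output. It is not part of RVC: the regular expression and the function name are my own, and the sample contains only three rows copied from the table above.

```python
import re
from collections import defaultdict

# Three rows copied from the table; paste the full RVC output in practice.
SAMPLE = """\
| naa.5000c5009xxxxxxx | host1.arango.es | 12.00% | 201.2048 GB |
| naa.5000c5009xxxxxxx | host1.arango.es | 13.00% | 217.9719 GB |
+----------------------+-----------------------------+
| naa.500003981xxxxxxx | host5.arango.es | 3.00% | 50.3012 GB |
"""

# One data row: disk name, host, usage above threshold, data to move.
ROW = re.compile(
    r"\|\s*(?P<disk>naa\.\S+)\s*\|\s*(?P<host>\S+)\s*\|"
    r"\s*(?P<pct>[\d.]+)%\s*\|\s*(?P<gb>[\d.]+) GB\s*\|"
)


def data_to_move_per_host(text):
    """Sum the 'Data to move' column (in GB) for every host found in text."""
    totals = defaultdict(float)
    for match in ROW.finditer(text):
        totals[match["host"]] += float(match["gb"])
    return dict(totals)


totals = data_to_move_per_host(SAMPLE)
for host, gb in sorted(totals.items()):
    print(f"{host}: {gb:.2f} GB to move")
```

Because the pattern is matched with finditer, separator lines and the header are simply skipped, and it does not matter whether the rows arrive one per line or run together.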
Here is some further information about the command: proactive_rebalance [opts] vsan-cluster
-s, --start: Start proactive rebalance
-t, --time-span=: Determine how long this proactive rebalance lasts in seconds, only be valid when option ‘start’ is specified
-v, --variance-threshold=: Configure the threshold, that only if disk’s used_capacity/disk_capacity exceeds this threshold(comparing to the disk with the least fullness in the cluster), disk is qualified for proactive rebalance, only be valid when option ‘start’ is specified
-i, --time-threshold=: Threshold in seconds, that only when variance threshold continuously exceeds this threshold, the corresponding disk will be involved to proactive rebalance, only be valid when option ‘start’ is specified
-r, --rate-threshold=: Determine how many data in MB could be moved per hour for each node, only be valid when option ‘start’ is specified
-o, --stop: Stop proactive rebalance
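To make the "only valid with start" rule concrete, here is a small Python sketch that assembles the command line to paste into RVC. It is a hypothetical helper, not part of RVC. It assumes the command lives in the vsan namespace like vsan.proactive_rebalance_info, that options are written in the --option=value form shown in the help text, and that the cluster can be referenced by the same index (1) used earlier in this post.

```python
def build_rebalance_command(cluster, start=False, stop=False, time_span=None,
                            variance_threshold=None, time_threshold=None,
                            rate_threshold=None):
    """Assemble a proactive rebalance command line for pasting into RVC."""
    if start == stop:
        raise ValueError("choose exactly one of start or stop")
    tuning = {
        "--time-span": time_span,                    # seconds
        "--variance-threshold": variance_threshold,  # usage difference
        "--time-threshold": time_threshold,          # seconds
        "--rate-threshold": rate_threshold,          # MB per hour per node
    }
    given = {flag: value for flag, value in tuning.items() if value is not None}
    if stop and given:
        # The help text says these options are only valid with 'start'.
        raise ValueError("tuning options are only valid together with --start")
    parts = ["vsan.proactive_rebalance", "--start" if start else "--stop"]
    parts += [f"{flag}={value}" for flag, value in given.items()]
    parts.append(str(cluster))
    return " ".join(parts)


# A 24-hour window, the same length as the start/stop timestamps shown earlier.
start_cmd = build_rebalance_command(1, start=True, time_span=24 * 3600)
stop_cmd = build_rebalance_command(1, stop=True)
print(start_cmd)
print(stop_cmd)
```

The time span is given in seconds, so 24 hours becomes 86400. Check the exact value formats against the help output of your own RVC version before running anything.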
I hope it helps you! Thanks for sharing!!!