Handling disk failures (DRAFT)
Normally, in a Lattus environment, a disk on a controller or storage node that is showing errors is marked as degraded in the CMC.
From there you can run a S.M.A.R.T. test to check for errors, reset the disk, or decommission it.
In some cases a disk shows signs of impending failure that the CMC does not report.
The disk then has to be decommissioned manually so that it can be replaced with a new, working disk.
Use the following steps in qshell to mark the disk as degraded and decommission it:
api = i.config.cloudApiConnection.find('main')
mguid = api.machine.find(name='<name_of_node_with_bad_disk>')['result'][0]
dguid = api.disk.list(machineguid=mguid, serial_number='<serial_number_of_bad_disk>')['result'][0]['guid']
api.disk.updateModelProperties(dguid, status='DEGRADED')
api.machine.decommission_disks(mguid, [dguid])
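The steps above take the first match for both the node and the disk, so a mistyped name or serial number either raises an IndexError or, worse, acts on the wrong object. The following is a minimal sketch of the same sequence wrapped in a function that refuses to continue unless exactly one node and one disk match. The function name decommission_disk_by_serial is hypothetical, and the sketch assumes the API calls return the same structures as in the steps above (a 'result' list of machine GUIDs, and a 'result' list of disk records with a 'guid' key).

```python
def decommission_disk_by_serial(api, node_name, serial_number):
    """Mark one disk as DEGRADED and decommission it.

    Hypothetical helper: it only uses the API calls shown in the manual
    steps and assumes they return the same structures.
    Returns the (machine GUID, disk GUID) pair that was acted on.
    """
    # Resolve the node and stop unless exactly one node matches.
    machines = api.machine.find(name=node_name)['result']
    if len(machines) != 1:
        raise ValueError('expected 1 node named %r, found %d'
                         % (node_name, len(machines)))
    mguid = machines[0]

    # Resolve the disk on that node and stop unless exactly one matches.
    disks = api.disk.list(machineguid=mguid,
                          serial_number=serial_number)['result']
    if len(disks) != 1:
        raise ValueError('expected 1 disk with serial %r, found %d'
                         % (serial_number, len(disks)))
    dguid = disks[0]['guid']

    # Same two state-changing calls as in the manual steps.
    api.disk.updateModelProperties(dguid, status='DEGRADED')
    api.machine.decommission_disks(mguid, [dguid])
    return mguid, dguid
```

In qshell this would be called with the connection from the first step, for example decommission_disk_by_serial(api, '<name_of_node_with_bad_disk>', '<serial_number_of_bad_disk>'). Both lookups happen before anything is changed, so a failed check leaves the disk state untouched.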
You can get the serial number either via
hdparm -I /dev/...
or
api = i.config.cloudApiConnection.find('main')
mguid = api.machine.find(name='<name_of_node_with_bad_disk>')['result'][0]
api.disk.list(machineguid=mguid, name='<name_of_bad_disk>')
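If you take the hdparm route, the serial number has to be picked out of a fairly long identification report. The sketch below pulls it out of text captured from hdparm -I. It assumes the report contains a line labelled "Serial Number:", as hdparm normally prints for ATA devices. The function name parse_hdparm_serial is hypothetical.

```python
import re

# hdparm -I prints an indented "Serial Number:" line in its identification
# block; match only spaces/tabs around the value so that an empty field
# cannot spill over into the next line.
_SERIAL_RE = re.compile(r'^[ \t]*Serial Number:[ \t]*(\S+)[ \t]*$',
                        re.MULTILINE)


def parse_hdparm_serial(hdparm_output):
    """Return the serial number found in `hdparm -I` text, or None."""
    match = _SERIAL_RE.search(hdparm_output)
    return match.group(1) if match else None
```

The returned string can then be used as the serial_number value in the decommission steps. A result of None means the report had no usable serial line, in which case the qshell lookup is the safer option.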
Now you can replace the disk.