Replacing a Storage Node Chassis FRU

 

Replacing the Hardware - Storage Node Chassis

This section describes how to replace a Lattus Storage Node chassis. The intent is to retain the current hard disk drives and avoid rebuilding the entire node. No decommissioning or repairs are required when replacing a chassis.

Replacing the Hardware

Prerequisites:

• An empty storage node chassis

• Small-tip Phillips screwdriver

• Crash cart (VGA monitor and a USB keyboard) to be provided by the site.

Follow these steps to replace the storage node chassis:

1 - Identify the storage node to be worked on, using the location LED provided by the CMC GUI if needed.

2 - Shut down the storage node through the CMC GUI.

3 - Detach the safety on the power cables and unplug the power cables. If necessary, mark them.

4 - Unplug the network cables and mark them, if necessary.

5 - Safely unmount the node from the rack as previously described, and place it on a table.

Caution: Use two people to safely mount the node in the rack, or to unmount it.

IMPORTANT! - After you pull the node past the pull-safety, don't leave the node in the rack. Otherwise, the rack rails might break.

6 - Place the empty chassis next to the old one.

7 - Unscrew the 4 screws from the top plate and carefully slide it off.

8 - Unscrew the disks one at a time from the node and slide them out, using the handle.

9 - Carefully place the disks in the empty chassis, each in the same position it occupied in the old chassis.

                IMPORTANT! - Be sure to put the screws that secure the disks back in place.

10 - After all disks have been installed into the chassis, slide the top plate on the node and secure it with the 4 previously removed screws.

11 - Use two people to carefully mount the new storage node into the rack, in the same place as the old one.

                For more information about mounting the storage node, refer to the instructions in the Lattus Installation Guide.

12 - Reconnect the power cables and their safety, BUT DO NOT plug in the ethernet cables at this time!

13 - Hook up a monitor and keyboard to the node. Power it on and IMMEDIATELY begin pressing the "DEL" (Delete) key repeatedly, about once every half second, for about 40 seconds to enter the BIOS.

                Note! - You may not see the American Megatrends logo after power on, but you must still press the DEL key repeatedly for about 40 seconds, about a half second apart.

                Note! - It will take quite a few minutes for the BIOS home screen to appear.

14 - From the main BIOS screen use the right arrow to get to the "Boot" menu.  From there:

                a -   Use the down arrow key to scroll through the menu selections until the "Hard Drive BBS Priorities" item is selected, and press Enter.

                b -   From the list of disks in the storage node that is presented, highlight "Boot Option #1" and press Enter.

                c -   From the list of detected disks, select the "P" disk with the lowest index number and press Enter.

                        For example, if Boot Option #1 is showing anything other than P0 or P1, select the lower of those two that is listed.

                        The list will refresh and show the disk you selected.

                                Explanation: The two disks that begin with the letter P and have the lowest numerical index are the two mirrored Operating System disks.

                                In most cases, this will be "P0" and "P1".  In rare cases, the disks will be "P1" and "P2".

                d -   From the list of disks in the storage node that is presented, highlight "Boot Option #2" and press Enter.

                e -   Select the next P# in line for this disk. In other words, if the Boot Option #1 you selected previously was "P0", select "P1" for this boot disk.

                        If the Boot Option #1 you selected previously was "P1", select "P2" for this boot disk.

15 - Press the F10 key and select "Yes" to save.

16 - Reconnect the network cables. Disconnect the monitor and keyboard.

17 - Push the power button to boot the node.  After the node has been successfully booted, it will appear in the uninitialized devices list in the CMC.

        (To view uninitialized devices in the CMC, navigate to Dashboard > Administration > Hardware > Servers > Unmanaged Devices > Uninitialized.)
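
The boot-order rule from step 14 (the two "P" disks with the lowest index numbers are the mirrored operating system disks) can be illustrated with a short Python sketch. This is only a model of the selection logic: the function name pick_boot_disks and the sample labels are hypothetical, and the BIOS itself is still configured by hand as described above.

```python
import re

def pick_boot_disks(labels):
    """Return the two lowest-indexed "P" disks, in boot order.

    `labels` is the list of disk labels as read off the BIOS screen,
    for example ["P3", "P1", "P2"]. Entries that do not begin with
    P<number> are ignored. Hypothetical helper, for illustration only.
    """
    indexes = []
    for label in labels:
        match = re.match(r"P(\d+)", label.strip())
        if match:
            indexes.append(int(match.group(1)))
    indexes.sort()
    if len(indexes) < 2:
        raise ValueError("expected at least two P disks")
    # First value is Boot Option #1, second is Boot Option #2.
    return "P%d" % indexes[0], "P%d" % indexes[1]

# Usual case: P0 and P1 are the operating system disks.
print(pick_boot_disks(["P0", "P1", "P2", "P3"]))
# Rare case: P0 is not listed, so P1 and P2 are used.
print(pick_boot_disks(["P2", "P1", "P3"]))
```

The sort is done on the numeric index rather than on the label text, so that a label such as P10 is not ranked ahead of P2.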

 

 

 

 

Replacing the Node in the Software

Note: All MAC addresses must be entered in uppercase, unless otherwise specified.
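
The same MAC address is needed in more than one written form during this procedure: uppercase with colons for the Qshell commands, and without colons for the nodename entry in main.cfg. The following Python sketch shows one way to produce both forms and reduce transcription mistakes. The function names are hypothetical and are not part of Qshell.

```python
import re

def normalize_mac(mac):
    """Return the MAC address as uppercase, colon-separated text."""
    # Drop common separators, then insist on exactly 12 hex digits.
    digits = re.sub(r"[:.\-\s]", "", mac)
    if not re.match(r"^[0-9A-Fa-f]{12}$", digits):
        raise ValueError("not a MAC address: %r" % mac)
    digits = digits.upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

def mac_without_colons(mac):
    """Return the MAC address with the colons removed, in lowercase."""
    return normalize_mac(mac).replace(":", "").lower()

print(normalize_mac("08:60:6e:44:98:1a"))       # 08:60:6E:44:98:1A
print(mac_without_colons("08:60:6e:44:98:1a"))  # 08606e44981a
```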

Step(1) In the CMC, navigate to:

                Dashboard > Administration > Hardware > Servers > Unmanaged Devices > Uninitialized.

Step(2) Note the IP address and MAC address of the new node.

Step(3) Open a new SSH session to the management controller node (and exit OSMI).

Step(4) Create a new SSH session to the IP address you noted earlier.

                root@Controller1:~# ssh 10.10.1.247

                root@10.10.1.247's password:

                Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.8.0-33-generic x86_64)

 

Step(5) Find the MAC addresses of the new node by running the command "ip a":

                root@nfsROOT:~# ip a

                1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN

                                link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

                                inet 127.0.0.1/8 scope host lo

                                inet6 ::1/128 scope host

                                valid_lft forever preferred_lft forever

                2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                link/ether 08:60:6e:44:98:1a brd ff:ff:ff:ff:ff:ff

                                inet 10.10.1.247/24 brd 10.10.1.255 scope global eth0

                                inet6 fe80::a60:6eff:fe44:981a/64 scope link

                                valid_lft forever preferred_lft forever

                3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000

                                link/ether 08:60:6e:44:98:1b brd ff:ff:ff:ff:ff:ff

 

Step(6) Note the new MAC addresses of eth0 and eth1 listed above and exit the storage node.

                               

                root@nfsROOT:~# exit

                logout

                Connection to 10.10.1.247 closed.

               

Step(7) Enter Qshell on the Controller Node.

                root@Controller1:~# qshell

                Welcome to qshell

 

Step(8) If the new machine does not already appear in the Failed section of Unmanaged Devices in the CMC GUI, enter the following Qshell commands as shown, in the order listed for "In [#]:"

                *** CMC GUI:       Dashboard --> Administration --> Hardware --> Servers --> Unmanaged Devices --> Failed

                In [1]: api=i.config.cloudApiConnection.find('main')

 

                *** Get the "Name" listed for the replacement node from the CMC GUI:

                                Dashboard --> Administration --> Hardware --> Servers --> Unmanaged Devices --> Uninitialized

                In [2]: machineguid=api.machine.find(name='PM-08:60:6E:44:98:1A')['result'][0]

 

                In [3]: machineguid

                Out[4]: '2c2f7c96-92d4-411f-9d62-f0d45624663d'

 

                In [5]: deviceguid=api.machine.list(machineguid=machineguid)['result'][0]['deviceguid']

 

                In [6]: deviceguid

                Out[7]: '03b16686-b18f-401b-845b-05b47df9cbe6'

 

                In [8]: api.device.updateModelProperties(deviceguid, status=str(q.enumerators.devicestatustype.FAILED))

                Out[9]: {'jobguid': None, 'result': '03b16686-b18f-401b-845b-05b47df9cbe6'}

 

                *** Now the machine should appear in the CMC GUI:

                                Dashboard --> Administration --> Hardware --> Servers --> Unmanaged Devices --> Failed

                The machine must show as FAILED before you can run Step 9.

 

Step(9) Use the new MAC address for eth0 as shown (found in steps 2 and 5) to 'cleanup' and remove the machine from the unmanaged devices list:

                In [10]: q.amplistor.cleanupMachine('08:60:6E:44:98:1A')

                Out[11]: True

                *** Now the machine disappears from the CMC GUI:

                                Dashboard --> Administration --> Hardware --> Servers --> Unmanaged Devices --> Failed

                *** Now the machine appears with the correct 'Name' in CMC GUI:

                                Dashboard --> Administration --> Hardware --> Servers --> Storage Nodes

 

Step(10) Update the machine object as follows:

                In [12]: cloudapi=i.config.cloudApiConnection.find('main')

 

                In [13]: machine_guid = cloudapi.machine.find(name='Storage4')['result'][0]

                                *** See comment just above Step (10) for the 'Name'

                In [14]: machine = cloudapi.machine.getObject(machine_guid)

 

 

 

Step(11) Use the following commands to update the MAC addresses:

                In [15]: print machine.nics[0].name

                eth0

 

                In [16]: print machine.nics[1].name

                eth1

 

                In [17]: machine.nics[0].hwaddr = '08:60:6E:44:98:1A'

 

                In [18]: machine.nics[1].hwaddr = '08:60:6E:44:98:1B'

 

Step(12) Use the following command to save the new settings:      

                In [19]: q.drp.machine.save(machine)

 

Step(13) Update the device object as follows:

                In [20]: device = cloudapi.device.getObject(machine.deviceguid)

 

                In [21]: device.nicports[0].hwaddr

                Out[22]: '30:85:A9:A5:4E:BE'                 <-----  Should be the old MAC address, before the change is made below.

 

                In [23]: device.nicports[0].hwaddr = '08:60:6E:44:98:1A'

 

                In [24]: q.drp.device.save(device)

               

                In [25]: q.manage.dhcpd.restart()

 

                In [26]: exit()

 

Step(14) Reboot the node.

Step(15) Open a new SSH session from the Controller node and connect to the storage node again.

                *** As a check of your work so far you can run the command:   "ip a"

                root@nfsROOT:~# ip a

                1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN

                                link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

                                inet 127.0.0.1/8 scope host lo

                                inet6 ::1/128 scope host

                                valid_lft forever preferred_lft forever

                2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                link/ether 08:60:6e:44:98:1a brd ff:ff:ff:ff:ff:ff

                                inet 10.10.1.247/24 brd 10.10.1.255 scope global eth0

                                inet6 fe80::a60:6eff:fe44:981a/64 scope link

                                valid_lft forever preferred_lft forever

                3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000

                                link/ether 08:60:6e:44:98:1b brd ff:ff:ff:ff:ff:ff

               

Step(16)  Edit the following configuration file on the storage node:

                root@nfsROOT:~# vi /opt/qbase3/cfg/qconfig/main.cfg

 

                Replace the value seen at "nodename" with the new MAC-address of eth0

                Note: The MAC address must be entered without colons. Case does not matter.

                (example: 08:60:6e:44:98:1a would be entered as 08606e44981a).

                                               [main]

                                               lastlogcleanup = 1412979001

                                               domain = somewhere.com

                                               nodetype = STORAGENODE

                                               nodename = 08606e44981a

                                               logserver_loglevel = 6

                                               logserver_port = 9998

                                               logserver_ip = 127.0.0.1

                                               qshell_firstrun = False

                                               machineguid = 2c2f7c96-92d4-411f-9d62-f0d45624663d                                                                                 

Step(17) Save and close the configuration file.

 

Step(18) Restart the maintenance agents on the storage node after making the changes to main.cfg.

 

    In [1]: q.dss.maintenanceagents.restart()

 

    In [2]:  exit()

 

 

Step(19)  Verify the Ethernet and MAC address settings. View the 70-persistent-net.rules file and then run the lshw -C network command. Compare the output and make sure that the eth0 and eth1 MAC address configuration is correct.

 

# cat /etc/udev/rules.d/70-persistent-net.rules

 

# lshw -C network
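
The comparison in Step 19 can also be sketched as a small Python script. The example below parses text in the format that udev uses for persistent network rules on Ubuntu, and reports any interface whose MAC address differs from the expected one. The sample rule lines, the function names, and the expected values are illustrative assumptions based on the example MAC addresses used earlier; on a real node you would read the rules file itself and compare it against the lshw output.

```python
import re

# Sample text shaped like entries in a udev persistent-net rules file.
# The values are illustrative only, reusing the example MAC addresses.
SAMPLE_RULES = '''
# PCI device (example entry)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:60:6e:44:98:1a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:60:6e:44:98:1b", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
'''

def parse_rules(text):
    """Map interface name -> MAC address (uppercase) from rules text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mac = re.search(r'ATTR\{address\}=="([0-9A-Fa-f:]{17})"', line)
        name = re.search(r'NAME="([^"]+)"', line)
        if mac and name:
            mapping[name.group(1)] = mac.group(1).upper()
    return mapping

def check(mapping, expected):
    """Return a list of mismatches between the rules and expected MACs."""
    problems = []
    for nic, mac in sorted(expected.items()):
        found = mapping.get(nic)
        if found != mac.upper():
            problems.append("%s: expected %s, found %s" % (nic, mac.upper(), found))
    return problems

rules = parse_rules(SAMPLE_RULES)
expected = {"eth0": "08:60:6E:44:98:1A", "eth1": "08:60:6E:44:98:1B"}
print(check(rules, expected) or "MAC addresses match")
```

Comparing in uppercase avoids false mismatches, because the rules file and the Qshell objects may record the same address in different case.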

 

 

Finished

 

Attachments
Title: Replacing the Storage Node Chassis-version3
Last Updated: 01/22/2015 04:32 PM
Updated By: Greg Schaefer

