FYI: StorNext Appliances: Failed IPMI iDRAC Symptoms and Recovery
Thanks to Jeff Syme/SPS for this Info:
See attachments for full article.
Team,
I know we all know this, but here is a refresher for those of us who are a little less familiar <& future notes for myself>.
Our M440s are built on Dell PowerEdge R520 servers. Of course they have our Quantum branding on them, but everyone knows what they are.
(Inside this Acura there’s still a Honda underneath, which is a fine automobile. Moving on... )
IPMI hardware can fail, but contrary to popular lore, it’s _NOT_ a daughter card that can be replaced as a CRU/FRU. It’s literally soldered onto the motherboard/system board.
iDRAC: https://www.youtube.com/watch?v=DFovxsbqspI
You can perform a reset of your iDRAC without having to power down your server if you receive the following error and SSH doesn't work for you.
"RAC0218: The maximum number of user sessions is reached."
This is a known issue in iDRAC firmware 1.50.50, 1.51.51, 1.51.52, 1.55.55 & 1.56.55.
Remember to update your firmware once you have access to your iDRAC again. <upgrade within the StorNext firmware>
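A quick way to check a firmware version against the list above could be sketched as follows. The function name is hypothetical, and the list covers only the versions named in this note; it is not a complete statement of what Dell considers affected.

```shell
# Hypothetical helper: succeed if the given iDRAC firmware version is one of
# the releases this note lists as affected by the RAC0218 session issue.
is_rac0218_affected() {
    case "$1" in
        1.50.50|1.51.51|1.51.52|1.55.55|1.56.55) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: is_rac0218_affected 1.51.52 && echo "affected: plan a firmware update"
```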
A DSET for the primary MDC can be captured from the GUI, and it is also contained in the snapshot. GUI – Service – Capture DSET (~4 MB)
The secondary MDC's DSET must be acquired on that MDC with the service.sh script over Linux/SSH.
Looking at the MDC snapshot \usr\adic\tmp\platform\hw-info\current.config, all of the IPMI values are blank:
### -ipmi-
IP Address:
MAC Address:
Gateway IP:
Gateway MAC:
Board Mfg:
Board Product:
Board Number:
Product Version:
Manufacturer ID:
Product ID:
Device Revision:
Hardware Revision:
Firmware Version:
Firmware Build Number: 0
SEL Percent Used:
Completed Transactions:
Remediation:
This is a race condition at boot in which the driver fails to load.
The workaround for this issue is to restart Dell services. This bug is fixed in 5.4.0.x.
The remediation is to reload the IPMI driver so that future snapshots capture the correct information.
-or- this is evidence that the IPMI really has failed.
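A minimal sketch of what reloading the driver could look like on a stock Linux host. The module names (ipmi_si, ipmi_devintf) and the OpenIPMI "service ipmi" init script are standard Linux pieces, but whether the appliance image uses exactly these is an assumption. The Dell services restart is not shown because its script name is not given in this note. Set DRY_RUN=1 to print the commands instead of running them.

```shell
# Assumption: stock Linux IPMI modules and the OpenIPMI init script.
# None of these names come from the StorNext documentation.
run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reload_ipmi_driver() {
    run modprobe ipmi_si         # system interface driver (talks to the BMC)
    run modprobe ipmi_devintf    # provides the /dev/ipmi0 device node
    run service ipmi restart     # OpenIPMI init script, if installed
}

# Usage (as root): reload_ipmi_driver && ipmitool bmc info
```

If ipmitool bmc info still cannot open the device afterwards, that points toward the second possibility: the IPMI hardware itself.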
Looking at the MDC snapshot \usr\adic\tmp\platform\hw-info\current.config, all of the IPMI values are populated.
This is what it looks like when it’s working:
MDC1 current.config in the ### -ipmi- section
### -ipmi-
IP Address: 10.17.21.51
MAC Address: 90:b1:1c:2e:76:7f
Gateway IP: 10.17.21.254
Gateway MAC: 00:00:00:00:00:00
Board Mfg: Sat Dec 15 05:54:00 2012
Board Product: PowerEdge R520
Board Number: 0DFFT5A06
Product Version: 01
Manufacturer ID: 674
Product ID: 256 (0x0100)
Device Revision: 1
Hardware Revision: 32
Firmware Version: 2.0a
Firmware Build Number: 310A0A
SEL Percent Used: 0%
Completed Transactions: 2507
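To tell the blank case from the populated case without eyeballing the file, one field can be pulled out of the -ipmi- section with awk. This is only a sketch: the function name is hypothetical, and it assumes every section in current.config starts with a "### " header line like the one shown above.

```shell
# Hypothetical helper: print the value of one field from the "### -ipmi-"
# section of a current.config file. Prints an empty line if the field is blank.
# Assumes each section begins with a line starting with "### ".
ipmi_field() {
    awk -v want="$2" '
        /^### / { in_ipmi = ($0 ~ /^### -ipmi-/); next }
        in_ipmi {
            i = index($0, ":")            # split on the first colon only
            if (i == 0) next
            name  = substr($0, 1, i - 1)
            value = substr($0, i + 1)
            sub(/^[ \t]+/, "", name)
            sub(/^[ \t]+/, "", value)
            sub(/[ \t]+$/, "", value)
            if (name == want) { print value; exit }
        }
    ' "$1"
}

# Usage: [ -z "$(ipmi_field current.config 'IP Address')" ] && echo "IPMI values blank"
```

Splitting on the first colon only keeps values such as the MAC address intact.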
Attempt to software reset the IPMI / iDRAC: <thanks to Oliver Lemke>
IPMI in the service.sh script # sh /opt/DXi/scripts/service.sh
Here are the results when the IPMI is dead: <or the services are not started, I haven’t fleshed this out yet>
[root@node-q2 stornext]# ipmitool bmc info
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
[root@node-q2 stornext]# ipmitool lan print 1
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
[root@node-q2 stornext]# ipmitool bmc reset cold
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
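The error above names the three device nodes ipmitool tries. A small check for those nodes, run before calling ipmitool, could look like this. The function name and the optional root-prefix argument are hypothetical conveniences; a missing node only shows that the driver is not loaded and does not by itself prove the hardware is dead.

```shell
# Succeed if any of the IPMI device nodes named in the ipmitool error exists.
# Optional argument: a root prefix (hypothetical, handy for testing); default none.
ipmi_device_present() {
    root=${1:-}
    for node in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
        [ -e "$root$node" ] && return 0
    done
    return 1
}

# Usage: ipmi_device_present || echo "no IPMI device node: driver not loaded?"
```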
Results of DSET from a working IPMI / iDRAC <left> and a failed IPMI / iDRAC <right>. Notice that all environmental details are missing from the MDC2 DSET on the right.
<Screen capture>
The IPMI controls the LCD on the front panel. 2 options:
If the customer is willing to reboot an MDC: /// Compliments of Danny Barbour on or about Nov 29, 2016:
1. Boot up with a laptop plugged into the iDRAC port
2. Press F2 on bootup
3. Go to iDRAC Settings -> Network -> NIC Selection -> Make sure it's set to 'Dedicated' -> Note the IP address listed
4. Set the laptop IP to 1 above or below that IP from Step 3.
5. Ping the IP from Step 3 from your laptop
6. Open a web browser
7. Go to the IP address in Step 3.
If the iDRAC IP is pingable in step 5, and the GUI comes up in Step 7, then iDRAC is working fine.
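Step 4 is simple arithmetic on the last octet. A sketch that prints the candidate laptop addresses is below. The function name is hypothetical, and it assumes the iDRAC and the laptop share the first three octets (a /24-style segment), which this note does not state.

```shell
# Print the IPv4 addresses one below and one above the given address,
# changing only the last octet and skipping .0 and .255.
# Assumption: iDRAC and laptop are on the same /24-style segment.
neighbor_ips() {
    prefix=${1%.*}      # first three octets, e.g. 10.17.21
    last=${1##*.}       # last octet, e.g. 51
    [ "$last" -gt 1 ] && echo "$prefix.$((last - 1))"
    [ "$last" -lt 254 ] && echo "$prefix.$((last + 1))"
    return 0
}

# Usage: neighbor_ips 10.17.21.51   (then ping 10.17.21.51 from the laptop)
```

Remember to give the laptop the same netmask as the iDRAC, or the ping in Step 5 can fail even when the iDRAC is healthy.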
He also mentioned that if the iDRAC is bad, the entire motherboard would need to be replaced because it's onboard.
Bug 63498 - iDRAC FW bug causes permanent iDRAC failure on R630 Motherboards after updating from 2.15.10.10 to 2.21.21.21
Since our license is based on Eth2 MAC address (a PCIe card), the StorNext license.dat does not have to change with a system board replacement.
# Product: maintenance
# System: Not_provided
# Company: <company here>
# Serial Number: CX123xxxxxxxxx
# Identifier: 782BCB646191
# Expiration Date: 2016 11/30
# License: AAAHS/L2AAA/EADU5/ENGJS/CN7CP/KS4LM/5ZLLH/6SDNU/5B9BK/C7TNA/LS
# Authorization String:
maintenance 1 782BCB646191 0 AAAHSL2AAAEADU5ENGJSCN7CPKS4LM5ZLLH6SDNU5B9BKC7TNALS <xxxxxxx>
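If the Identifier in license.dat is the Eth2 MAC address with the separators removed, as the sample above suggests (this is an assumption, not something confirmed here), a MAC can be normalized for comparison like this. The function name is hypothetical.

```shell
# Hypothetical helper: convert a MAC address to 12 uppercase hex digits,
# the form the Identifier field in the sample license appears to use.
mac_to_identifier() {
    printf '%s\n' "$1" | tr -d ':-' | tr 'a-f' 'A-F'
}

# Usage: mac_to_identifier 78:2b:cb:64:61:91
```

After a system board replacement the result for Eth2 should be unchanged, since that MAC lives on the PCIe card.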
DRAC - Dell Remote Access Controller
CMC - Dell Chassis Management Controller. Blade servers each have an iDRAC and are managed as a group in CMC. Not to be confused with Lattus CMC.
IPMI - The Intelligent Platform Management Interface (IPMI) is a set of computer interface specifications for an autonomous computer subsystem that provides management and monitoring capabilities independently of the host system's CPU, firmware (BIOS or UEFI) and operating system.
iLO and RiLO - HP’s Integrated Lights-Out (iLO) and Remote Integrated Lights-Out (RiLO)
BMC - baseboard management controller (generic term)
Lenovo's ThinkServer EasyManage
What version are you running?
GUI - IPMI - Help - About: Integrated Remote Access Controller 6 - Enterprise 1.92
# dmidecode -s bios-version
6.1.0
CSweb -> StorNext Metadata Appliances -> Manuals -> StorNext M-Series Service Plan ->
I have found that a DSET emailed as an attachment is ALWAYS stripped when sent to Dell, so I just create an outgoing FTP folder for Dell to pull it themselves.
What Dell recommends to reset the iDRAC when the LCD is dark:
Ensure that the BIOS and iDRAC firmware are up to date, then attempt each of the following steps.
1. Soft reset the iDRAC with the racadm racreset command and allow a few minutes for the iDRAC to boot. If there is no change, continue.
2. Drain the system flea power by shutting down the server and removing the power cables. Press and hold the power button for 15 seconds, then reboot.
3. Update the iDRAC firmware. Select the appropriate link below:
4. Reseat the LCD cables.
When re-seating the control panel, ensure the ribbon cable is re-seated on both ends, i.e., at the control panel and motherboard connections.
Because the LCD runs on AUX power, the power cable must be pulled as well to allow the LCD to re-initialize after the re-seat.
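For step 1, a small sketch that picks whichever reset command is available on the host. Both commands appear in this note (racadm racreset in Dell's steps, ipmitool bmc reset cold in the earlier session); the function name and the preference order are assumptions.

```shell
# Print the reset command available on this host, preferring Dell's racadm.
# Returns 1 and prints nothing if neither tool is found in PATH.
pick_reset_command() {
    if command -v racadm >/dev/null 2>&1; then
        echo "racadm racreset"
    elif command -v ipmitool >/dev/null 2>&1; then
        echo "ipmitool bmc reset cold"
    else
        return 1
    fi
}

# Usage (as root): cmd=$(pick_reset_command) && $cmd
# Then allow a few minutes for the iDRAC to boot before checking the LCD.
```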
http://en.community.dell.com/support-forums/servers/f/956/t/19985288
http://en.community.dell.com/support-forums/servers/f/956/p/19627363/20753703?rfsh=1468248797399
https://www.youtube.com/watch?v=1yr8Tn-HOdk
Look for these documents:
Dell PowerEdge R630 / Xcellis: iDRAC8-with-lc-v2.05.05.05_User-Guide.pdf (~9.3 MB)
Dell PowerEdge R520 / M440: integrated-dell-remote-access-cntrllr-7-v1.50.50_User's Guide_en-us.pdf (~7.3 MB)