FYI: StorNext Appliances: Failed IPMI / iDRAC Symptoms and Recovery

 Thanks to Jeff Syme/SPS for this Info:

See attachments for full article.

 

 

Team,

 

I know we all know this, but here is a refresher for those of us who are a little less familiar <and future notes for myself>.

 

 

The short answer:

  1. ssh to the MDCx, issue # racadm racreset and wait for the iDRAC to reboot <this does not affect the server>
  2. Hold down the “locate server button” for longer than 15 seconds to reset the iDRAC and observe the LCD <this does not affect the server>
  3. Collect a DSET and the server asset number to pass on to Dell support, create the Oracle Task and call Dell at 866.292.3355, PIN #11888
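
For step 1, "wait for the iDRAC to reboot" can be scripted as a simple poll. The following is only a sketch: wait_until is a hypothetical helper name, IDRAC_IP is a placeholder for the iDRAC address, and the attempt count and delay are arbitrary choices rather than Dell or Quantum guidance.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of any Dell or StorNext tooling)
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Success of the probe command ends the wait immediately.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is still to come.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example (not executed here): reset the iDRAC, then poll its address.
# IDRAC_IP is a placeholder; 30 tries, 10 seconds apart, is an arbitrary choice.
#   racadm racreset
#   wait_until 30 10 ping -c 1 "$IDRAC_IP" && echo "iDRAC is answering again"
```

If the poll times out, move on to step 2 or step 3 of the short answer.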

 

The long answer:

Our M440s are built on Dell PowerEdge R520 servers. Of course they have our Quantum branding on them, but everyone knows what they are.

(Inside this Acura there’s still a Honda underneath, which is a fine automobile. Moving on... )

 

IPMI hardware (the iDRAC) can fail, but contrary to popular lore, it’s _NOT_ a daughter card that can be replaced as a CRU/FRU. It’s literally soldered onto the motherboard/system board.

 

Quick Facts:

 

Notification:

Failure footprint:

 

Hard reset Dell iDRAC 7 - Dell R620 & R720 - fix for error RAC0218

https://www.youtube.com/watch?v=DFovxsbqspI

You can perform a reset of your iDRAC without having to power down your server if you receive the following error and SSH doesn't work for you.

"RAC0218: The maximum number of user sessions is reached."

This is a known issue for iDRAC firmware versions 1.50.50, 1.51.51, 1.51.52, 1.55.55 and 1.56.55.

Remember to update your firmware once you have access to your iDRAC again. <upgrade within the StorNext firmware>
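
A quick way to check a firmware version against the list above is sketched here. The function name is hypothetical, and the version string is assumed to be supplied by whoever runs it (for example, read off the iDRAC GUI).

```shell
# Return 0 if the given iDRAC firmware version is one of the versions
# listed above as affected by RAC0218, 1 otherwise.
# (hypothetical helper; the version list is copied from this article)
is_rac0218_affected() {
    case "$1" in
        1.50.50|1.51.51|1.51.52|1.55.55|1.56.55) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   is_rac0218_affected 1.51.52 && echo "affected: plan a firmware update"
```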

 

Where to get logs:

A DSET for the primary MDC can be captured from the GUI, and it is also contained in the snapshot. GUI – Service – Capture DSET, ~4 MB

 

 

The secondary MDC's DSET must be captured on that MDC itself by running the service.sh script over ssh.

Log review:

Looking at the MDC snapshot /usr/adic/tmp/platform/hw-info/current.config, all of the IPMI values are blank.


### -ipmi-

    IP Address:

    MAC Address:

    Gateway IP:

    Gateway MAC:

    Board Mfg:

    Board Product:

    Board Number:

    Product Version:

    Manufacturer ID:

    Product ID:

    Device Revision:

    Hardware Revision:

    Firmware Version:

    Firmware Build Number: 0

    SEL Percent Used:

    Completed Transactions:

 

Remediation:

This is a race condition on boot in which the IPMI driver fails to load.

The workaround for this issue is to restart the Dell services. This bug is fixed in 5.4.0.x.

The remediation is to reload the IPMI driver so that future snapshots pick up the correct information.

-or- the blank values are evidence that the IPMI really has failed.

 

 

Looking at the MDC snapshot /usr/adic/tmp/platform/hw-info/current.config, all of the IPMI values are populated.

This is what it looks like when it’s working:

MDC1 current.config in the ### -ipmi- section

### -ipmi-

    IP Address: 10.17.21.51

    MAC Address: 90:b1:1c:2e:76:7f

    Gateway IP: 10.17.21.254

    Gateway MAC: 00:00:00:00:00:00

    Board Mfg: Sat Dec 15 05:54:00 2012

    Board Product: PowerEdge R520

    Board Number: 0DFFT5A06

    Product Version: 01

    Manufacturer ID: 674

    Product ID: 256 (0x0100)

    Device Revision: 1

    Hardware Revision: 32

    Firmware Version: 2.0a

    Firmware Build Number: 310A0A

    SEL Percent Used: 0%

    Completed Transactions: 2507
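
Telling the blank listing from the populated one can be automated when reviewing many snapshots. This is a minimal sketch, assuming every section of current.config starts with a "### " header line like the "### -ipmi-" one shown above; both function names are hypothetical.

```shell
# Print "<populated> <blank>" field counts for the "### -ipmi-" section
# of a current.config file. A field is blank if nothing follows the colon.
# Assumption: every section starts with a line beginning "### ".
ipmi_field_counts() {
    awk '
        /^### / { in_ipmi = ($0 ~ /^### -ipmi-/); next }
        in_ipmi && /:/ {
            value = $0
            sub(/^[^:]*:[ \t]*/, "", value)   # drop "Field name:" and padding
            sub(/[ \t\r]+$/, "", value)       # drop trailing whitespace
            if (value == "") blank++; else populated++
        }
        END { print populated + 0, blank + 0 }
    ' "$1"
}

# Return 0 when blank fields outnumber populated ones, which matches the
# failed (or driver-not-loaded) footprint shown earlier.
ipmi_section_looks_blank() {
    set -- $(ipmi_field_counts "$1")
    [ "$2" -gt "$1" ]
}

# Example:
#   ipmi_section_looks_blank /usr/adic/tmp/platform/hw-info/current.config \
#       && echo "IPMI values are blank: reload the driver or suspect hardware"
```

A blank section alone does not say which of the two causes applies; it only flags the snapshot for a closer look.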

 

Attempt to software reset the IPMI / iDRAC:  <thanks to Oliver Lemke>

 

 

Software:

 

IPMI in the service.sh script  # sh /opt/DXi/scripts/service.sh

Here are the results when the IPMI is dead: <or the services are not started, I haven’t fleshed this out yet>

 

[root@node-q2 stornext]# ipmitool bmc info

Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

 

[root@node-q2 stornext]# ipmitool lan print 1

Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

 

[root@node-q2 stornext]# ipmitool bmc reset cold

Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
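
The "Could not open device" errors above mean that none of the three device nodes ipmitool tries is present. Checking for those nodes first helps separate "driver not loaded" from "hardware dead". This is a sketch only: the function name and its optional root-prefix argument (used for testing against a fake directory tree) are hypothetical, and the modprobe lines in the comments assume the appliance uses the stock Linux ipmi_si and ipmi_devintf modules, which this article does not confirm.

```shell
# Return 0 if any of the device nodes named in the ipmitool error exists.
# $1 is an optional directory prefix, used only to test against a fake tree.
ipmi_device_present() {
    root=${1:-}
    for dev in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
        [ -e "$root$dev" ] && return 0
    done
    return 1
}

# Example (not executed here; module names are an assumption, see above):
#   if ! ipmi_device_present; then
#       modprobe ipmi_si
#       modprobe ipmi_devintf
#       ipmi_device_present || echo "still no IPMI device: suspect hardware"
#   fi
```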

 

 

Results of DSET

from a working IPMI / iDRAC <left> and a failed IPMI / iDRAC <right>; notice that all environmental details are missing from the MDC2 DSET on the right

 

<Screen capture>

 

 

 

 

 

 

Hardware:

The IPMI controls the LCD on the front panel.  2 options:

 

 

 

 

 

 

 

 

If the customer is willing to reboot an MDC: /// Compliments of Danny Barbour on or about Nov 29, 2016:

1.       Boot up with a laptop plugged into the iDRAC port

2.       Press F2 on bootup

3.       Go to iDRAC Settings -> Network -> NIC Selection -> Make sure it's set to 'Dedicated' -> Note the IP address listed

4.       Set the laptop IP to 1 above or below that IP from Step 3.

5.       Ping the IP from Step 3 from your laptop

6.       Open a web browser

7.       Go to the IP address in Step 3.

If the iDRAC IP is pingable in step 5, and the GUI comes up in Step 7, then iDRAC is working fine.

He also mentioned that if the iDRAC is bad, the entire motherboard would need to be replaced because the iDRAC is onboard.
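
For Step 4 above, the "1 above or below" addresses can be computed rather than typed by hand. A small sketch, assuming the iDRAC and laptop share a /24 network (so only the last octet changes); the function name is hypothetical.

```shell
# Print the IPv4 addresses one below and one above the given address,
# keeping the last octet within 1..254.
# Assumption: a /24 network, so only the last octet is adjusted.
neighbor_ips() {
    prefix=${1%.*}      # e.g. 10.17.21
    last=${1##*.}       # e.g. 51
    out=""
    [ "$last" -gt 1 ] && out="$prefix.$((last - 1))"
    [ "$last" -lt 254 ] && out="${out:+$out }$prefix.$((last + 1))"
    echo "$out"
}

# Example:
#   neighbor_ips 10.17.21.51
```

Pick whichever of the two addresses is not already in use on that network before assigning it to the laptop.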

Firmware:

Bug 63498 - iDRAC FW bug causes permanent iDRAC failure on R630 Motherboards after updating from 2.15.10.10 to 2.21.21.21

Licenses:

Since our license is based on the Eth2 MAC address (a PCIe card), the StorNext license.dat does not have to change with a system board replacement.

 

# Product:            maintenance

# System:             Not_provided

# Company:            <company here>

# Serial Number:      CX123xxxxxxxxx

# Identifier:         782BCB646191

# Expiration Date:    2016 11/30

# License:  AAAHS/L2AAA/EADU5/ENGJS/CN7CP/KS4LM/5ZLLH/6SDNU/5B9BK/C7TNA/LS

# Authorization String:

maintenance 1 782BCB646191 0 AAAHSL2AAAEADU5ENGJSCN7CPKS4LM5ZLLH6SDNU5B9BKC7TNALS <xxxxxxx>
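
To confirm after a system board swap that the license still matches, the Identifier can be compared with the Eth2 MAC. This sketch rests on an assumption the article does not state outright: that the Identifier is the MAC address with separators removed and letters upper-cased. The function names are hypothetical, and the MAC in the example is simply the sample Identifier above written with colons.

```shell
# Convert a MAC address such as 78:2b:cb:64:61:91 to a 12-digit
# upper-case string (ASSUMED to be the license Identifier format).
mac_to_identifier() {
    printf '%s\n' "$1" | tr -d ':-' | tr 'a-f' 'A-F'
}

# Print the value of the "# Identifier:" line of a license.dat-style file.
license_identifier() {
    awk '/^# Identifier:/ { print $3; exit }' "$1"
}

# Example (not executed here; the interface name eth2 and the license.dat
# location are assumptions to adapt to the system at hand):
#   mac=$(cat /sys/class/net/eth2/address)
#   [ "$(mac_to_identifier "$mac")" = "$(license_identifier license.dat)" ] \
#       && echo "license identifier matches Eth2"
```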

 

 

Terms defined:

 

DRAC - Dell Remote Access Controller; iDRAC is the integrated (onboard) version

 

CMC - Dell Chassis Management Controller. Blade servers each have an iDRAC and are managed as a group in the CMC. Not to be confused with the Lattus CMC.

 

IPMI - Intelligent Platform Management Interface, a set of computer interface specifications for an autonomous computer subsystem that provides management and monitoring capabilities independently of the host system's CPU, firmware (BIOS or UEFI) and operating system.

 

iLO and RiLO - HP’s Integrated Lights-Out (iLO) and Remote Integrated Lights-Out (RiLO)

 

BMC - baseboard management controller (generic term)

Lenovo's ThinkServer EasyManage

 

 

 

 

What version are you running?

GUI - IPMI - Help - About: Integrated Remote Access Controller 6 - Enterprise 1.92

# dmidecode -s bios-version

6.1.0

 

Engaging Dell

CSweb - > StorNext Metadata Appliances -> Manuals -> StorNext M-Series Service Plan ->

 

I have found that a DSET emailed as an attachment is ALWAYS stripped on its way to Dell, so I just create an outgoing FTP folder for Dell to pull it themselves.

 

 

 

What Dell recommends to reset the iDRAC when the LCD is dark:

Ensure that the BIOS and iDRAC firmware are up to date and attempt each of the following steps.

 

1. Soft reset the iDRAC with the racadm racreset command and allow a few minutes for the iDRAC to boot. If there is no change, continue.

2. Drain the system flea power by shutting down the server and removing the power cables. Press and hold the power button for 15 seconds and reboot.

3. Update the iDRAC firmware. Select the appropriate link below:

4. Reseat the LCD cables.

When re-seating the control panel, ensure the ribbon cable is re-seated on both ends, i.e. at the control panel and motherboard connections.

Because the LCD runs on AUX power, the power cable must be pulled as well to allow the LCD to re-initialize after the re-seat.

 

http://en.community.dell.com/support-forums/servers/f/956/t/19985288

http://en.community.dell.com/support-forums/servers/f/956/p/19627363/20753703?rfsh=1468248797399

https://www.youtube.com/watch?v=1yr8Tn-HOdk

 

 

Misc

Dell PowerEdge R630 / Xcellis: look for iDRAC8-with-lc-v2.05.05.05_User-Guide.pdf (~9.3 MB)

 

Dell PowerEdge R520 / M440: look for integrated-dell-remote-access-cntrllr-7-v1.50.50_User's Guide_en-us.pdf (~7.3 MB)

 

 

https://sourceforge.net/projects/ipmitool/

Attachments

Failed-IPMI-iDRAC-Symptoms-Recovery.pdf (last updated 08/01/2017 02:13 PM by Mamoon Ansari)

