Hardware Issues

Overview

This article covers the high-level flow for resolving hardware failures.

Step 1: ASPS Triage

The ASPS team reviews the RAS events, the messages file, and the tsunami log to determine the failing part, and then dispatches a QFE to replace it.

Step 2: Replace Failing Part
  1. The QFE replaces the failing part identified by ASPS.
  2. Depending on the result, do one of the following:
    • If the replacement fixes the problem, you are done.
    • If the replacement does not fix the problem, follow the steps in the Next Steps section below.

Next Steps

Review the RAS events, the messages file, and the tsunami log again to check the symptoms.

Failed Components

To find failed components, go to the hw-info/CoR directory in the collect and check the various .xml files for components marked ‘Failed’. The GUI reads its component status from these files.
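
When there are many .xml files, a text search for ‘Failed’ can speed up this check. The Python sketch below assumes the collect has already been extracted to a local directory and treats any line containing the case-sensitive text `Failed` as a hit. It deliberately does not parse the XML, because the schema of the CoR files is not described here. The script and function names (`find_failed.py`, `failed_lines`, `scan_cor_dir`) are hypothetical.

```python
"""Minimal sketch: list lines containing 'Failed' in the hw-info/CoR .xml files.

Assumptions (not from the original article): the collect is already extracted
locally, and the XML schema is unknown, so this is a plain case-sensitive text
match rather than an XML parse.
"""
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def failed_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped text) for each line mentioning 'Failed'."""
    return [(n, line.strip()) for n, line in enumerate(lines, start=1) if "Failed" in line]


def scan_cor_dir(cor_dir: str) -> List[Tuple[str, int, str]]:
    """Scan every .xml file under cor_dir (including subdirectories) for 'Failed'."""
    root = Path(cor_dir)
    hits = []
    for xml_file in sorted(root.rglob("*.xml")):
        # errors="replace" keeps the scan going if a file contains stray bytes
        with xml_file.open(encoding="utf-8", errors="replace") as fh:
            for n, text in failed_lines(fh):
                hits.append((str(xml_file.relative_to(root)), n, text))
    return hits


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: python find_failed.py <collect>/hw-info/CoR
        for name, n, text in scan_cor_dir(sys.argv[1]):
            print(f"{name}:{n}: {text}")
    else:
        print("usage: find_failed.py <collect>/hw-info/CoR", file=sys.stderr)
```

Matching on text keeps the sketch independent of the file layout. If the CoR files follow a known schema, an XML parser would give more precise results.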

Status after Running a Hardware Detect

To find component status changes after running a hardware detect, go to the hw-info directory in the collect and check the changes.diff, current.config, and factory.config files.

Procedures Run by the Field Engineer

To verify that the QFE ran the correct procedures, go to the app-info directory in the collect and check service.log for any service.sh activity.
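
A simple text filter can narrow service.log down to the service.sh entries. The Python sketch below assumes the collect is extracted locally, that the log sits at app-info/service.log under the collect root, and that relevant entries contain the literal text `service.sh`. The log's line format is not described here, so the script does not parse it and prints matching lines in file order. The script and function names (`service_activity.py`, `service_sh_entries`, `read_service_log`) are hypothetical.

```python
"""Minimal sketch: print service.sh activity from the collect's app-info/service.log.

Assumptions (not from the original article): the collect is extracted locally,
the log lives at app-info/service.log, and relevant lines contain 'service.sh'.
The log format itself is not parsed.
"""
import sys
from pathlib import Path
from typing import Iterable, List


def service_sh_entries(lines: Iterable[str]) -> List[str]:
    """Keep only lines that mention service.sh, in their original order."""
    return [line.rstrip("\r\n") for line in lines if "service.sh" in line]


def read_service_log(collect_root: str) -> List[str]:
    """Read app-info/service.log under collect_root and filter it."""
    log_path = Path(collect_root) / "app-info" / "service.log"
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        return service_sh_entries(fh)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for entry in read_service_log(sys.argv[1]):
            print(entry)
    else:
        print("usage: service_activity.py <extracted collect directory>", file=sys.stderr)
```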

RAID Issues on an LSI RAID Array

For RAID issues on an LSI RAID array, check the LSI logs, in particular the MajorEventLog.

RAID Issues on a 3ware RAID Array

For RAID issues on a 3ware RAID array, check the 3ware collect.

Dell Hardware Issues

For issues on Dell hardware, including its LSI RAID array, check the DSET logs from the Dell server.

This page was generated by the BrainKeeper Enterprise Wiki, © 2018