OpenManage Does Not Start Because Semaphore Count Reached (DRAFT)

 

SR Information: 1608620

 

Product / Software Version: Found on a DXi8500 with 2.2.1 software.

 

Problem Description: OpenManage does not start because the semaphore limit has been reached (OMSA won't start).

 

Related PTRs: Bug 35994 - Disk timeout causes semaphore leakage
 

 

Overview

This topic describes how to fix an issue where OpenManage cannot restart because the semaphore counters have reached their limit.

 

Although the issue may occur on other DXi platforms, this resolution applies to the DXi8500. For other platforms, consult bug 35994. The SR requests that you confirm the workaround for other platforms or escalate the issue to your backline support.

 

What is a semaphore?

 

A semaphore is a counter used to control access to shared resources. Under an OS, multiple processes request access to shared resources. The semaphore acts as a locking mechanism that prevents processes from accessing a shared resource beyond the limit defined by the counters.

 

When a process tries to access a resource that has reached the limit defined by the semaphore counters, the OS logs (messages log) record a message indicating that the semaphore limit (set, group, etc.) was exceeded.
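
As a rough illustration of the limits mentioned above: on Linux the semaphore limits are exposed as four numbers in /proc/sys/kernel/sem, in the kernel order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The sketch below only labels those four fields. The function name describe_sem_limits is hypothetical and is not part of the DXi or OpenManage software.

```shell
#!/bin/sh
# describe_sem_limits: hypothetical helper, not shipped with the DXi.
# Reads one line in the format of /proc/sys/kernel/sem on stdin and
# labels the four fields in kernel order: SEMMSL SEMMNS SEMOPM SEMMNI.
describe_sem_limits() {
    read -r semmsl semmns semopm semmni
    echo "max semaphores per set (SEMMSL): $semmsl"
    echo "max semaphores system-wide (SEMMNS): $semmns"
    echo "max operations per semop call (SEMOPM): $semopm"
    echo "max semaphore sets (SEMMNI): $semmni"
}

# Usage on a live system:
#   describe_sem_limits < /proc/sys/kernel/sem
```

A message about the "maximum number of semaphore sets" most likely relates to the last field (SEMMNI), although that mapping is an inference and is not stated in the SR.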

 


Symptom (How to Identify the Problem)

Sep  9 04:05:16 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded.
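
To check quickly whether a collected messages log contains this symptom, a count of matching lines may be enough. This is a minimal sketch: the function name count_sem_errors is hypothetical, and the log path in the usage comment is an assumption (use the path of the log you collected).

```shell
#!/bin/sh
# count_sem_errors: hypothetical helper.
# $1 = path to a messages log. Prints the number of lines that contain
# the OpenManage "semaphore sets has been exceeded" message.
count_sem_errors() {
    grep -c 'maximum number of semaphore sets has been exceeded' "$1"
}

# Usage (path is an assumption):
#   count_sem_errors /var/log/messages
```

A result of 0 suggests that the system is not hitting the condition described in this topic.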

 


Root Cause and Additional Symptoms

There can be several reasons for a system to exceed the semaphore count. For SR 1608620, it was found that a flood of disk command timeout events caused the semaphore count to exceed the limit. You will find the following additional symptoms beyond the symptom described above.

 

First, you'll see dsm_sa* shutting down periodically (expected). In the following example, however, some shutdown failures appear starting on Sep 4:

 

Aug 29 04:02:12 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Aug 30 04:02:06 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Aug 30 04:02:13 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Aug 31 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Aug 31 04:02:12 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  1 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  1 04:02:13 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  2 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  2 04:02:14 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  3 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  3 04:02:14 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  4 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  4 04:03:08 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep  4 04:03:20 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  5 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  5 04:03:08 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep  5 04:03:29 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  6 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  6 04:03:08 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep  6 04:03:21 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  7 04:02:08 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  7 04:03:08 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep  7 04:03:09 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep  8 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  8 04:03:07 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep  9 04:02:10 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep  9 04:03:10 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep  9 04:03:20 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep 10 04:02:07 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep 10 04:03:08 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown failed

Sep 10 04:03:09 NR1PQ8500VTLM01 dataeng: dsm_sa_datamgrd shutdown succeeded

Sep 10 15:27:53 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep 11 04:02:13 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded

Sep 11 10:16:36 NR1PQ8500VTLM01 dataeng: dsm_sa_eventmgrd shutdown succeeded


 

Note that the semaphore errors start on Sep 9.

 

$ grep semaphore messages | less -NI

      1 Sep  9 04:05:16 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

      2 Sep  9 04:15:04 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

      3 Sep  9 04:24:57 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

      4 Sep  9 04:34:49 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

      5 Sep  9 04:44:54 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

>>> the events continue until Sep 11, when the workaround was applied

    220 Sep 11 10:07:02 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

    221 Sep 11 10:07:02 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

    222 Sep 11 10:17:17 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

    223 Sep 11 10:18:08 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded

    224 Sep 11 10:24:34 NR1PQ8500VTLM01 Server Administrator (Shared Library): Data Engine EventID: 0  A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded
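
To see at a glance on which day the semaphore errors begin, the same grep can be summarized per day. The sketch below assumes the standard syslog layout shown above (month in the first field, day in the second). The function name sem_errors_per_day is hypothetical.

```shell
#!/bin/sh
# sem_errors_per_day: hypothetical helper.
# $1 = path to a messages log. Prints "<month> <day> <count>" for each day
# that has semaphore-set errors, in the order the days appear in the log.
sem_errors_per_day() {
    grep 'semaphore set has to be created' "$1" |
        awk '{
                 day = $1 " " $2
                 if (!(day in count)) order[++n] = day   # remember first-seen order
                 count[day]++
             }
             END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}

# Usage:
#   sem_errors_per_day messages
```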

 

Starting on Sep 4, disk 16 begins to report repeated command timeouts, and by Sep 9 the messages log is filled with these events:

 

$ grep 'Storage Service' messages | less -NI

 

    635 Sep  4 02:14:37 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    636 Sep  4 02:15:58 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2095  Unexpected sense. SCSI sense data: Sense key:  6 Sense code: 29 Sense qualifier:  2:  Physical Disk 0:0:16 Controller 1, Connector 0

    637 Sep  4 02:15:58 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    638 Sep  4 02:16:40 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    639 Sep  4 02:16:41 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2346   Error occurred: Error on PD 30(e0x35/s16) (Error f0).:  Physical Disk 0:0:16 Controller 1, Connector 0

    640 Sep  4 02:16:41 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2048  Device failed:  Physical Disk 0:0:16 Controller 1, Connector 0

    641 Sep  4 02:16:41 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2123  Redundancy lost:  Virtual Disk 7 (BPMD_8) Controller 1 (PERC H800 Adapter)

    642 Sep  4 02:16:41 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2057  Virtual disk degraded:  Virtual Disk 7 (BPMD_8) Controller 1 (PERC H800 Adapter)

    643 Sep  4 02:16:42 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    644 Sep  4 02:16:43 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2065  Physical disk Rebuild started:  Physical Disk 0:0:0 Controller 1, Connector 0

    645 Sep  4 02:16:43 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    646 Sep  4 02:29:31 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    647 Sep  4 02:29:31 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    648 Sep  4 02:29:32 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2095  Unexpected sense. SCSI sense data: Sense key:  5 Sense code: 24 Sense qualifier:  0:  Physical Disk 0:0:8 Controller 2, Connector 0

    649 Sep  4 02:29:32 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    650 Sep  4 03:01:43 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    651 Sep  4 04:01:55 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    652 Sep  4 04:04:59 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2048  Device failed:  Physical Disk 0:0:16 Controller 1, Connector 0

    653 Sep  4 04:05:00 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2048  Device failed:  Physical Disk 0:0:16 Controller 1, Connector 0

    654 Sep  4 04:05:00 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2057  Virtual disk degraded:  Virtual Disk 7 (BPMD_8) Controller 1 (PERC H800 Adapter)

    655 Sep  4 04:05:00 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

    656 Sep  4 04:05:00 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

    657 Sep  4 04:05:00 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

    658 Sep  4 04:05:01 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

. . . >>> message repeats

    681 Sep  4 04:05:06 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

    682 Sep  4 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

    683 Sep  4 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

    684 Sep  4 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

    685 Sep  4 04:05:36 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    686 Sep  4 04:06:18 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    687 Sep  4 05:00:12 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    688 Sep  4 05:01:36 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    689 Sep  4 05:22:36 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    690 Sep  4 06:01:48 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    691 Sep  4 07:02:00 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    692 Sep  4 07:13:12 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    693 Sep  4 08:01:30 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    694 Sep  4 09:01:42 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    695 Sep  4 09:32:30 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    696 Sep  4 09:38:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

    697 Sep  4 10:01:55 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

. . .

   1041 Sep  8 23:35:03 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1042 Sep  9 00:01:39 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1043 Sep  9 01:01:53 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1044 Sep  9 01:27:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2181  The controller battery Learn cycle will start in 24 hours.:  Battery 0 Controller 1

   1045 Sep  9 01:27:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1046 Sep  9 02:00:01 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1047 Sep  9 02:01:25 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1048 Sep  9 02:02:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1049 Sep  9 02:30:10 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2181  The controller battery Learn cycle will start in 24 hours.:  Battery 0 Controller 2

   1050 Sep  9 02:30:10 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1051 Sep  9 02:30:51 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1052 Sep  9 03:01:41 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1053 Sep  9 04:01:56 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1054 Sep  9 04:05:06 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2048  Device failed:  Physical Disk 0:0:16 Controller 1, Connector 0

   1055 Sep  9 04:05:06 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2048  Device failed:  Physical Disk 0:0:16 Controller 1, Connector 0

   1056 Sep  9 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

   1057 Sep  9 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

   1058 Sep  9 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

   1059 Sep  9 04:05:07 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

   1060 Sep  9 04:05:08 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

   1061 Sep  9 04:05:08 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

   1062 Sep  9 04:05:08 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 01 dc 01 1d 00:  Controller 1 (PERC H800 Adapter)

   1063 Sep  9 04:05:08 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

. . .

   1084 Sep  9 04:05:14 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: Command timeout on PD 30(e0x35/s16) Path 500000e117037dc2, CDB: 12 00 00 00 60 00:  Controller 1 (PERC H800 Adapter)

   1085 Sep  9 04:05:14 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2335  Controller event log: PD 30(e0x35/s16) Path 500000e117037dc2  reset (Type 03):  Controller 1 (PERC H800 Adapter)

   1086 Sep  9 04:05:44 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1087 Sep  9 04:06:26 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1088 Sep  9 04:15:32 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

. . . >>> message repeats

   1132 Sep  9 09:00:31 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1133 Sep  9 09:01:55 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1134 Sep  9 09:10:19 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1135 Sep  9 09:11:01 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1136 Sep  9 09:30:37 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1137 Sep  9 09:32:01 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0

   1138 Sep  9 09:32:01 NR1PQ8500VTLM01 Server Administrator: Storage Service EventID: 2405  Command timeout on physical disk:  Physical Disk 0:0:16 Controller 1, Connector 0
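
When the log is this noisy, counting the command timeout events (EventID 2405) per physical disk can help single out the failing drive. This is a sketch only: it assumes the exact message layout shown in the excerpt above, and the function name timeouts_per_disk is hypothetical.

```shell
#!/bin/sh
# timeouts_per_disk: hypothetical helper.
# $1 = path to a messages log. Counts "Command timeout on physical disk"
# events (EventID 2405) per disk and controller, highest count first.
timeouts_per_disk() {
    grep 'EventID: 2405 ' "$1" |
        sed 's/.*Physical Disk \([0-9:]*\) Controller \([0-9]*\).*/disk \1 controller \2/' |
        sort | uniq -c | sort -rn
}

# Usage:
#   timeouts_per_disk messages
```

In a log like the one above, disk 0:0:16 on controller 1 would be expected at the top of the list.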

 


Resolution (Workaround)

Note: Because this article describes one specific root cause for the semaphore count issue, make sure to only apply the following resolution steps to the specific issues that are covered in this topic.

 

Before attempting to start OpenManage manually, make sure to solve the semaphore issue first. In this case you have two options:

 

Option 1: Reboot the DXi

 

This action releases all the shared resources and resets the counters to zero.

 

Note that this option will work in most scenarios where you have exceeded the semaphore count, unless you are facing a condition where the root cause must be addressed before the reboot.

 

SR 1608620 is a good example where a reboot may work initially, but as long as the bad disk remains in the DXi and keeps generating command timeouts, the customer may face the issue again. In this situation, you should use the resolution steps outlined in Option 2 below.

 

Option 2: Increase the semaphore counter

 

Note that this resolution was applied to a DXi8500. For other platforms, consult PTR 35994, where a request for semaphore count settings for other platforms was filed. If the PTR has not been updated for your platform, escalate to your backline team.

 

  1. First, save the current counter values. You can collect them using 'cat' as shown below:

# cat /proc/sys/kernel/sem

250     32000   32      128

 

  2. To increase the counters, execute the following command:

echo 500 32000 256 256 > /proc/sys/kernel/sem

 

Note: These changes are temporary; rebooting the machine restores the old settings.

 

  3. Try to bring the OpenManage service back:

Make sure all modules of OpenManage are stopped by executing a stop:

 

sh /opt/dell/srvadmin/sbin/srvadmin-services.sh  stop

 

Now start OpenManage:

 

sh /opt/dell/srvadmin/sbin/srvadmin-services.sh start
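
Before and after changing the counters in the steps above, it may be useful to compare the number of semaphore sets currently allocated against the limit for sets (the fourth value in /proc/sys/kernel/sem). The following sketch does only that comparison and changes nothing on the system. The function names count_sem_sets and sem_set_usage are hypothetical.

```shell
#!/bin/sh
# count_sem_sets: hypothetical helper.
# Reads `ipcs -s` output on stdin and counts the semaphore set rows
# (rows whose key column starts with 0x).
count_sem_sets() {
    grep -c '^0x'
}

# sem_set_usage: hypothetical helper.
# $1 = number of sets in use, $2 = the line from /proc/sys/kernel/sem.
# The fourth value on that line is the limit on semaphore sets.
sem_set_usage() {
    limit=$(echo "$2" | awk '{ print $4 }')
    echo "semaphore sets in use: $1 of $limit"
}

# Usage on a live system (run as root):
#   sem_set_usage "$(ipcs -s | count_sem_sets)" "$(cat /proc/sys/kernel/sem)"
```

If the number in use is at or near the limit, the symptom described in this topic is likely to occur whenever another semaphore set is requested.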

 

With Option 2:

 


Additional Information

OpenManage is a Dell product. If you encounter an issue different from the one described in this topic, open a case with Dell. If you require assistance from your backline team, please make sure you gather additional data before you escalate:

 

Note: There are some bugs reporting core cases with OpenManage and semaphore issues. Check to see if your case matches any existing bugs.

 

To find the PID associated with each semaphore, get the semaphore ID from the output of 'ipcs -a' and pass it to 'ipcs -s -i', as in the following example, which shows which process is using semaphore ID 1651343360:

 

# ipcs -a

 

------ Shared Memory Segments --------

key        shmid      owner      perms      bytes      nattch     status

0x00005056 0          root      666        1504       1

0x00005059 32769      root      666        6170192    2

0x79067eb8 65538      root      666        808        81

0x07021999 98307      root      644        1792       2

 

------ Semaphore Arrays --------

key        semid      owner      perms      nsems

0x00000000 1651343360 root      600        1

0x00000000 1651965953 root      600        1

0x00000000 1651998722 root      600        1

0x00000000 1652260867 root      600        1

0x00000000 1652228100 root      600        1

0x00000000 1652097029 root      600        1

0x00000000 1652129798 root      600        1

0x00000000 1652162567 root      600        1

0x00000000 1652195336 root      600        1

0x00000000 1652293641 root      600        1

0x00000000 1652326410 root      600        1

0x00000000 1652359179 root      600        1

0x00000000 1652391948 root      600        1

0x00000000 1652424717 root      600        1

0x00000000 1652457486 root      600        1

0x00000000 1652490255 root      600        1

0x00000000 1652523024 root      600        1

0x00000000 1652686865 root      600        1

0x00000000 1652588562 root      600        1

0x00000000 1652621331 root      600        1

0x00000000 1652654100 root      666        1

0x00000000 1652719637 root      600        1

0x79067eb8 28409878   root      666        1

0x00000000 47677463   nobody    600        1

0x00000000 47710232   nobody    600        1

0x00000000 47743001   nobody    600        1

0x00000000 47775770   nobody    600        1

0x00000000 47808539   nobody    600        1

0x00000000 47841308   nobody    600        1

0x00000000 47874077   nobody    600        1

 

------ Message Queues --------

key        msqid      owner      perms      used-bytes   messages

 

# ipcs -s -i 1651343360

 

Semaphore Array semid=1651343360

uid=0    gid=0   cuid=0  cgid=0

mode=0600, access_perms=0600

nsems = 1

otime = Not set

ctime = Thu Sep 19 04:02:18 2013

semnum     value      ncount     zcount     pid

0          0          1          0          5374

 

# ps -ef | grep 5374

root      5374     1  0 04:02 ?        00:00:21 /opt/dell/srvadmin/sbin/dsm_sa_datamgrd

root     25699  9234  0 23:15 pts/2    00:00:00 grep 5374

 

Alternatively, you can use the following command instead of ps (convenient if you want to write a quick script):

 

# more /proc/5374/cmdline

/opt/dell/srvadmin/sbin/dsm_sa_datamgrd
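
A quick script along those lines could loop over every semaphore ID and print the PID and command line reported for it. The sketch below is one possible shape, assuming the 'ipcs' output layout shown above. The function names list_semids and sem_pid are hypothetical. Note that the pid column of 'ipcs -s -i' reports the last process that operated on the semaphore.

```shell
#!/bin/sh
# list_semids: hypothetical helper.
# Reads `ipcs -s` output on stdin and prints one semaphore ID per line.
list_semids() {
    awk '$1 ~ /^0x/ { print $2 }'
}

# sem_pid: hypothetical helper.
# Reads `ipcs -s -i <semid>` output on stdin and prints the pid column of
# the first semaphore row (the first non-empty line after the "semnum" header).
sem_pid() {
    awk 'found && NF { print $5; exit } $1 == "semnum" { found = 1 }'
}

# Usage on a live system (run as root):
#   ipcs -s | list_semids | while read -r id; do
#       pid=$(ipcs -s -i "$id" | sem_pid)
#       cmd=$(cat "/proc/$pid/cmdline" 2>/dev/null | tr '\0' ' ')
#       printf '%s %s %s\n' "$id" "$pid" "$cmd"
#   done
```

The arguments in /proc/<pid>/cmdline are separated by NUL characters, which is why the usage example converts them to spaces with tr.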



This page was generated by the BrainKeeper Enterprise Wiki, © 2018