How to Handle Kernel dmesg Events
This topic describes how to address some of the kernel dmesg events that Technical Support sees in Lattus systems. A kernel dmesg event is a generic event with many possible causes, and the required action depends on the cause. Below are some root causes and recommended actions.
Details:
EXT4-fs (sdc2): error count since last fsck: 1
EXT4-fs (sdc2): initial error at time 1492861768: __ext4_get_inode_loc:3955: inode 153899646: block 615514803
EXT4-fs (sdc2): last error at time 1492861768: __ext4_get_inode_loc:3955: inode 153899646: block 615514803
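The "initial error" and "last error" lines carry a Unix epoch timestamp, which shows how recent the filesystem error is. The following is a minimal sketch for converting it, not part of the original procedure. The function name ext4_error_time is hypothetical, and the sketch assumes GNU date and a sed that supports -E.

```shell
# Print the UTC time of the epoch timestamp found in an EXT4-fs
# "initial error at ..." or "last error at ..." dmesg line.
# Assumes GNU date (for -d @EPOCH) and sed with -E support.
ext4_error_time() {
    # Handles both "error at time 1492861768:" and "error at 1447102452:".
    epoch=$(printf '%s\n' "$1" | sed -nE 's/.* error at (time )?([0-9]+):.*/\2/p')
    [ -n "$epoch" ] || return 1   # not an "error at" line
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC'
}

ext4_error_time 'EXT4-fs (sdc2): last error at time 1492861768: __ext4_get_inode_loc:3955: inode 153899646: block 615514803'
```

For the sample line above this prints 2017-04-22 11:49:28 UTC.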
Actions:
# lsof /dev/sdc2 | grep dss.bin | awk '{print $2}'
2166
# ps -ef | grep 2166 | grep -v grep | awk -F "/" '{print $11}' | awk -F "." '{print $1}'
a80279ba-c969-483a-981f-704baa76f644
# qshell -c "q.dss.storagedaemons.restartOne('a80279ba-c969-483a-981f-704baa76f644')"
# df -h | grep sdc2
/dev/sdc2 2.7T 2.1T 574G 79% /mnt/dss/dss3
# umount /dev/sdc2
# fsck -Ma /dev/sdc2
fsck from util-linux 2.20.1
dss3 contains a file system with errors, check forced.
dss3: 480825/179871744 files (82.6% non-contiguous), 561977731/719458929 blocks
# fsck -Ma /dev/sdc2
fsck from util-linux 2.20.1
dss3: clean, 480825/179871744 files, 561977731/719458929 blocks
# mount /dev/sdc2
# df -h | grep sdc
/dev/sdc2 2.7T 2.1T 574G 79% /mnt/dss/dss3
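The ps step above picks the storage daemon GUID by counting "/"-separated fields, which depends on the installation path. As a hedged alternative sketch, the GUID can be matched by its shape instead. The function name guid_from_cmdline is hypothetical, and the sketch assumes the GUID appears in the dss.bin command line in the lowercase 8-4-4-4-12 hex form shown in the transcript.

```shell
# Read a process command line on stdin and print the first UUID-shaped token.
# Assumption: the storage daemon GUID is present in the dss.bin command line
# in lowercase 8-4-4-4-12 hexadecimal form.
guid_from_cmdline() {
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Example, using the PID reported by lsof:
#   ps -o args= -p 2166 | guid_from_cmdline
```

Check the printed GUID against the ps output before passing it to restartOne.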
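Note: Use the following command with great CAUTION! It runs the whole procedure above on one disk or several disks in a single line. Before running it, change the drive letters in the "for n in a b" list and the partition number (1) in every place it appears so that they match the affected filesystems.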
# for n in a b; do (for x in $(for i in $(lsof /dev/sd${n}1 | grep dss.bin | awk '{print $2}'); do (ps -ef | grep $i | grep -v grep | awk -F "/" '{print $11}')| awk -F "." '{print $1}'; done); do (qshell -c "q.dss.storagedaemons.restartOne('$x')"); done; df -h | grep sd${n}; umount /dev/sd${n}1; fsck -Ma /dev/sd${n}1; fsck -Ma /dev/sd${n}1; mount /dev/sd${n}1; df -h | grep sd${n}); done
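Because the one-liner unmounts and checks every device it expands to, it can help to preview that expansion first. The following is a dry-run sketch only: it prints the commands for review and runs none of them. The function name plan_fsck is hypothetical, and the storage daemon restart is shown as a reminder comment rather than reproduced.

```shell
# Print, without running anything, the commands the one-liner would issue
# for one partition, so the device names can be reviewed first.
plan_fsck() {
    part="$1"
    echo "# restart each storage daemon that lsof reports on $part"
    echo "df -h | grep ${part#/dev/}"
    echo "umount $part"
    echo "fsck -Ma $part"
    echo "fsck -Ma $part"
    echo "mount $part"
}

# Same drive letters and partition number as the one-liner; edit to match.
for n in a b; do
    plan_fsck "/dev/sd${n}1"
done
```

If the printed device names are not the ones reported in the dmesg event, correct the loop list and partition number before running the real command.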
Details:
EXT4-fs (sdg1): error count: 2
EXT4-fs (sdg1): initial error at 1447102452: ext4_journal_check_start:56
EXT4-fs (sdg1): last error at 1447102452: ext4_journal_check_start:56
Actions:
Complete the "Error count since last fsck - Run fsck on the identified filesystem" procedure (the first set of actions in this topic).
Details:
res 41/40:00:98:9b:8b/00:00:96:01:00/40 Emask 0x409 (media error) <F>
ata1.00: error: { UNC }
Sense Key : Medium Error [current] [descriptor]
Add. Sense: Unrecovered read error - auto reallocate failed
res 41/40:00:08:9c:8b/00:00:96:01:00/40 Emask 0x409 (media error) <F>
ata1.00: error: { UNC }
Sense Key : Medium Error [current] [descriptor]
Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 6820699144
Actions:
No action required.
Details:
Call Trace:
Call Trace:
Call Trace:
Actions:
No action required.
Details:
ata9.00: irq_stat 0x08000008, interface fatal error
ata9: SError: { 10B8B Dispar BadCRC }
res 40/00:d4:e0:21:5b/00:00:c8:00:00/40 Emask 0x10 (ATA bus error)
res 40/00:d4:e0:21:5b/00:00:c8:00:00/40 Emask 0x10 (ATA bus error)
machinename:storage111
Actions:
# ls -l /sys/block/ | grep ata9
lrwxrwxrwx 1 root root 0 Apr 19 13:13 sdh -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/ata9/host8/target8:0:0/8:0:0:0/block/sdh
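A plain grep for ata9 would also match ports such as ata90 on hosts with many ports. As a small sketch under that assumption, the match can be limited to the exact path component. The function name ata_to_dev is hypothetical; it reads the same "ls -l /sys/block/" output shown above.

```shell
# Read `ls -l /sys/block/` output on stdin and print the block device whose
# sysfs path passes through the given ATA port (for example ata9).
# The port is matched as a whole path component, so ata9 does not match ata90.
ata_to_dev() {
    # In `ls -l` symlink output the last three fields are: NAME -> TARGET
    awk -v port="$1" 'index($0, "/" port "/") { print $(NF - 2) }'
}

# Example:
#   ls -l /sys/block/ | ata_to_dev ata9
```

With the listing above, this prints sdh, which is the disk name used in the qshell steps.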
# qshell
In [1]: ca=i.config.cloudApiConnection.find('main')
In [2]: mguid=ca.machine.find('storage111')['result'][0]
In [3]: dguid=ca.disk.find(machineguid=mguid, name='sdh')['result'][0]
In [4]: ca.disk.updateModelProperties(dguid, status=str(q.enumerators.diskstatustype.DEGRADED))
Out[4]: {'jobguid': None, 'result': '5bf2162e-a44c-4fa0-b534-06de92d29a25'}
Details:
{8}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{8}[Hardware Error]: It has been corrected by h/w and requires no further action
{8}[Hardware Error]: event severity: corrected
{8}[Hardware Error]: Error 0, type: corrected
{8}[Hardware Error]: fru_text: CorrectedErr
{8}[Hardware Error]: section_type: memory error
[Firmware Warn]: error section length is too small
Actions:
No action required.
Details:
EXT4-fs error (device sdc2): ext4_lookup:1430: inode #153755541: comm dss.bin: deleted inode referenced: 153755715
machinename:storage120
Actions:
The disk should be listed in the Degraded Disks area of CMC. Decommission the disk and replace per usual procedures.