Gather Data on a Core Dump

After collecting and transferring a core dump file to Quantum, you then need to gather as much data as possible about the core dump. You will use this data to perform an analysis of the core.

 

If you have not retrieved the system collect logs, you will need to collect them at this point (see "Step 2: Collect Support Data" in the DXi-Series Status and Log Collection job aid for instructions).

 


 

Complete the following steps to gather the necessary core dump information to analyze the core:

 
1.     Gather the core backtrace. Check the system collect log first, as it may already contain the backtrace output. In the collect log, the core backtrace is located at:

/scratch/collect/node1-collection/app-info/core_backtrace.out

An example core_backtrace file (located in the /scratch/collect/node1-collection/app-info directory of the collect) contains valid backtrace output. This example has the output of two separate smbd core backtraces:
smbd-1313553069-12562-reported  
smbd-1315852091-27766-reported
 
If the core_backtrace file contains information similar to the following for the core you are investigating, there is no valid backtrace and you will need to gather one using gdb.
 
/scratch/core/generatesystemc-1316004544-21790": not in executable format: File format not recognized
/tmp/gdb_all.cmd:1: Error in sourced command file:
No registers.
  Note: The core_backtrace file has entries for all the cores, so find the output matching the file name of the core you are investigating.
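Because core_backtrace.out holds entries for every core, a quick grep can pull out just the section you need. A minimal sketch, demonstrated here on a fabricated file under /tmp (on a DXi, point the path at the collect-log location shown above):

```shell
# Demo on a fabricated core_backtrace.out; on a DXi, use
# /scratch/collect/node1-collection/app-info/core_backtrace.out instead.
mkdir -p /tmp/app-info
cat > /tmp/app-info/core_backtrace.out <<'EOF'
smbd-1313553069-12562-reported
#0  0x00002b1c in waitpid ()
smbd-1315852091-27766-reported
#0  0x00002b1c in read ()
EOF
CORE_NAME=smbd-1313553069-12562-reported
# -A 1 suffices for this demo's one-line entries; use a larger value
# (e.g. -A 40) to capture a full backtrace on a real file.
grep -A 1 "$CORE_NAME" /tmp/app-info/core_backtrace.out
```

This prints the matching entry's header line followed by its backtrace frames.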
2.     If the core_backtrace file does not contain backtrace output for the core you are investigating, use gdb to gather the backtrace output for analysis and possible escalation to the SES group. The output from the following steps provides key information about the core dump that will be needed for the investigation (collect all output from the beginning of your investigation):

 Caution:  Using gdb to gather the backtrace on a live customer's DXi can use up the available memory and may cause the DXi to have problems or possibly crash.
      Note:  Turn on logging in your ssh session (PuTTY, etc.) to capture the output of the following commands.
3.     If you are gathering the backtrace on the live system, go to step 9. Otherwise, copy the zipped core file, cored_daemon file, and md5sum files that were gathered in the previous section 'Collect and Transfer a Core Dump File' to the /scratch/qtmtmp directory on the DXi you are going to use to gather the backtrace.
4.     In the command line of the DXi you are using to gather the backtrace, change directory into the directory containing the core file.
# cd /scratch/qtmtmp
5.     Verify the md5sum of <nameofcorefile-reported>.gz matches the value in the md5sum file; if it does not match, the file is invalid.
# cat md5sum-<nameofcorefile-reported>.txt
# md5sum <nameofcorefile-reported>.gz
6.     Verify the md5sum of <cored_daemon>.gz matches the value in the md5sum file; if it does not match, the file is invalid.
# cat md5sum-<cored_daemon>.txt
# md5sum <cored_daemon>.gz
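The checksum comparisons above can also be done automatically with md5sum -c, which re-reads the saved .txt file and verifies the named file in one step. A sketch using a throwaway file (substitute the real <nameofcorefile-reported>.gz and its md5sum-*.txt on the DXi):

```shell
# Create a throwaway file and a checksum record, mimicking the
# md5sum-<file>.txt files gathered with the core.
cd /tmp
echo "core data" > demo-core-reported.gz
md5sum demo-core-reported.gz > md5sum-demo-core-reported.txt
# md5sum -c exits non-zero on a mismatch, so an invalid file is caught
# without eyeballing two long hash strings.
md5sum -c md5sum-demo-core-reported.txt
```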
7.     Unzip the core file.
# gunzip <nameofcorefile-reported>.gz
8.     Unzip the <cored_daemon> file.
# gunzip <cored_daemon>.gz

 

9.     Collect the file information on the core file.
             # file <nameofcorefile-reported>
 
Example:
 
            # file bpwd-1341545262-gcore.4874-reported
            bpwd-1341545262-gcore.4874-reported: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'bpwd'
 
10.    If the cored_daemon was not downloaded in the previous section 'Collect and Transfer a Core Dump File', and the backtrace is being run on the live DXi (or on a like system with the same firmware as the DXi that cored), you will need to locate the daemon that cored. The find command below can return many files, but the daemons needed are usually in the /opt/DXi/ or /etc/init.d/ directories. Its output gives you the path of the daemon that cored. (The cored_daemon is the process that cored, i.e. winbindd, replicationd, ostd, etc.)
      Note: Do not use locate. Although it may still be installed on the DXi, running updatedb can cause heavy load and disturb operations; locate and updatedb have been removed from newer releases and will be removed from future releases.
      # find / -name <cored_daemon>
Example:
 
# find / -name ostd
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/nostdio.h
/usr/share/mysql/charsets/geostd8.xml
/var/lock/subsys/ostd
/opt/DXi/adic/perl/lib/5.8.3/x86_64-linux/CORE/nostdio.h
/opt/DXi/adic/TSM/util/fsechostderr
/opt/DXi/ostd
#
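Because a bare find / walks the whole filesystem and returns noise such as the nostdio.h hits above, it can help to restrict the search to the usual daemon locations (/opt/DXi and /etc/init.d). A sketch, demonstrated on a throwaway directory tree:

```shell
# Throwaway tree standing in for a DXi; on a real system you would run:
#   find /opt/DXi /etc/init.d -name <cored_daemon>
mkdir -p /tmp/demo/opt/DXi /tmp/demo/etc/init.d
touch /tmp/demo/opt/DXi/ostd
find /tmp/demo/opt/DXi /tmp/demo/etc/init.d -name ostd
```

Searching only those two directories returns the daemon path without the false positives.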
 
11.     Collect the file information on the cored daemon. If the output indicates that it is "stripped", the binary has no symbol information and we may not get a good backtrace. The example provided shows "not stripped". (If the output does not look similar to the example, you may not have the correct file and will need to try the other files from the previous results.)
# file <cored_daemon_path>
 
Example:
 
# file /opt/DXi/ostd
/opt/DXi/ostd: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
#
  
12.     Run gdb on the core file with the following command:

gdb <cored_daemon_path> <core>

Examples:
o    gdb /opt/DXi/bpw/lib/bpwd /scratch/core/bpwd-1285348190-15411-reported
o    gdb /opt/DXi/ostd /scratch/core/ostd-1281541365-11418-reported
 
13.    Optional - At the gdb prompt, enter "set logging on". This generates gdb.txt in your current directory. (This logs only the output, not the commands typed. You will need to collect this gdb.txt file from the DXi when finished.)
 
14.    Run the following command (this shows all the threads that were open at the time of the core):

(gdb) info threads
  Press Enter until you see the gdb prompt again.
 
15.     While in gdb, type the following command to generate the backtrace and collect all data output. (This gives you the backtrace for the thread that was active at the time of the core; it has a "*" in front of it in the previous "info threads" output.)

(gdb) bt
      Press Enter until you see the gdb prompt again.

 

16. Run the following command: 

(gdb) thread apply all where

 

17. Press Enter until you see the gdb prompt again.
      Alternatively: You can change the thread with t and the thread number, and change the frame with f and the frame number. You can run a full backtrace, which prints the backtrace along with the local variables of each frame, by using bt full.
(gdb) t 2
(gdb) f 2
(gdb) bt full 
18. Exit gdb:

(gdb) quit
 
19. Use the output along with the other logs (collect, storage, DSET, etc.) to investigate whether this is a known issue. If you find an exact match in a PTR on bugzilla and there is a workaround, apply the workaround.
20. Use the output of your ssh session or gdb.txt in the next steps to escalate to SES or to add notes to a PTR.

What's Next?
Do one of the following:
 

 

 

Notes

In Step 5

Frequently SES/SUS wants a copy of the executable, so before gzipping the core I find it useful to copy the executable into the working directory (/scratch/qtmtmp, or /scratch/SRxxxxxxx).

 

 

At this point, in /scratch/qtmtmp or /scratch/SRxxxxxxx, there is a copy of the executable and the core file. SES/SUS often needs both the gdb output and the bt (backtrace) output.

 

 

The following steps will generate the gdb text file in the local directory 

   * gdb executable core-file       (i.e. gdb ostd core-file)
   * set logging file gdb_core-file.txt
   * set logging on
   * set pagination off
   * thread apply all where

   * quit
 

 

The following steps will generate the backtrace (bt) text file in the local directory 

   * gdb executable core-file       (i.e. gdb ostd core-file)
   * set logging file bt_core-file.txt
   * set logging on
   * set pagination off
   * bt

   * quit
 

 

At this point, there should be 4 files in your working directory (/scratch/qtmtmp, or /scratch/SRxxxxxxx).

   * the core-file

   * the executable

   * gdb_core-file.txt

   * bt_core-file.txt

 

 

Since gzip does not compress directories, I tend to use the same commands specified in the Quantum Kdump procedures (tar -jcvf) to compress everything at once. To do this, move up a directory, run tar, and then md5sum:

   * cd ../

   * tar -jcvf  qtmtmp.tar.bz2 qtmtmp  

   * md5sum qtmtmp.tar.bz2 > md5sum_qtmtmp.tar.bz2.txt

   * ftp gps.quantum.com (anonymous, cd incoming/SRxxxxxxx, bin, hash, put qtmtmp.tar.bz2, put md5sum_qtmtmp.tar.bz2.txt, bye)

 

 

If you already have an FTP session open to gps, you can verify the files have been uploaded. Then it's time to clean up the DXi:

   * cd /scratch

   * rm -r /scratch/*qtmtmp*

 

Note by Steve Sayler on 06/25/2012 12:27 PM

Note on step 4.

On DXi 1x systems, locate will not always work because the update database has not been updated since the DXi was installed.

   * you can manually run: updatedb --prunepaths='/Q /snfs /mnt/'

   * find /opt -name <name_of_the_executable> -print

Note by Steve Sayler on 06/25/2012 11:54 AM

Sometimes it makes sense to gather data on the core file before gzipping it and transferring it to Quantum. 

   * finding the executable and running gdb & bt only takes a few minutes

   * often SES/SUS will need the executable for analysis

   * gzipping the executable, gdb, bt, and core file is as simple as gzipping the core itself

 

The steps included in this section could be arranged as an option to the "Collect and Transfer a Core Dump File" section.

Note by Steve Sayler on 06/25/2012 11:38 AM


This page was generated by the BrainKeeper Enterprise Wiki, © 2018