Basics of OST Troubleshooting (DRAFT)

Overview

 Note: Always check the NBU logs, the tsunami log, and the OST plugin logs for WARN/ERROR messages.


NBU Logs


Windows: %install_path%\NetBackup\logs\
Linux: /usr/openv/netbackup/logs/
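
The WARN/ERROR check from the note above can be scripted on a Linux media server. This is a minimal sketch, assuming the logs are plain-text files; `scan_logs` is a hypothetical helper name, not an NBU or DXi tool.

```shell
# Hypothetical helper: list WARN/ERROR lines in every file under a log directory.
# Usage: scan_logs LOG_DIR
scan_logs() {
    log_dir=$1
    # -r: recurse into subdirectories, -n: show line numbers, -E: extended regex.
    # Output format is path:line_number:matching_line.
    grep -rnE 'WARN|ERROR' "$log_dir"
}

# Example (Linux path from this page):
#   scan_logs /usr/openv/netbackup/logs/
```

The function returns a non-zero status when no WARN or ERROR line is found, so it can also be used in a conditional.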

 

JE: What is the complete logs directory?


Use Logs to See OST Activities on the DXi

Most logs for OST activities on the DXi are written to tsunami.log. Look for:

If you want to see more information about what is going on, you can enable DEBUG logging for OST on the DXi. (See Enable Debugging OST below.)

 

 Note:  Always restore the default logging level after you have collected the logs you need. Leaving DEBUG enabled makes the logs roll over very quickly, and you will lose important information that has been logged.

 

Look at the following two files and compare with the DXi GUI:

 

They should match up with the GUI.

 

If the customer reports being unable to delete an LSU, the LSU probably still holds images on disk, or the disk is empty but the configuration files were not updated.


Make sure that the file system (FS) is clean. You can then modify these files to resolve the issue.


For more information about logs, refer to the DXi Log Reading Basics pages.
 


Enable Debugging OST

 Note: 

 

On The DXi System


File to be changed: /hurricane/log-common.conf

Change to make: Change INFO to DEBUG for the first line shown below.

 

log4cplus.logger.ost = INFO, ALL_APP
log4cplus.additivity.ost = false

# modification: change INFO to DEBUG
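
The change above, and the later switch back to INFO, can be made with a small script instead of a manual edit. This is a minimal sketch, assuming the logger line has the form shown above; `set_ost_log_level` is a hypothetical helper name, and it accepts only the two levels this page mentions.

```shell
# Hypothetical helper: set the level of the "ost" logger in a log4cplus config.
# Usage: set_ost_log_level CONF_FILE LEVEL   (LEVEL is DEBUG or INFO)
set_ost_log_level() {
    conf=$1
    level=$2
    # Accept only the two levels discussed on this page.
    case $level in
        DEBUG|INFO) ;;
        *) echo "unsupported level: $level" >&2; return 2 ;;
    esac
    # Keep a copy of the file as it was before the change.
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite only the level on the "log4cplus.logger.ost = ..." line;
    # the appender list (", ALL_APP") and all other lines are left as they were.
    sed -E "s/^(log4cplus\.logger\.ost[[:space:]]*=[[:space:]]*)[A-Z]+/\1${level}/" \
        "$conf.bak" > "$conf"
}

# Example:
#   set_ost_log_level /hurricane/log-common.conf DEBUG
#   ... collect logs ...
#   set_ost_log_level /hurricane/log-common.conf INFO
```

The script only edits the file; OST still has to be restarted as described on this page.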

 

Commands to be run from the CLI, and one comment:

 

Quick method to restart OST (does not always work properly) JE

$ rm /var/DXi/processwatcher
$ /etc/init.d/ostd restart     JE: /etc/init.d/ost restart or /etc/rc.d/init.d/ost restart

# Once ostd is back online, do the following:

$ touch /var/DXi/processwatcher

 

Full method to restart OST  JE

1. Remove processwatcher

 

mv /var/DXi/processwatcher /var/DXi/processwatcher.orig    (Alternatively, remove processwatcher with ‘rm /var/DXi/processwatcher’.)

                                                                               

2. Stop the ost service

 

/etc/rc.d/init.d/ost stop

 

3. Wait until the ost service is stopped

 

You will see the following lines in the tsunami logs:

 

              INFO   - XX/XX/XX-XX:XX:XX - DXi_script DataPath(0) [qlog] - ost stopping

 

              INFO   - XX/XX/XX-XX:XX:XX - DXi_script DataPath(0) [qlog] - ost stopped
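
Waiting for the "ost stopped" line can be automated rather than watched by eye. This is a minimal sketch; `wait_for_log_line` is a hypothetical helper name, and the path to tsunami.log is left as a placeholder because this page does not give it. To avoid matching a line from an earlier restart, record the line count of the log before stopping OST and pass it as SKIP_LINES.

```shell
# Hypothetical helper: wait until a fixed string appears in a log file,
# looking only at lines after the first SKIP_LINES lines.
# Usage: wait_for_log_line LOG_FILE TEXT [SKIP_LINES] [TIMEOUT_SECONDS]
wait_for_log_line() {
    log_file=$1
    text=$2
    skip=${3:-0}
    timeout=${4:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # index() does a plain substring match, so "ost stopped" does not
        # match the earlier "ost stopping" line.
        if awk -v skip="$skip" -v text="$text" '
               NR > skip && index($0, text) { found = 1; exit }
               END { exit !found }
           ' "$log_file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example (TSUNAMI_LOG is a placeholder for the tsunami.log path on the DXi):
#   before=$(wc -l < "$TSUNAMI_LOG")
#   /etc/rc.d/init.d/ost stop
#   wait_for_log_line "$TSUNAMI_LOG" "ost stopped" "$before" 300
```

The 60-second default timeout is an assumption, not a documented stop time.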

                                               

4. Check if there are any open connections

 

Once OST stops, verify that no OST connections to the LSUs remain open. If connections are still recorded as open when you restart OST, OST can go into LSU recovery mode: on startup, OST reads its configuration file to find the storage-server count of connections open to the LSU, and a non-zero value triggers recovery mode.

 

             The DXi file /data/hurricane/conf/OSTStorageSrvrLsu.conf will contain an entry like this under each storage-server:

 

               <StorageServer>

                   ...

                   <srv_nconnect>2</srv_nconnect>

 

               In the example above, srv_nconnect was 2 when ostd was shut down. This triggers ostd to go into LSU recovery mode when it is restarted.

 

               Set ‘srv_nconnect’ to 0 before restarting OST.
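
The srv_nconnect check and reset can be sketched as two small functions. This assumes each `<srv_nconnect>` element sits on a single line, as in the excerpt above; `show_nonzero_nconnect` and `reset_nconnect` are hypothetical helper names. Run them only while OST is stopped.

```shell
# Hypothetical helper: print every non-zero srv_nconnect entry with its
# line number. Returns 0 if at least one is found, non-zero otherwise.
# Usage: show_nonzero_nconnect CONF_FILE
show_nonzero_nconnect() {
    grep -nE '<srv_nconnect>[[:space:]]*0*[1-9][0-9]*[[:space:]]*</srv_nconnect>' "$1"
}

# Hypothetical helper: set every srv_nconnect entry to 0, keeping a
# backup copy of the original file as CONF_FILE.bak.
# Usage: reset_nconnect CONF_FILE
reset_nconnect() {
    conf=$1
    cp -p "$conf" "$conf.bak" || return 1
    sed -E 's|<srv_nconnect>[^<]*</srv_nconnect>|<srv_nconnect>0</srv_nconnect>|' \
        "$conf.bak" > "$conf"
}

# Example:
#   conf=/data/hurricane/conf/OSTStorageSrvrLsu.conf
#   if show_nonzero_nconnect "$conf"; then reset_nconnect "$conf"; fi
```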

 

5. Restart the ost service

 

/etc/rc.d/init.d/ost start              

                               

6. Verify the file NS_OSTD

 

Verify /snfs/common/data/NS_OSTD exists for a few minutes after startup.
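
One way to confirm that the file stays in place is to poll for it over a fixed period. This is a minimal sketch; `file_persists` is a hypothetical helper name, and the 180-second default is only an assumed reading of "a few minutes".

```shell
# Hypothetical helper: succeed only if PATH exists at every check during
# the whole observation period.
# Usage: file_persists PATH [DURATION_SECONDS] [INTERVAL_SECONDS]
file_persists() {
    path=$1
    duration=${2:-180}
    interval=${3:-5}
    elapsed=0
    while [ "$elapsed" -lt "$duration" ]; do
        # Fail as soon as the file is missing at any check.
        [ -e "$path" ] || return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    # Final check at the end of the observation period.
    [ -e "$path" ]
}

# Example:
#   file_persists /snfs/common/data/NS_OSTD 180 5 && echo "NS_OSTD is stable"
```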

 

7. Verify ost is started before starting other services such as replicationd

 

In the tsunami log, verify that you see the following line: “INFO   - XX/XX/XX-XX:XX:XX - DXi_script DataPath(0) [qlog] - ost started”.

 

8. Replace processwatcher

 

mv /var/DXi/processwatcher.orig /var/DXi/processwatcher    (If you removed processwatcher instead, run ‘touch /var/DXi/processwatcher’.)

 

 

 

 

 

 

 

On The Media Server

 

EW: What changes, if any, need to be made in the files listed below?

Files:

 

                JE:  [log_file]

                        LOG_LEVEL=ERROR

 

  

 

                                ; NONE - disables logging for the logger.

                                ; TRACE - enables tracing to error messages.

                                ; DEBUG - enables debug to error messages.

                                ; INFO - enables information to error messages.

                                ; WARN - enables warning and error messages.

                                ; ERROR - enables only error messages.
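
Changing LOG_LEVEL in the [log_file] section can be sketched as follows. This assumes an INI-style file with the section and key shown above; `set_plugin_log_level` is a hypothetical helper name, and the configuration file path is left as an argument because this page does not list it.

```shell
# Hypothetical helper: set LOG_LEVEL inside the [log_file] section only.
# Usage: set_plugin_log_level CONF_FILE LEVEL
set_plugin_log_level() {
    conf=$1
    level=$2
    # Accept only the levels listed in the comments above.
    case $level in
        NONE|TRACE|DEBUG|INFO|WARN|ERROR) ;;
        *) echo "unknown level: $level" >&2; return 2 ;;
    esac
    # Keep a copy of the file as it was before the change.
    cp -p "$conf" "$conf.bak" || return 1
    awk -v level="$level" '
        # A new section header starts: note whether it is [log_file].
        /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[log_file\]/) }
        # Inside [log_file], replace the value after "=" on the LOG_LEVEL line.
        in_section && /^[ \t]*LOG_LEVEL[ \t]*=/ { sub(/=.*/, "=" level) }
        { print }
    ' "$conf.bak" > "$conf"
}

# Example (PLUGIN_CONF is a placeholder for the plugin configuration file):
#   set_plugin_log_level "$PLUGIN_CONF" DEBUG
```

Comment lines beginning with `;` and LOG_LEVEL keys in other sections are left unchanged.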

 

 

 Note:  After you change a log level, the system must be rebooted.  (MZ: A reboot is not necessary; the changes take effect automatically when new OST jobs are started.)



 

 

    



This page was generated by the BrainKeeper Enterprise Wiki, © 2018