Typical Issues that Lead to Log Analysis within a Support Package

Overview

This section describes several example problem scenarios and shows how to use log analysis to help resolve each issue.

 


Example 1

Problem 

The customer reports a problem, or reports one or more alerts that appeared in the appliance GUI or arrived by e-mail.

 

In this case, after adding an ESX server to the appliance, the customer discovered that virtual machine discovery was failing.

Troubleshooting Steps

To troubleshoot this scenario, first get a general idea of when the failure occurred, then review the messages file(s) in the support bundle. In this case, look for entries from the pancontroller service, which is responsible for hypervisor and virtual machine discovery.
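
As a rough illustration of this first pass, the following Python sketch pulls out lines that mention a service and contain a failure keyword. The function name, the keyword list, and the assumption that each entry names the service on the same line are illustrative only; adjust them to the wording actually seen in the messages file.

```python
import re

# Assumed failure keywords; not an exhaustive or official list.
FAILURE_WORDS = re.compile(r"\b(fail\w*|error|died)\b", re.IGNORECASE)


def find_service_failures(lines, service="pancontroller"):
    """Return (line_number, text) pairs for lines that mention the service
    and contain one of the failure keywords."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if service in line and FAILURE_WORDS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the messages file from the support bundle and pass the file object to find_service_failures; the returned line numbers show where to start reading around the time of the failure.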

 

These messages are stored in the messages file. A look at that file shows the following:

 

 

[Image: entries from the messages file]

   

These entries show that discovery failed, and that the esx_vmregistered.pl program that runs the discovery actually died.

Resolution

Using the information from the log about esx_vmregistered.pl failing, the support agent talked with the customer and found that the customer was running ESX 3.5, an unsupported version of ESX. The customer was advised to upgrade the ESX server to version 4 or higher.

 


Example 2

Problem 

While running a virtual machine backup, the customer received the following e-mail alert, saying that snapshot creation failed for one of their virtual machines:

 

==============

From: panceteraunite02.stk.pacific.edu [mailto:pancetera@pacific.edu]
Sent: Friday, June 17, 2011 8:22 PM
To: autosupport-report@pancetera.com
Subject: Alerts: 1 new alert

 

ESX Server 'vcenter.stk.pacific.edu',  Virtual Machine 'pacadam1.stk.pacific.edu': Snapshot creation failed for virtual machine 'pacadam1.stk.pacific.edu' on server 'vcenter.stk.pacific.edu'.

============== 

Troubleshooting Steps

Using the approximate time that the alert message was sent, and the name of the virtual machine, the support agent searched the vm_proxy_fs log from the support bundle and found the following:

 

===========

2011-06-17 19:20:02.517737: Creating snapshot Pancetera: 4d4ef7a8570911dfaa4b005056bb073d on pacadam1.stk.pacific.edu vcenter.stk.pacific.edu

Fault:
SOAP Fault:
-----------
Fault string: Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.
Fault detail: FilesystemQuiesceFault

Error creating snapshot for virtual machine pacadam1.stk.pacific.edu
2011-06-17 19:20:07.276241: ERROR [thread:2987391888, where:/usr/local/pancetera/Perl/Pancetera/VISnapshot.pm:40, error:98] Cannot create snapshot Pancetera: 4d4ef7a8570911dfaa4b005056bb073d on pacadam1.stk.pacific.edu: Address already in use

===========
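
The search described above (by approximate time and virtual machine name) can be sketched in Python as follows. This is a minimal sketch, not an appliance tool: the function name and the default window are assumptions. It relies on the timestamp format visible in the excerpt and treats lines without a timestamp (such as the SOAP fault lines) as part of the preceding entry. In this example the alert's sent time and the log timestamps are about an hour apart, so the window is deliberately generous.

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp seen in the excerpt, e.g.
# "2011-06-17 19:20:02.517737: ..."
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}): ")


def entries_near(lines, when, vm_name, window_minutes=120):
    """Group lines into timestamped entries and return those within the
    window around `when` whose text mentions `vm_name`."""
    window = timedelta(minutes=window_minutes)
    entries = []  # list of (timestamp, [lines of the entry])
    for raw in lines:
        line = raw.rstrip("\n")
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append((stamp, [line]))
        elif entries and line.strip():
            # Continuation line: attach it to the most recent entry.
            entries[-1][1].append(line)
    results = []
    for stamp, body in entries:
        text = "\n".join(body)
        if abs(stamp - when) <= window and vm_name in text:
            results.append(text)
    return results
```

For the alert above, calling entries_near with datetime(2011, 6, 17, 20, 22) and 'pacadam1.stk.pacific.edu' would return the snapshot entry together with its fault details.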

Using this information to search the VMware site brought up the following VMware KB article:

http://kb.vmware.com/kb/1018194

 

As shown in the article, snapshot creation failed because the virtual machine had VSS enabled and the rate of I/O changes on the VM was too high. The quiescing operation could not flush all data changes to disk in time while further I/O was being processed, which caused a timeout.

Resolution

As suggested in the VMware KB article, the customer disabled VSS, which solved the problem.

 


Example 3

Problem 

A virtual machine backup failed, and an e-mail alert was sent to the customer:

 

======

ESX Server 'esx_server', Virtual Machine 'vm_name': Cannot create snapshot 'Pancetera: bf2782ea339211dfaaa7005056bb6cd5' for VM 'vm_name': SystemError:A general system error occurred: Protocol error from VMX.: Input/output error

======  

Resolution

This error also appeared in the vm_proxy_fs log. It can occur if VMware VSS is enabled for the VM while a third-party VSS tool is also running inside the VM. To confirm this, verify that VSS is enabled in the appliance GUI and that a third-party VSS tool is enabled inside the running VM. To resolve the issue, tell the customer to disable VSS either from the appliance GUI or inside the VM.

 

See the following VMware KB articles for more information:

http://kb.vmware.com/kb/1007346

http://kb.vmware.com/kb/1009558
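
Examples 2 and 3 both come down to matching a fault signature against a known article. One possible way to keep track of such matches is a small lookup table, sketched below in Python. The table holds only the signatures and KB links cited in this section; the names KNOWN_SIGNATURES and suggest_articles are hypothetical, and this is not a feature of the appliance.

```python
# Signatures and articles taken from Examples 2 and 3 in this section.
KNOWN_SIGNATURES = {
    "FilesystemQuiesceFault": [
        "http://kb.vmware.com/kb/1018194",
    ],
    "Protocol error from VMX": [
        "http://kb.vmware.com/kb/1007346",
        "http://kb.vmware.com/kb/1009558",
    ],
}


def suggest_articles(text):
    """Return KB article URLs whose signature appears in the alert or log
    text, in table order and without duplicates."""
    urls = []
    for signature, articles in KNOWN_SIGNATURES.items():
        if signature in text:
            for url in articles:
                if url not in urls:
                    urls.append(url)
    return urls
```

Passing the Example 3 alert text to suggest_articles returns the two articles listed above; text with no known signature returns an empty list, which means a manual search is still needed.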

 


Example 4

Problem

While file-level recovery was being performed, and the appliance’s /recover/files directory was being browsed with Windows Explorer, the following error message was displayed:

 

  

[Image: Windows Explorer error message]

 

In this case, the address shown in the first line of the error message was a specific instance of the general path \\appliance\recover\files\...\...\vmname\.

Resolution 

None yet; troubleshooting is still in progress. A review of the recovery_fs_files log shows the following errors:

 

##

2011-09-26 14:53:20.174568: ERROR: recovery_vol_access: Could not open /recover/files/pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local-flat.vmdk or /recover/files/pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local-rdm.vmdk

 

2011-09-26 14:53:20.174584: ERROR [thread:3045288848, where:recovery_api.c:1487, error:2] Could not mount file system view for: /pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local.volume

recovery_fs_files[982]: ERROR [thread:3045288848, where:recovery_api.c:1487, error:2] Could not mount file system view for: /pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local.volume

 

2011-09-26 14:53:20.174711: mounting /pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local.volume

2011-09-26 14:53:20.175265: ERROR [thread:3066547088, where:recovery_api.c:485, error:16] Expect 69816176640 bytes in /scratch/pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local-flat.vmdk but found 14056538112 bytes. Recovery is probably in progress, please try again later.

recovery_fs_files[982]: ERROR [thread:3066547088, where:recovery_api.c:485, error:16] Expect 69816176640 bytes in /scratch/pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local-flat.vmdk but found 14056538112 bytes. Recovery is probably in progress, please try again later.

 

2011-09-26 14:53:20.175661: ERROR [thread:3066547088, where:recovery_api.c:1001, error:2] /pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local-flat.vmdk: No such file or directory

recovery_fs_files[982]: ERROR [thread:3066547088, where:recovery_api.c:1001, error:2] /pancetera-sync/2011-05/2011-05-10-1323/1.1.5.163/hebmail.hebelercorp.local/hebmail.hebelercorp.local-flat.vmdk: No such file or directory

##
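
The structured ERROR entries above share a common layout (thread, source location, numeric error code, message), so they can be split into fields for easier comparison. The Python sketch below is based only on the entries shown in this section; the function names are hypothetical, and other entry layouts (such as the "ERROR: recovery_vol_access:" line) are intentionally not matched. The second helper extracts the expected and found byte counts from the size mismatch message, which may help judge how far a recovery in progress has come.

```python
import re

# Layout observed in the excerpts:
# ERROR [thread:<id>, where:<file>:<line>, error:<code>] <message>
ERROR_LINE = re.compile(
    r"ERROR \[thread:(?P<thread>\d+), where:(?P<file>.+?):(?P<line>\d+), "
    r"error:(?P<code>\d+)\] (?P<message>.*)"
)
SIZE_MISMATCH = re.compile(r"Expect (\d+) bytes in (\S+) but found (\d+) bytes")


def parse_error(line):
    """Return a dict of fields for a structured ERROR entry, or None."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("thread", "line", "code"):
        fields[key] = int(fields[key])
    return fields


def size_progress(message):
    """Return (path, found, expected, fraction) for a size mismatch
    message, or None if the message has a different form."""
    match = SIZE_MISMATCH.search(message)
    if match is None:
        return None
    expected = int(match.group(1))
    path = match.group(2)
    found = int(match.group(3))
    return path, found, expected, found / expected
```

Applied to the size mismatch entry above, size_progress reports that roughly 20% of the expected bytes were present in the flat .vmdk file when the error was logged.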

 

 

 

Notes

Looking good.

Note by Keith Hatton on 10/19/2011 09:22 AM


This page was generated by the BrainKeeper Enterprise Wiki, © 2018