VMPRO: Virtual Machine backup stuck in "Queued" state |
SR Information: 1611512
Product / Software Version: VmPro 3.1.1 - vCenter 5.1
Problem Description: Virtual machine backup will stick in "queued" state
Related PTRs:
Bug 5678 - Customer URS, SR 1589842: Finished vmPRO jobs displaying as running
Overview
The scenario of the issue:
Notes:
Symptom (how to identify the problem)
--
-- Data for Name: alert; Type: TABLE DATA; Schema: public; Owner: root
--
COPY alert (entry_id, gen_id, alert_id, object_id, alert_time, hv_host, vhost, severity, message, cleared_by, cleared_time, node_uuid, managed_alert, posted) FROM stdin;
2597 351141 \N 7d38f1c5-3bd4-4b17-b399-bbab2c4833e5 registry_update {"values": {"feature.smartview.enabled": true, "feature.smartmotion.enabled": true}, "name": "feature"} e5298360e0bb11e29c3e0da2d091e093 0 1378095016.862613 -1 -1
2597 103 -1 timed out stuck in queue \N \N \N \N
2598 364769 \N 93e09b44-f398-4095-8c2e-8cd215e4b67c registry_update {"values": {"feature.smartview.enabled": true, "feature.smartmotion.enabled": true}, "name": "feature"} f2cedc4ae0bb11e283ce21512d2369ee 0 1378883431.632072 -1 -1
2598 103 -1 timed out stuck in queue \N \N \N \N
2599 353517 \N 45158f2d-0aa2-4d22-8a2a-e023d8ef3a52 registry_update {"values": {"feature.smartview.enabled": true, "feature.smartmotion.enabled": true}, "name": "feature"} e5298360e0bb11e29c3e0da2d091e093 0 1378238977.124738 -1 -1
2621 350138 \N 47895eba-48a1-46fd-94f0-0e4534e20086 tomato_sync_folder {"parent_command": "f2ebc5fa-665c-4961-9e43-9d0cecaaf890", "queued_time": 1378053902.792007, "smtask": {"enabled_w": false, "copy_time_m": "use_current", "enabled_y": false, "copy_tim 2621 e_y": "use_current", "enabled_d": true, "enabled_m": false}, "folder_uuid": "d264b444-5b7e-4346-9d41-e4ecfc78f159", "policy": {"schedule_week": "1,0,0,0,0,0,0", "retain_years": 0, "uuid": "2b8837c4-1b86-46e1-a154-0d24e24794ab", "copy_mode": "backup", "schedule": 2621 "1145", "config_backup_enabled": 1, "enabled_month": 0, "retain_weeks": 4, "enabled": 1, "enabled_week": 1, "is_default": 0, "enabled_year": 0, "copy_time_month": "use_current", "schedule_year": "", "retain_months": 0, "email_enabled": 0, "target_id": "dfa74c5a-0 2621 cdb-4081-8d74-b665bd52f8e6", "retain_days": 10, "copy_time_year": "use_current", "schedule_month": "", "name": "IT-Development"}, "policy_error": null, "group_uuid": "73f2182e-38f3-4b5c-b996-a41464edaf4c"} f2cedc4ae0bb11e283ce21512d2369ee 0 1378 2621 053904.0380311 -1 -1 103 -1 timed out stuck in queue \N \N \N 73f2182e-38f3-4b5c-b996-a41464edaf4c
2911 350859 \N 52901510-edda-423d-b630-cb8386f1df65 node_update_vm {"uuid": "50239a64-64e4-458c-7006-6faccc43606c"} f2cedc4ae0bb11e283ce21512d2369ee 0 1378084730.7184 -1 -1 103 -1 timed out stuck in queue \N
Notes:
So on the example above we see that all the backups in the policy "IT-Development" started and finished on Sep 17.
Looking at the vmware.log for the Virtual machine ICWSPT1, it was found that the last activity was reported on Aug 30. We see that a snapshot consolidation occurred at that time, but the latest events (message to toolbox-dnd time out) indicate a possible issue with the vmtools:
# tail vmware.log
2013-08-30T16:56:01.138Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: Destroying thread 10
2013-08-30T16:56:01.138Z| vcpu-0| I120: Turning off snapshot info cache.
2013-08-30T16:56:01.139Z| vcpu-0| I120: Turning off snapshot disk cache.
2013-08-30T16:56:01.139Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: Done with consolidate
2013-08-30T16:56:01.152Z| vcpu-2| I120: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/51b4dc81-6739d73e-dda5-002219238016/ICWSPT1/ICWSPT1.vmdk'
2013-08-30T16:56:01.152Z| vcpu-2| I120: DDB: "longContentID" = "fc61879d10187fbfba8786f1dfed51bb" (was "4ffb9f780ef4dd6e08bbc103719948ce")
2013-08-30T16:56:01.176Z| vcpu-2| I120: DISKLIB-CHAIN : DiskChainUpdateContentID: old=0x719948ce, new=0xdfed51bb (fc61879d10187fbfba8786f1dfed51bb)
2013-08-30T16:56:53.180Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2013-08-30T16:56:56.710Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2013-08-30T16:57:01.146Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
If vmware tools under the Virtual Machine you want to backup isn't running, ESX/vCenter won't be able to take snapshots.
5.1. The backup environment is composed of a group of 5 vmPRO servers:
Group members
=============
Hostname Role Last Contact Time
-------------------- ---------- ----------------------
icuvmprop1 master 05 Sep 19:22:14
icuvmprop3 node 03 Sep 09:35:56
icuvmprop4 node 05 Sep 19:22:03
icuvmprop2 node 05 Sep 19:22:11
icuvmprop5 node 30 Aug 19:33:28
-------------------- ---------- ----------------------
5.2. The Virtual Machine ICWSPT1 is registered under the node icuvmprop5.
Virtual Machines
================
Name Server Operating System Discover Time Exported CBT Enabled Quiesce Enabled Node
------------------------------------ ---------------- ------------------------------ --------------- --------------------- ------------ ---------------- ----------------
tcprod-app1 vCenter Microsoft Windows Server 2008 05 Sep 19:06:52 yes yes no icuvmprop4
tcprod-etl1 vCenter Microsoft Windows Server 2008 05 Sep 19:06:52 yes yes no icuvmprop4
Pegasus vCenter Microsoft Windows Server 2008 05 Sep 19:21:14 yes yes no icuvmprop2
Metis vCenter Microsoft Windows Server 2008 05 Sep 19:06:51 yes yes no icuvmprop1
ICATRUCARED1 vCenter Microsoft Windows Server 2008 05 Sep 19:06:52 yes yes no icuvmprop4
ICWSPT1 vCenter Microsoft Windows Server 2008 05 Sep 19:06:52 yes yes no icuvmprop5
ICUPATCHP1 vCenter Microsoft Windows Server 2008 05 Sep 19:17:42 yes yes no icuvmprop2
RDS1 vCenter Microsoft Windows Server 2008 05 Sep 19:12:49 yes yes no icuvmprop4
5.3. You'll also find in the master log that for a long time the master has been unable to communicate to the slave node icuvmprop5:
controller.1:2013-09-13 10:11:17: Clearing alert: No contact from a node icuvmprop5 in the last over 1 week.
controller.1:2013-09-13 10:12:13: controller/WARNING:: AlertLib.py:112 Posting alert: No contact from a node icuvmprop5 in the last over 1 week.
controller.1:2013-09-13 10:37:17: No contact from a node icuvmprop5 in the last over 1 week.
controller.2:2013-09-06 19:37:11: controller/WARNING:: AlertLib.py:112 Posting alert: No contact from a node icuvmprop5 in the last 1 week.
controller.2:2013-09-06 20:17:39: No contact from a node icuvmprop5 in the last 1 week.
controller.2:2013-09-07 19:36:06: controller/WARNING:: AlertLib.py:112 Posting alert: No contact from a node icuvmprop5 in the last over 1 week.
5.4. And looking at the vmPro master server web GUI under VMs > Resources you'll see that the slave icuvmprop5 is registered but invalid:
In this case, customer checked his Virtual Machine inventory and confirmed that the node icuvmprop5 no longer exists.
Resolution (fix or workaround)
Delete the node under VMs > Resources Click Configure and in the Configure Node List window, you can delete the node.
This page was generated by the BrainKeeper Enterprise Wiki, © 2018 |