vmPRO SmartMotion Backup Hangs with NDX as Target Storage
When a vmPRO SmartMotion backup is performed to a Quantum NDX appliance, and the virtual disk has a large gap of zero data, the write command hangs long enough to cause a CIFS timeout. This causes the backup to hang indefinitely.
SR Information: N/A (this was found in test)
Product / Software Version: All vmPRO software versions with a Quantum NDX appliance configured as target storage.
Problem Description: When the vmPRO virtual disk has a large gap of zero data, a virtual machine (vm) backup to an NDX appliance can hang indefinitely.
The information in the messages log below shows that an NDX system (10.30.242.27) is not responsive.
messages log:
=============
Mar 7 15:57:51 localhost kernel: CIFS VFS: Server 10.30.242.27 has not responded in 300 seconds. Reconnecting...
Mar 7 16:02:52 localhost kernel: CIFS VFS: Server 10.30.242.27 has not responded in 300 seconds. Reconnecting...
Mar 7 16:07:54 localhost kernel: CIFS VFS: Server 10.30.242.27 has not responded in 300 seconds. Reconnecting...
Mar 7 16:12:55 localhost kernel: CIFS VFS: Server 10.30.242.27 has not responded in 300 seconds. Reconnecting...
Mar 7 16:17:56 localhost kernel: CIFS VFS: Server 10.30.242.27 has not responded in 300 seconds. Reconnecting...
Mar 7 16:22:57 localhost kernel: CIFS VFS: Server 10.30.242.27 has not responded in 300 seconds. Reconnecting...
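To check a messages log for this symptom, the repeated timeout entries can be counted per server. The following is a minimal sketch, not a vmPRO tool: the function name `count_cifs_timeouts` is hypothetical, and the pattern is based only on the log lines shown above.

```python
import re
from collections import Counter

# Pattern taken from the kernel message format shown in the log excerpt above.
CIFS_TIMEOUT = re.compile(
    r"CIFS VFS: Server (?P<server>\S+) has not responded in (?P<seconds>\d+) seconds"
)


def count_cifs_timeouts(lines):
    """Return a Counter mapping server address to number of timeout entries."""
    counts = Counter()
    for line in lines:
        match = CIFS_TIMEOUT.search(line)
        if match:
            counts[match.group("server")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mar 7 15:57:51 localhost kernel: CIFS VFS: Server 10.30.242.27 "
        "has not responded in 300 seconds. Reconnecting...",
        "Mar 7 16:02:52 localhost kernel: CIFS VFS: Server 10.30.242.27 "
        "has not responded in 300 seconds. Reconnecting...",
    ]
    for server, hits in count_cifs_timeouts(sample).items():
        print(f"{server}: {hits} CIFS timeout(s)")
```

A server that appears repeatedly at roughly five-minute intervals, as in the excerpt, matches the pattern described in this article.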
When smartcp encounters a large run of zeros, it does not write them. Instead, it advances the output offset and occasionally writes a few zeros to the output file to prevent a CIFS timeout.
When smartcp eventually finds data that needs to be written, it writes to the output file at the current offset. This write usually completes quickly. On an NDX appliance, however, a write after a large seek causes a long delay: the NDX appears to write all of the zeros up to the seek location instead of creating a sparse file and skipping to that location. The occasional zero writes that smartcp performs do not appear to reduce the delay on the first non-zero write.
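The difference between skipping zeros and writing them can be illustrated with a small sketch. This is not the smartcp implementation: the function `sparse_copy`, its `sparse_enabled` parameter, and the 4096-byte block size are assumptions made for illustration, and the sketch omits the occasional keep-alive zero writes described above.

```python
import io

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def sparse_copy(src, dst, sparse_enabled=True, block_size=BLOCK_SIZE):
    """Copy src to dst and return the number of bytes actually written.

    With sparse_enabled, all-zero blocks are skipped by seeking forward.
    Without it, every block (including zeros) is written out.
    """
    written = 0
    pending_hole = 0  # zero bytes skipped since the last write
    zero_block = bytes(block_size)
    while True:
        block = src.read(block_size)
        if not block:
            break
        if sparse_enabled and block == zero_block[:len(block)]:
            pending_hole += len(block)
            continue
        if pending_hole:
            # This seek-then-write is the step that is slow on a target
            # that fills the skipped range with zeros.
            dst.seek(pending_hole, io.SEEK_CUR)
            pending_hole = 0
        dst.write(block)
        written += len(block)
    if pending_hole:
        # Input ended in a hole: write one zero byte to set the final length.
        dst.seek(pending_hole - 1, io.SEEK_CUR)
        dst.write(b"\0")
        written += 1
    return written


if __name__ == "__main__":
    data = b"A" * BLOCK_SIZE + bytes(10 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
    for enabled in (True, False):
        out = io.BytesIO()
        n = sparse_copy(io.BytesIO(data), out, sparse_enabled=enabled)
        print(f"sparse_enabled={enabled}: wrote {n} of {len(data)} bytes")
```

In both modes the output content is identical. Only the number of bytes sent to the target differs, which is why disabling sparse writes trades extra I/O for steady progress on a target that does not handle large seeks well.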
This issue was discovered on a vm that had an extremely large (45 GB) hole in the middle of its virtual disk. When smartcp reached the non-zero data after the hole, the write command hung long enough to cause a CIFS timeout.
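A rough calculation shows why a hole of this size can exceed the timeout. The sketch below assumes, as the observed behavior suggests but does not confirm, that the target must zero-fill the entire hole before the pending write returns, and that the write must finish within the 300 seconds shown in the messages log. The function name is hypothetical and 1 GB is treated as 1024 MB.

```python
def required_throughput_mb_s(hole_gb, timeout_s):
    """Sustained MB/s needed to zero-fill a hole before the timeout expires."""
    return hole_gb * 1024 / timeout_s


if __name__ == "__main__":
    # 45 GB hole and the 300-second interval from the log excerpt.
    rate = required_throughput_mb_s(45, 300)
    print(f"Required sustained write rate: {rate:.1f} MB/s")
```

Under these assumptions the target would need to sustain roughly 150 MB/s of zero writes for the first non-zero write to return in time.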
Refer to PTR 5357 in Bugzilla for additional details.
Setting the “nas_storage.sparse_enabled” registry key to 0 (zero) fixes this issue and allows the vm to be backed up to an NDX appliance. With this setting, smartcp writes all of the zeros it encounters instead of skipping them.
To fix the issue:
Example output: nas_storage.sparse_enabled = 1
Example output: Registry key 'nas_storage.sparse_enabled' set to '0'
Example output: nas_storage.sparse_enabled = 0