Release Notes

Release: 3.6.8 Upgrade

Supported Product: Lattus

Date: April 2017

About Lattus

Quantum Lattus™ Object Storage is disk-based storage that meets the extreme scalability, durability, and access requirements of large-scale Big Data archives. Lattus object storage offers several different access methods to put data in and get data out of the Lattus storage, including StorNext access, and HTTP/REST access for customers who have ported their applications to use the Lattus REST APIs.

The Lattus Object Storage product is a fully modular system that can be packaged in many ways. The core of the product is the Lattus object storage subsystem, which consists of three Controller Nodes, six or twenty Storage Nodes, and Rack, System and Interconnect switches as needed.

This object storage subsystem can be used alone or combined with different management components.

The Lattus-M product uses a StorNext Appliance or customer-supplied MDC for management.

The Lattus-D product is for customers interested in static-file archive for data protection purposes. Lattus-D uses no management component and communicates with external applications via native protocols. The Lattus-D product is initially qualified for use with the CommVault Simpana 10 Cloud Connector technology when using the S3 protocol to communicate.

Purpose of This Upgrade

This document announces the release of the Lattus 3.6.8 upgrade. This upgrade includes new features described below.

What's New in This Release

This release adds support for Lattus C10 v2 Controller Nodes and Lattus S50 Storage Nodes.

  • The C10 v2 Controller Node is based around a Dell R630 server.
  • The S50 Storage Node includes twelve 10 TB drives, providing total storage capacity of 120 TB.

    Note: In the Lattus CMC, S50 storage nodes are named "S50" and not "NSS_1121_120." (Lattus S20 Model 2 nodes are named "NSS_1121_48," and S30 nodes are named "NSS_1121_72.")

Compatibility

This Quantum Lattus release is compatible with the following devices:

  • Lattus C5, C10 or C10 v2 Controller Nodes
  • Lattus S10, S20, S20 Model 2, S30 and S50 Storage Nodes
  • StorNext Metadata Appliances
    • Quantum StorNext M441D Metadata Appliance with 10 GbE NIC
    • Quantum StorNext M445D SSD Metadata Appliance with 10 GbE NIC
    • Quantum StorNext M662 Metadata Appliance
    • Quantum StorNext M662XL Metadata Appliance
    • Quantum StorNext M665 SSD Metadata Appliance
    • Quantum Xcellis Workflow Storage Products
    • Artico Archive Gateway Intelligent NAS Storage Appliance

This release has been certified on the following switches:

  • Lattus S60 and S55 Rack Switches
  • Lattus System Switch
  • Lattus Interconnect Switch

Lattus-D Compatibility

The Lattus-D product is currently qualified for use with the CommVault Simpana 10 Cloud Connector technology when using the S3 protocol to communicate.

Lattus/StorNext Compatibility

The following table shows compatibility between Lattus and StorNext releases.

Lattus Release           Compatible StorNext Release
Lattus 3.3.1             StorNext 4.7.x and StorNext 5
Lattus 3.4.x             StorNext 4.7.x and StorNext 5
Lattus 3.5.x and 3.6.0   StorNext 5.2 and 5.3.x
Lattus 3.6.6             StorNext 5.2 and 5.3.x
Lattus 3.6.8             StorNext 5.2, 5.3.x, and 5.4.x

Virtual Environment Compatibility

Quantum recommends using the Lattus software in a virtualized environment only for training and demonstration purposes.

Running virtualized for performance testing is not supported because the use of SSD is required for the proper functioning of the software.

Supported Lattus Upgrades

Direct upgrades to Lattus 3.6.8 are permitted only from Lattus 3.6.0 or 3.6.6.

The following table shows other supported upgrade paths that require incremental upgrades between the initial (starting) release and Lattus 3.6.8:

Initial Lattus Version   Upgrade Path to Lattus 3.6.8
3.6.1, 3.6.3 or 3.6.4    3.6.1, 3.6.3 or 3.6.4 to 3.6.6, and then to 3.6.8
3.4.3 or 3.5.x           3.4.3 or 3.5.x to 3.6.0, and then to 3.6.8

Lattus Browser Support

Lattus software supports the following set of browsers:

  • Firefox versions 4 and later (Quantum recommends using the latest released version)
  • Internet Explorer version 8 through 10 (Quantum recommends IE10)
  • Chrome version 18 and later (Quantum recommends using the latest released version)
  • Safari version 5.1

TLS Support

Certain browsers, as noted below, do not support TLS 1.1 or higher and therefore do not support all features or HTTPS capabilities available in this Lattus software release:

  • Chrome versions 18-21 do NOT support TLS 1.1 and higher
  • Chrome versions 22-29 support and run TLS 1.1 and higher
  • Chrome versions 30 and higher support and run TLS 1.1 and higher
  • Internet Explorer version 8 and higher supports TLS 1.1 and higher (functionality is disabled by default)
  • Firefox versions 19 and higher support TLS 1.1 or higher
  • Safari does NOT support TLS 1.1 or higher

Note: Flash Player 11.2 or later is required to use the Lattus graphical user interface.

Lattus Multi-Geo Parameters

If you have a Lattus Multi-Geo configuration, before using Lattus you must add certain parameters to the [config] section of all /opt/qbase3/cfg/dss/clientdaemons/*.cfg files in order to avoid encountering various issues. After making these changes, you must restart client daemons for these configuration changes to take effect.

For details and instructions about adding these parameters, refer to the Lattus Installation Guide.
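As a pre-flight check before a Multi-Geo deployment, a small script can confirm that every client daemon configuration file at least contains the [config] section the parameters must be added to. This is an illustrative sketch, not part of the product; the function name is hypothetical, and the actual parameters to add are documented only in the Lattus Installation Guide.

```python
# Hypothetical pre-flight check: list client daemon cfg files that lack a
# [config] section. The parameters themselves must be taken from the
# Lattus Installation Guide; this only verifies the section exists.
import configparser
import glob

def files_missing_config_section(pattern="/opt/qbase3/cfg/dss/clientdaemons/*.cfg"):
    """Return the cfg files matching `pattern` that have no [config] section."""
    missing = []
    for path in glob.glob(pattern):
        parser = configparser.ConfigParser()
        parser.read(path)
        if not parser.has_section("config"):
            missing.append(path)
    return missing
```

Remember that, as noted above, the client daemons must be restarted after the parameters are added for the changes to take effect.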

Resolved Issues

This section describes the issues that were fixed in this Lattus release.

Various Issues

Issue Description
AMIF-6124: Lattus Phone Home reports with the 3.6.5 patch didn’t include the version number. The version number is now included in Lattus Phone Home reports.
AMIF-6206: Failure of applicationserver to start after a reboot. The applicationserver failed to start after rebooting the system.

Known Issues

This section describes known issues pertaining to the Lattus system. Where applicable, workarounds are provided.

Installation Issues

Issue Description Workaround

AMIF-77: After the initial configuration step on the Management Node, an unclear error is produced because the admin password has changed.

The following error appears:

"faultcode:AuthenticationError, faultstring:'error',

faultdetail:"Authentication Failed"

N/A

AMIF-992: There is no check for the queue initialization file’s extension.

Uploading a file with an extension other than .csv leads to a failed queue file upload.

N/A

AMIF-1810: A deprecation warning message appears when initializing a Q-Shell environment

This message appears because an older version of the pycrypto package (2.0.1) is used, and newer versions are available.

The behavior of these newer versions is currently being tested.

N/A
AMIF-1952: Warnings generated during installation on a Dell R620.

When booting the machine from USB (and later on the Dell R620), you might receive several debug/warning/error output messages during drive detection, such as the following: HDIO_GET_IDENTITY failed for ‘dev/sdc’

We use hdparm to determine whether a drive is HDD or SSD, and hdparm cannot read the information out of the SAS drives. Data must be entered manually.

N/A

AMIF-1858: When adding a node to an existing environment, the default login and password are assigned

When you add a node to an environment that is already set up, the node is initialized with the default node login/password combination.

After adding a node, execute the procedure to change the password for the root account.

AMIF-2020: Installing the controller node fails with error: Invalid partition table:recursive partition on /dev/sdd

When the USB drive is left inserted into the machine after OS installation, this error can occur during the initialization.

This error occurs with specific types of USB drives.

If you encounter this issue, clean up the machine and restart the installation.

Remove the USB drive after rebooting the machine, prior to initializing.

AMIF-2593: Installation fails for node that was removed from the queue

This issue occurs after doing the following:

  • Build up a node queue with more than two servers and a few storage nodes.
  • Select all nodes and initialize the queue.
  • Remove the second node from the queue, while the first node is being initialized.

In this case, the running “Initializing X nodes” job still tries to process the removed node, resulting in an error.

Remove the node that failed to install to avoid the failure.

AMIF-3082: Lots of DEBUG.osis.utils messages on the console during the management node Installation

After you successfully install the management node and reboot for the first time, you will receive the following debug messages:

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/ipaddress.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/customer.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/clouduserrole.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/application.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/job.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/networkzonerule.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/hardwarelayout.py

DEBUG:osis.utils:Loading /opt/qbase3/libexec/osis/policy.py

These debug osis messages are benign and can safely be ignored.

N/A

AMIF-3084: rpcbind error messages appear after rebooting an installed management controller

After installing and rebooting the management controller, the boot log might contain error messages such as the following:

rpcbind: Cannot open ‘<file_name>’ file for reading, errno 2 (No such file or directory)

These errors are benign and have no impact on the further installation of your environment.

N/A

AMIF-3732: 1MB SSD storage is missing on another non-CMC controller while creating object MetaStore

The size of the MetaStore cannot be greater than the remaining space on the SSDs. If it is greater, that SSD will not be available to add to the MetaStore.

N/A

AMIF-4246: Cannot clean up node with status “Installing”

When a node has the status “Installing”, but the installation of the node has failed, it is not possible to clean up the node. For example, this situation can occur in an installation with incorrect routes.

Set the status of the node to “Failed,” and then remove the node from the CMC.

AMIF-4302: Installer shows wrong hardware type when using generic hardware

When installing a management node or a remote node using USB on generic hardware, the installer shows “CTRLGeneric” (generic controller node) as the hardware type. The installer does not detect whether you are installing a storage node or controller node. During the initialization, the hardware type is updated with the correct type.

This information is visible in the installation logs in the CMC.

N/A
AMIF-4509: Unable to create cloudApi connection after management node initialization

It is possible that, during an unattended installation, creating a cloud API connection could fail with the following error:

Exception: <class 'cloud_api_client.Exceptions.CloudApiException'> <ProtocolError for admin:admin@127.0.0.1:80/appserver/xmlrpc/: 502 2014-03-10 18:07:04,786

If this occurs, check whether the application server is still running (verify both the process and the application server log), and restart it if necessary.

AMIF-4591: Possible to use OS disks to extend MetaStores and Cache Clusters

When the OS is installed on solid state disks, these disks are also included in the list of disks when you extend a MetaStore or Cache Cluster. These disks should not be visible in these wizards.

N/A

AMIF-4660: Cleanup on a partially initialized node can break initialization of the next node

When you clean up a node whose initialization has failed between configuring the network and configuring DSS, the node keeps its static IP address. This might cause the next node you initialize to fail, because both nodes might have the same static IP address.

N/A
AMIF-4799: Incorrect BIOS settings on AS20/AS30 nodes prevent PXE installation

Incorrect BIOS settings cause PXE boot issues, and the node cannot be installed via PXE. The correct boot order is:

  1. Boot from LAN 1
  2. Boot from LAN 2 (disabled)
  3. Boot from disk
N/A
AMIF-5063: Support for native 4K disks Lattus 3.6.6 does not support native 4K disks. N/A
AMIF-5070: Installation of a new node fails if system clock is too skewed When you add a new node to a Lattus system and the system clock is too different, the initialization of the new node fails. N/A
AMIF-5096: Parallel initialization executed regardless of the warning message When you try to initialize a node while another node is being initialized, a warning message appears saying the node will not be initialized. However, despite this warning message, node initialization is started, which is not the desired behavior. N/A
AMIF-5392: Configuration of policies not correctly applied in backend When you create a default S3 policy with small files support and then create a new default S3 policy without small files support, the storage backend (DSS) still uses small files support. N/A
HARDWARE-27: Blocked installation due to usage of external hardware

In the following situations, node installation could fail:

  • When adding a storage node when an Altusen KVM is connected
  • When using an external monitor that is incompatible with the node's graphics card, connecting it may block a PXE boot.

Possible errors: [drm:drm_edid_block_valid] *ERROR* Raw EDID or ERROR EDID checksum is invalid

To work around this issue, disconnect the external monitor.

PUT, GET, and REPAIR Operation Issues

Issue Description Workaround
AMIF-4392: Saving metadatastore object sometimes fails with “list index out of range” error

When a metadatastore object is saved, this operation could fail with the error “List index out of range”. The application server log is then flooded with the following line: DEBUG:osis.client.xmlrpc:PUT disk

This issue has been observed during a failover and when extending a MetaStore. The issue occurs when finding disk objects returns an empty list.

N/A
AMIF-5598: Possible HTTP 503 error when there are a lot of simultaneous S3 GET operations When your setup uses SSL for S3 and there are many simultaneous GET operations, you could receive an HTTP 503 error (service unavailable). This is caused by not using the connection reuse functionality. N/A

DSS-115: Namespace deletion removes data from the blockstores in a serial fashion

Deleting the data of a namespace removes the data from the blockstores in a serial fashion (that is, one blockstore at a time).

When a system has run full, the deletion must reach the n-th blockstore before the system can ingest data again.

If these deletes happened in parallel, the space would be freed immediately on all blockstores, making the system usable immediately.

DSS-549: Unclear error messages when creating an already existing dir or a dir with a missing parent dir

Creating a directory that already exists results in an HTTP 405 “Method not allowed” message.

Creating a directory while its parent directory doesn’t exist produces an HTTP 409 “Conflict” message.

N/A

DSS-601: Pulling out a switch doesn’t immediately result in operations taking the bandwidth of the remaining switch

When one of the back-end switches fails, all existing put operations fail (until they time out). New puts that are started after the failure will succeed, but could also time out.

After all blockstores have been contacted, there will be no further connection attempts towards the old switch, and therefore no more reasons for failure.

Depending upon the size of the objects and superblocks, it can take up to 15 minutes before the full bandwidth capacity of the remaining NIC is used. After restarting the client daemons, it takes up to three minutes to achieve a steady state situation.

Adjust the timeout values on the client daemon. Restart the client daemons.

DSS-611: Upload button on Client daemon web-interface doesn’t work when authentication is enabled

This issue is caused because the upload button does a redirect to a location on which the logged-in user doesn’t have sufficient rights. There is no API available that provides these rights to the user.

N/A

DSS-747: When two back-end switches fail shortly one after the other, put and get operations will hang

As stated in another known issue, it can take a while before enough connections are migrated from one network to another. If your failed switch is restored, it will take a while before all connections are migrated back and well balanced over both switches.

In cases where the second failure happens too quickly after the first, not enough connections are migrated back, and ongoing put/get operations will hang because there aren’t enough available connections.

Adjust the timeout values on the client daemon. Restart the client daemons.

DSS-869: Inconsistent output of REST directory entries listing

When requesting directory entries in REST using the various application types, the output is not consistent.

The output for listing directories in XML is correct, but not when using JSON.

N/A

DSS-1037: Unable to upload file when IF_NONE_MATCH HTTP Header is defined and user/password are also defined

Due to a bug in libcurl, when trying to PUT a file while defining a user, a password, and an IF_NONE_MATCH HTTP header, you receive a failure instead of success.

curl initially writes (PUT) a file of 0 bytes, and then makes a second PUT request on the same connection to write the actual contents of the file. Since the file already exists because of the first PUT, the second PUT fails.

N/A

DSS-1108: Client daemon returns an empty HTTP 200 response instead of an error if not enough blockstores are online

If two or more storage daemons are stopped in your environment and you try to read a file that was stored on those storage daemons, you should receive an error.

Instead, the AXR and S3 interfaces return an HTTP 200 response with a correct header, but with either empty content or an incompleteread() message.

N/A

DSS-1146: Rebalancing for small file support does not work

With the introduction of small file support, rebalancing has been disabled because the spreading of the full copy was not done in a homogeneous way, and a rebalance could put too much strain on a single disk.

A workaround is described in KB article BSP038.

DSS-1436: Object names with a leading forward slash cannot be handled by the Lattus S3 interface

The Lattus S3 interface cannot handle objects whose names begin with a forward slash (“/”). Also, the authentication fails with a leading forward slash.

N/A

DSS-1443: Many GET operations makes the log file grow too fast

Many GET operations make the BitSpread log file grow too fast. This typically occurs with low-bandwidth setups. You see many lines appear in the log file with the following content: wr_sb_duration >= 10.0s

N/A

DSS-1474: A failed PUT operation writes metadata to Arakoon anyway, and the metadata cannot be deleted afterwards

When a PUT operation fails, there is still metadata written to Arakoon. It is impossible to remove the loose metadata.

N/A

DSS-1484: Deleting an object via the AXR interface randomly fails with an Unauthorized error

Deleting an object using the AXR interface might randomly fail with the error: “HTTP 401 error: Unauthorized due to an exception in Arakoon while handling the request”.

N/A

DSS-1508: Files with white spaces in their name cannot be uploaded via AXR

Uploading a file fails when using the web form and the AXR interface, and when the file has white spaces in its name.

N/A

DSS-1518: Syncstore status “Full” is not immediately honored

When setting the status of a syncstore to “Full”, you can still upload files via the AXR interface for a while, but not via the Q-Shell interface. After a while it is no longer possible through the AXR interface.

When setting the status back to write mode, it again takes a couple of minutes before you can start uploading files via the AXR interface.

N/A

DSS-1658: Storage full message is misleading in a distributed setup

In a distributed setup, you might receive a message indicating that “Storage is full”. This message is by design, and in most situations means there are too many offline blockstores.

N/A
DSS-1752: PUT object with presigned URL fails if special headers are included Putting an object using a presigned URL and sending an ‘x-amz’ header fails. The ‘x-amz’ headers include ‘x-amz-acl’ and ‘x-amz-meta’. Without these ‘x-amz’ headers, the put with a presigned URL succeeds. To avoid encountering this issue, do not use these headers.
DSS-1834: AXR listing on /namespace shows S3 buckets When retrieving all objects via AXR on /namespace, all S3 buckets on which you have access rights are displayed when they should not be. N/A

DSS-1882: Outstanding objects and incomplete multipart uploads are not deleted

When you delete an AXR directory or S3 bucket, you do not delete possible outstanding objects or upload parts of an incomplete multipart upload. You will no longer see these ‘objects,’ but they remain on disk.

N/A

S3 REST Interface Issues

Issue Description Workaround
DSS-1262: Issuing simple GET / on the S3 service returns HTTP 500 error In this situation, the GET / command pointed in a browser at an S3 endpoint results in an HTTP 500 error. N/A
DSS-1263: PUT fails via S3cmd with extra ‘/’ in the path name

If you attempt something similar to the following, you will receive an HTTP 403: SignatureDoesNotMatch error:

./s3cmd put ../../testfile.txt s3://hbuck//testfile.txt

N/A
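One client-side way to avoid the doubled-slash signature mismatch is to canonicalize the object key before the request is built and signed. This is an illustrative sketch, not part of the Lattus or s3cmd code; the helper name is hypothetical.

```python
# Hypothetical client-side workaround: collapse repeated slashes and strip
# a leading slash from an S3 object key, so the path that gets signed
# matches a single canonical form (e.g. "hbuck//testfile.txt" becomes
# "hbuck/testfile.txt").
import re

def normalize_key(key):
    """Return `key` with runs of slashes collapsed and no leading slash."""
    return re.sub(r"/{2,}", "/", key).lstrip("/")
```

Stripping the leading slash also sidesteps DSS-1436 below, which notes that object names beginning with “/” cannot be handled by the Lattus S3 interface.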
DSS-1271: Copying an object with an invalid option in the “x-amz-metadata-directive” fails with an error response that is inconsistent with Amazon AWS If you copy an object using the S3 interface and a value other than “COPY” or “REPLACE” is set in the “x-amz-metadata-directive” header, the error message sent in the HTTP response by the Amazon AWS S3 Interface is different from the message returned by the Lattus S3 interface. N/A
DSS-1281: Creating buckets with illegal characters returns a different response than Amazon

If you create a bucket whose name contains any of the following characters, Lattus returns an HTTP 501 (Not Implemented) instead of the expected HTTP 400 (Bad Request):

# { } \ < > [ ] | ` ^ "

N/A
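An application can screen bucket names for the characters listed above before sending the request, so it fails fast with a clear client-side error rather than depending on the server's HTTP 501. A minimal sketch (the function name is illustrative, not a Lattus API):

```python
# Hypothetical client-side validation against the illegal bucket-name
# characters listed in this release note.
ILLEGAL_BUCKET_CHARS = set('#{}\\<>[]|`^"')

def bucket_name_is_legal(name):
    """Return True if `name` contains none of the listed illegal characters."""
    return not (set(name) & ILLEGAL_BUCKET_CHARS)
```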
DSS-1286: Authentication error in client daemon logfile gives little detail about why the authentication failed The output in S3 does not return the provided signature, but a CDATA field instead. N/A
DSS-1287: Service GET cannot handle Query Parameters like delim and prefix, which makes Webdrive fail When using a Webdrive client, listing the buckets will not work. You must specify a bucket. N/A
DSS-1288: GetObjectACL returns an HTTP 200 “OK” for a non-existing object GetObjectACL should return an HTTP 404 “Not Found” message. N/A
DSS-1299: Content-type headers returned in responses are different from Amazon This issue occurs because Lattus does not yet store the “content-type” in its metadata. N/A
DSS-1300: S3 Interface reports an HTTP 501 (not implemented) when a bad header or query parameter is provided and the bucket does not exist The interface should return a “Parameter problem” error, similar to Amazon. N/A
DSS-1310: S3 request with wrong domain name returns InvalidBucketName error When making a request with the wrong domain name, S3 returns an HTTP 400 error (bad request), while it should return an HTTP 403 (forbidden). N/A

DSS-1393: S3 GET requests with multiple byte ranges are not consistent with the Amazon S3 interface

When sending a GET request with the “Range” parameter having multiple byte ranges in the header, the whole object is returned instead of the requested range.

There is also a concatenation in the Lattus S3 interface.

N/A
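The request pattern that triggers this issue is a standard HTTP multi-range header. A small sketch of how such a header is built (the helper is illustrative, not part of any Lattus tooling); against this release, a request carrying such a header returns the whole object rather than the requested ranges.

```python
# Hypothetical helper: build an HTTP Range header for several byte ranges,
# as defined by the HTTP range-request syntax, e.g.
# [(0, 99), (200, 299)] -> "bytes=0-99,200-299".
def multi_range_header(ranges):
    """Return a Range header value for a list of (first, last) byte pairs."""
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in ranges)
```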

DSS-1398: Force-creating a bucket on a syncstore with a special name generates unclear HTTP error responses

Force-creating a bucket in a syncstore whose syncstore ID is not exactly 32 hexadecimal characters raises unclear error responses.

For example:

  • A syncstore with “aaaabbbbccccddddeeeeffff00001111” returns 404 Syncstore Not Found.
  • A syncstore with “spaghetti_arrabiata” returns 400 Bad Request, where it should be 400 Bad Syncstore ID
N/A
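Since the server's responses are unclear, a caller can validate the syncstore ID shape up front. A minimal sketch, assuming (per the examples above) that a well-formed ID is exactly 32 hexadecimal characters; the function name is hypothetical.

```python
# Hypothetical client-side check: a well-formed syncstore ID is assumed to
# be exactly 32 hexadecimal characters, matching the examples in this note.
import re

def is_valid_syncstore_id(sid):
    """Return True if `sid` is exactly 32 hex characters."""
    return re.fullmatch(r"[0-9a-fA-F]{32}", sid) is not None
```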

DSS-1498: Cancelled multipart upload behaves differently in BitSpread and Amazon

After cancelling a multipart upload and then trying to upload other parts of the multipart, the behavior differs:

Amazon: socket timeout

BitSpread: S3ResponseError: 404 Not Found.

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchUpload</Code>
<Message>The specified multipart upload does not exist.
The upload ID might be invalid, or the multipart upload might have been aborted or completed.</Message>
<RequestId></RequestId>
<Resource>testfile7</Resource></Error>
N/A

DSS-1521: Sorting order for multipart uploads in progress is not based on creation time

When listing multipart uploads in progress, the sorting order is by object name. If an object name has multiple multipart uploads associated with it, they should be ordered by creation time, but they are not.

N/A

DSS-1543: Socket timeout when updating a multipart object

When you update an object or multipart object with a regular PUT request without update permissions, you can receive a socket timeout instead of an “access denied” response.

N/A

DSS-1561: S3 requests fail when using unsupported time zones

When using unsupported time zones (for example Iran Standard Time - IRST), an S3 request fails with an HTTP 500 Internal Server Error.

N/A

DSS-1566: Uncompleted multipart objects are not deleted when their bucket is deleted

When deleting a bucket that contains incomplete multipart uploads, these incomplete parts are not removed from the blockstore directories, but they should be removed as well.

N/A

DSS-1568: “GET /?versioning” Response cannot be parsed due to missing new line character

The response of the request “GET /?versioning” cannot be parsed due to a missing new line character after the <?xml version ="1.0" encoding="UTF-8"?> element.

N/A

DSS-1569: Different responses between AWS S3 and Lattus S3 for some time zone formats

In Lattus, some x-amz-date time zone formats might return different responses (internal server error, forbidden) compared to AWS S3.

N/A
DSS-1819: An S3 request without a date header returns an HTTP 500 Error When you perform an S3 request to Lattus S3, you receive an HTTP 500 Internal Server error. The correct behavior should be HTTP 403 Forbidden to be compliant with Amazon. N/A
DSS-1822: S3 request with Expires parameter is not implemented An S3 request that contains the “Expires” parameter returns an HTTP 501 Not Implemented response because it is not implemented in Lattus. N/A
DSS-1827: S3 request with TE parameter is not implemented An S3 request that contains the “TE” (Transfer Encoding) parameter returns an HTTP 501 Not Implemented response because it is not implemented in Lattus. N/A
DSS-1833: Uploading multipart object succeeds if size is greater than multipart_object_max_size If you upload a multipart object whose size is greater than the defined multipart_object_max_size, the upload succeeds, where it shouldn’t. N/A
DSS-1915: Latest versions of Cyberduck cannot create buckets in Lattus Some versions of Cyberduck report interoperability errors when connecting to an S3 client daemon with SSL enabled. An identified issue is that it is not possible to create buckets when using Cyberduck 4.6.3. N/A
DSS-1916: Multipart upload with latest versions of Cyberduck raises error

When executing a multipart upload using Cyberduck 4.6, an error is generated: Upload failed due to a mismatch between MD5 hash {0} of uploaded data and ETag {1}

This error message is returned by the server even though the upload succeeded.

N/A
DSS-2267: A pre-signed HEAD request fails with HTTP Error 405 A pre-signed HEAD request to a bucket fails with HTTP Error 405 (Method not allowed). In general, pre-signed DELETE and POST requests on objects and all pre-signed requests on buckets are not supported. N/A

CMC Issues

Issue Description Workaround
AMIF-920: Partitions with used space less than 1 GB show as having 0 GB used space. On the detail page of a disk (on the Monitoring tab), partitions that use less than 1 GB show 0 GB as their actual used space. N/A
AMIF-2033: S10 node with an SSD inside identifies wrong disk numbers on the decommissioned disks printouts. When an S10 storage node has an SSD mounted inside and a disk needs to be decommissioned from that node, the decommissioned disk page indicates the wrong location for the disk. Use the disk serial number mentioned on the page to identify the disk.
AMIF-3209: Confusing message in job log

In some of the job logs in the CMC, you might spot messages like the following at the end of the job, even though the job was marked as done with no errors:

Failed to parse params {<job_details>}, Error: malformed string

This is actually a coding warning instead of an error, and does not influence the job itself or your overall performance. This will be fixed in a future Lattus release.

N/A
AMIF-3413: Events and Jobs in the CMC are not using the same timezone set on the machine The timezone definitions used are not recent and might be out of sync if your country recently changed time zones. N/A
AMIF-3855: Enabling S3 via the CMC adds an empty domain in the configuration files of the client daemons When you enable S3 on any client daemon with the “Enable S3 bucket operations” option selected but without defining an S3 domain name, the domain parameter in the configuration files of the client daemons remains empty. As a result, no bucket operations can be performed. Enable S3 again via the CMC, but don’t leave the domain field empty.
AMIF-4309: Temporary unavailability of the CMC Due to an old issue with mod-python, Apache is restarted on a weekly basis. During the restart the CMC is unavailable. N/A
AMIF-4636: Exception displayed after deleting unmanaged disk from the disk detail page

When deleting a disk with status “Unmanaged” from the disk’s detail page, an exception is generated even though the operation succeeds.

Exception: faultCode:Fault faultString:'error' faultDetail:'<Fault 8002: 'disk with guid 5b4a0b42-b8ab-4889-b675-692803c65995 not found.'>'

N/A
AMIF-4638: Storage Pool Used Capacity includes decommissioned storage nodes In the CMC on the dashboard, the graphic of “Storage Pool Used Capacity” is displayed. The total available capacity includes decommissioned storage nodes, but these shouldn’t be included. N/A
AMIF-4644: Scale of swap graphic under controller or storage nodes is off by a factor of 1024 The scale of the swap memory under controller or storage nodes is expressed in MB instead of GB. N/A
AMIF-4713: No location image of decommissioned disks after upgrade When you have decommissioned disks and you perform an upgrade to Lattus 3.5.x or later, there are no image locations of the decommissioned disks due to the new format of the disk’s bus location. There will be image locations for disks that are decommissioned after the upgrade. N/A
AMIF-4759: CMC shows difference between logical size and physical size of mdX disks For RAID devices, the logical size shows the usable size, and the physical size shows the physical hardware sizes to obtain the logical size. It might occur that both sizes are equal, even though the physical size is larger than the logical size. N/A
AMIF-4770: Export to PDF of decommissioned disks doesn’t work When you use Flash Player 13.0.0.214 or higher, you cannot export the details of decommissioned disks to a PDF file. N/A
AMIF-5006: MetaStore FULL for only one MetaStore node instead of all nodes When a MetaStore exceeds the threshold for number of keys, the MetaStore is marked as “FULL”. The CMC marks only one node of the MetaStore as FULL, but all three nodes of the MetaStore should have the status “FULL”. This condition has no functional impact. N/A
AMIF-5119: Disk location image not visible when using a tunnel to CMC When you use an SSH tunnel to access the CMC, the disk location image is not visible in the details of a decommissioned disk. N/A
AMIF-5168: Misleading warning sign icon on disks without blockstores in a storage node The disks on which the operating system is installed do not contain a blockstore. In the CMC, the disks without a blockstore are indicated with a warning sign icon, which might be confusing for the disks with the operating system. N/A
AMIF-5271: Erroneous validation when adding new user When you add a new user in the CMC, you might get the validation error “Invalid email specified, Check email syntax” when you fill in the login field and move to the next field, even though the login field does not require an email address. This occurs when the login name contains a dash (“-”) or period (“.”). N/A
AMIF-5295: /29 network indicated as invalid in CMC When you add a new network in the CMC and define a 255.255.255.248 netmask (/29), the wizard indicates that an invalid netmask is selected. N/A
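As a standalone illustration of why the wizard’s validation is wrong (this is plain Python, not a Lattus command, and 192.0.2.0 is a documentation example address, not a Lattus value), the 255.255.255.248 netmask corresponds to a valid /29 prefix with six usable host addresses:

```python
# Standalone illustration: 255.255.255.248 is the dotted-quad form of /29.
import ipaddress

net = ipaddress.ip_network("192.0.2.0/255.255.255.248")
print(net)                     # 192.0.2.0/29
print(net.prefixlen)           # 29
print(len(list(net.hosts())))  # 6 usable host addresses
```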
AMIF-5302: Disks window in CMC might show multiple instances of a drive name When you decommission a disk, that disk is no longer presented to the OS after restarting the server. However, due to drive reordering, it is possible that a healthy disk gets the name of the decommissioned disk. As a result, two disks with the same disk name appear in the CMC disks overview. N/A
AMIF-5875: User can delete administrative group In the CMC, it is possible to remove the cloud user group “Administrators,” which results in the failure of every cloud API call. N/A
DSS-1296: Swap usage goes to 100% when doing multiple uploads of large objects When performing multiple uploads of large objects, the event “swap usage is over 50%” is triggered. In fact, the swap usage is actually 100%. Restart the daemons.

Monitoring Issues

Issue Description Workaround
AMIF-3259: Certificate expiration dates for monitoring are converted from the local time zone and not from GMT The certificate expiration time is converted from the local time of the CMC machine instead of from GMT. Because of this, there might be a timespan of a few hours where monitoring/eventing differs from reality. N/A
AMIF-3534: Monitoring agent “task python:PID blocked for more than 120 seconds” error in kern.log When running the tlog collapse policy, the monitoring agent repeatedly logs the error named in the title to the kernel log. This error has no further impact on the system, nor are any events triggered because of it. N/A
AMIF-4078: MetaStore not marked as degraded when one of its nodes is unavailable When a node of a MetaStore is unavailable, the MetaStore should be marked as degraded. N/A
AMIF-4170: Failure of machine agent is not auto-corrected

On rare occasions, checking the availability of an agent might fail. This can result in an agent that appears to be connected but is actually offline.

As a result, this can cause several agent scripts to fail because there is no agent available to execute these scripts. In normal situations, the monitoring should detect that an agent is unavailable and automatically restart it.

If you encounter this situation, restart the application server on the node whose agent is unavailable.
AMIF-4253: Health-check in a routed setup fails In a routed setup, ARP requests are made for IP addresses outside your broadcast domain. You receive the MAC address of the router instead of that of the target machine, which makes the health-check fail. N/A
AMIF-4426: I/O error on blockstore blocks checking other disks When the monitoring agent checks the blockstores of a node and the check on one of the blockstores/disks fails with an I/O error, the monitoring agent doesn’t check the remaining blockstores/disks. As a result, the other blockstores might remain unmonitored until the disk is decommissioned. N/A
AMIF-4481: Deleting a bucket with a huge amount of objects causes MetaStore lagging events

When you delete a bucket that contains millions of objects, MetaStore lagging events might be raised. Deleting a bucket puts a high load on Arakoon, because the delete itself is a very fast operation.

Even a minor disturbance could make the slave lag behind. Monitor whether this event eventually disappears. If it doesn’t, contact Quantum Support.

N/A
AMIF-4609: Event “blockstore <path> is offline” for manually offlined blockstores When you set blockstores offline manually in the CMC, you receive events stating that those blockstores are offline (event Lattus-MON-DSS-BLOCKSTORE-0033). N/A
AMIF-4637: A deleted unmanaged disk is added again during the next monitoring cycle When deleting an unmanaged disk from the UI, the monitoring cycle starts and adds the disk again as an unmanaged disk. This will be repeated until the disk is either physically removed or repurposed. N/A
AMIF-4710: Monitoring agent raises event for halted metastore during Arakoon recovery After an ungraceful shutdown of a controller, an Arakoon recovery is automatically initiated, during which time the Arakoon is temporarily halted. For this reason, the monitoring agent raises an event and tries to restart it, but the restart fails, resulting in another event (OBS-ARAKOON-0018). This last event can be confusing because it might give the impression that the recovery is not initiated. In this specific situation only, you can safely ignore the event. N/A
AMIF-4772: Origin of failed login attempt is unknown After a failed CMC login, an event is generated. The event is unable to log the source IP address from which the login has been attempted. N/A
AMIF-4843: Monitoring agent still uses old hostname of node When renaming a node’s hostname, the monitoring agent still uses the old hostname in events. This issue is resolved after restarting the monitoring agent.
AMIF-4879: Monitoring agent does not restart when its process is “defunct” In some situations, the monitoring agent process might become defunct. In that case, you can no longer restart the monitoring agent. Consult Quantum Technical Support and reference KB article BDY069 for help resolving this issue.
AMIF-5015: Mounted stale partition causes failures of monitoring agent rules When you have a stale partition which is mounted, the monitoring agent rule to check the blockstores fails with an unclear stacktrace. N/A
AMIF-5022: Monitoring agent in unknown state when kipmi0 process doesn’t respond When the kipmi0 process doesn’t respond, the monitoring agent also stops responding and goes into an unknown state. The monitoring agent only gets restarted manually or when the monitor policy runs. N/A
AMIF-5027: Monitoring agent does not detect “ixgbe module verification failed” message The monitoring agent does not detect the following error message in the kernel logs, and as such doesn’t raise an event: ixgbe: module verification failed: signature and/or required key is missing - tainting kernel. This error can be safely ignored. N/A
AMIF-5035: Wrong monitoring thresholds for free space on tlog partitions The monitoring agent checks the free-space thresholds in the wrong order, so the critical threshold is always reached first when checking the free space on tlog partitions. As a result, a MetaStore is automatically set to full before you are informed that the warning and error thresholds were reached. N/A
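For illustration, the intended threshold ordering can be sketched as follows. This is a hypothetical sketch, not Lattus code, and the threshold percentages are illustrative, not the Lattus defaults:

```python
# Hypothetical sketch of correct free-space severity classification:
# test the most severe threshold first so "CRITICAL" is reported only
# when that level is actually crossed.
def free_space_severity(free_pct, warning=20.0, error=10.0, critical=5.0):
    """Return the most severe threshold actually crossed, or 'OK'."""
    if free_pct <= critical:
        return "CRITICAL"  # the level at which a MetaStore is set to full
    if free_pct <= error:
        return "ERROR"
    if free_pct <= warning:
        return "WARNING"
    return "OK"
```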
AMIF-5074: Kernel dmesg errors detected when restarting an AC4 controller

When restarting an AC4 controller, you may receive events about the detection of kernel dmesg errors.

Event Type: OBS-PMACHINE-0019

Event Message: Kernel dmesg errors detected

Severity: ERROR

Source: LeffeController2 (00:25:90:99:27:11)

Occurrences: 1

First occurrence: 2015-01-10 20:20:55

Last occurrence: 2015-01-10 20:20:55

Details: sas: ata1: end_device-1:0: dev error handler sas: ata1: end_device-1:0: dev error handler ...

This event may be safely ignored.

N/A
AMIF-5080: Machine rebooted event when restarting the monitoring agent When you manually restart the monitoring agent on a node, you might receive events that the node has rebooted. This has no functional impact. N/A
AMIF-5107: Event message for OBS-STORAGE-0004 updated unexpectedly

When the event OBS-STORAGE-0004 is raised, the event message contains environment statistics, showing the number of:

  • degraded disks
  • decommissioned disks
  • offline disks
  • healthy disks

When this event occurs again within the dedupe period, the event message is overwritten and no longer shows the environment statistics.

N/A
AMIF-5489: Abandoned storage daemons are not skipped when verifying blacklists When the monitoring agent checks the blacklists, it doesn’t skip the abandoned storage daemons. This can result in very long monitoring cycles, during which the monitoring agent might be restarted automatically, leaving the blacklist monitoring cycle incomplete. N/A
AMIF-5755: Log collector runs forever even when it’s not able to establish an SSH connection When the log collector is not able to establish an SSH connection, it keeps on running forever. N/A
AMIF-5807: The log collector doesn’t compress log files if run with -C and -Q options When you run the log collector using both the -C and -Q options, the log files are not compressed to a single log file due to a conflict between both options. N/A
DSS-1727: Deleted objects are taken into account for calculating object_name_length_stats When retrieving the statistics about the object’s name length in a namespace, the deleted objects are also taken into account. N/A
HARDWARE-1: ATA errors in dmesg of StorageNodes (MV64460/64461/64462 System Controller, Revision B)

After installing or rebooting S20 Model 2/S30 storage nodes, the following events appeared in the CMC event log:

[ 43.568387] ata10: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 43.568392] ata10: status=0x01 { Error }
[ 43.568394] ata10: error=0x04 { DriveStatusError }

These errors are harmless and don’t require any attention if they appear at the boot time of the node.

N/A

Arakoon Issues

Issue Description Workaround
AMIF-4730: Misleading event message for OBS-ARAKOON-0008 The event message for event OBS-ARAKOON-0008 might indicate a wrong machine name, but the node name and corresponding MetaStore are correct. N/A
AMIF-4848: Arakoon not correctly started without errors When Arakoon starts, it verifies that its transaction log is not corrupt. If the log is very large, the verification might time out, and Arakoon starts even if the log file was improperly closed. As a result, Arakoon is unable to recover upon launch. N/A
ARA-97: Preferred master remains master after executing dropMaster When you have selected a preferred master in an Arakoon cluster and then execute a dropMaster, the selected preferred master remains the master of the cluster. N/A

Failover Issues

Issue Description Workaround
AMIF-3033: Failover cannot handle NIC ordering differences between controllers If the environment has controller nodes whose NICs are ordered differently from those of the management node, the failover causes the new management network configurations to be incorrect. N/A
AMIF-4067: Machine reboot event incorrectly shown after management node failover When executing a management node failover, an event is generated saying that the new management node is rebooted. However, this new node has not been rebooted. N/A
AMIF-4173: Upgrade patches history is incomplete after executing a failover to a freshly installed node

After executing a failover to a new management node, the upgrade patches history is lost when checking the About section in the Cloud Management Center.

This is because after a failover there are no patches available on the new management node. This has no impact on Lattus functionality.

N/A
AMIF-4365: During a failover, too many messages appear when installing PostgreSQL and Apache packages A failover is typically executed in a screen session. When the failover installs the PostgreSQL and Apache packages, an overload of messages appears. N/A
AMIF-4402: During failover, it is possible that Pound is restarted on the old node before it gets started on the new node

During a failover, Pound is stopped on the old management node. The monitoring agent might detect this action and initiate a restart of Pound on this old node.

If this happens, the failover will fail because Pound cannot be started on the new node.

If you encounter this issue you can disable TLS and restart the failover. After the failover completes, you can enable TLS again.
AMIF-4838: Apache still running on old management node after failover After executing a failover while the old management node is running, Apache is still running on the old management node even after the failover completes successfully. N/A
AMIF-5064: Failover fails when nodes are not reachable on the management network When you have nodes that are not available in the management network and you perform a failover, this action fails, even when the nodes are available in the storage and/or public network. In the model, the old management node remains the management node. N/A
AMIF-5066: Custom SNMP polling ‘community string’ lost after failover When you perform a failover, the community string for custom SNMP polling is reset to the default value. N/A
AMIF-5084: Failover might fail when executed immediately after adding new nodes When executing a failover immediately after adding a new node, the failover might fail. This could be caused by the fact that the model was not yet backed up to Arakoon. N/A
AMIF-5144: After a failover, rscripts time out because the agent cannot report status When performing a failover, it's possible that an agent executes rscripts, but is unable to report the status of this execution. In that case, the rscripts time out, together with all subsequent rscripts. N/A

Supportability Issues

Issue Description Workaround
AMIF-258: The workflow engine is not automatically restarted when it is stopped or killed Automatic application restart requires both Apache and the workflow engine to be running. If they are not running, they cannot be restarted automatically. Quantum recommends monitoring these applications with an external tool. This way downtime doesn’t go unnoticed. N/A
AMIF-892: Root file system can fill up when a lot of core files are created within a period of one week Core files are compressed, and those older than seven days are removed. But if a component produces a lot of core files within this seven day period, the root file system could fill up. Events are raised when core files are detected. N/A
DSS-944: ReadCache clusters don’t have multi-IP support If the switch used for the readcache traffic fails, the readcache clusters will not be functional. N/A

Pound Issues

Issue Description Workaround
AMIF-2832: Pound might produce unexpected messages in its log

The following message might be logged multiple times in the Pound log file:

localhost pound: NULL get_thr_arg

This is a Pound-specific issue, but has no functional impact on your system. The logging of this message will be removed in a future release of Pound.

N/A
AMIF-3843: Unnecessary restart of Pound service when adding or removing storage LANs When adding or removing storage LANs, Pound is restarted even though its configuration is not updated. Because the configuration is not updated, the restart is unnecessary. N/A
AMIF-4416: Pound application listed as ACTIVE on old management node after failover After a failover, it is possible that the Pound application is still listed as Active on the old management node, because the model update for the Pound application happens too late. N/A

Upgrade Issues

Issue Description Workaround
AMIF-3135: PXE booting new nodes doesn’t work after Apache was restarted gracefully

When the environment is upgraded to Lattus 3.1 and https is enabled, Apache is restarted gracefully (not a clean restart).

In some instances this graceful restart doesn’t work. In this case, you will not be able to install new nodes over PXE.

To work around this issue, restart Apache on the management node via Q-Shell: q.manage.apache.restart()
AMIF-3139: Initialize logrotate actor fails when having multiple agent connections. Initializing the log rotation on a node can fail. This is because the upgrade can cause a node to have two agent connections, which causes the init logrotate script to be executed twice and can result in a failure. Clean up the duplicate agent connections and restart the upgrade.
AMIF-4541: Error “Fault 8002 (DataError) invalid input syntax for uuid” during upgrade

During the upgrade to Lattus 3.4.4, it is possible that you receive the error:

Fault 8002: '(DataError) invalid input syntax for uuid: "None" LINE 1: select * from only lan.main where guid='None'

If you receive this error, restart the application server and restart the upgrade with the last upgrade command that you used.
AMIF-4671: Kernel dmesg errors after upgrade

After an upgrade, you might receive e-mails for a kernel dmesg error:

Kernel dmesg errors detected

...

sas: sas_eh_handle_sas_errors: task 0xffff8810502df5c0 is done

...

Investigate the root cause of the kernel dmesg errors.

This has no impact on Lattus functionality.

N/A
AMIF-4779: Duplicate disks in the model after an upgrade When you upgrade a machine with empty disks to 3.5.x, the empty disk appears twice. The disk was already known by the system before the upgrade, but due to a new bus location format from 3.5.x onwards, the system detects the disk again and adds it as a new empty disk. This might lead to failures when you use the duplicate disk to expand MetaStores. N/A
AMIF-4992: Degraded disks after upgrading from 3.4.x When you upgrade from Lattus 3.4.x to 3.5.x or 3.6.x, disks might become degraded because the monitoring agent cannot detect the disks. This can be due to a timing issue between the installation of the library (framework upgrade) and the package of the utility (post-upgrade action) which gathers the disk information. During this short period, the monitoring agent tries to gather the disk information, resulting in degraded disks. N/A
AU-264: DSS upgrade fails if CONFIGURED nodes exist in the environment Upgrading an environment that contains nodes in state ‘CONFIGURED’ will fail. These nodes must be removed using q.amplistor.cleanupMachine().
AU-411: Upgrading DSS doesn’t show correct logging on screen

When upgrading from Lattus 3.1.0 to 3.2.1, several osis debug/info messages are shown during the DSS upgrade that should not be displayed.

Conversely, the message notifying you that the machine will be rebooted is not displayed when it should be.

N/A
AU-424: Upgrade fails while configuring packages with a permission denied error using Amplisys account Upgrading from 3.1.4 to 3.2.1 using patch “3.2.1_j” on a system where the root account is disabled and the amplisys account is introduced will fail while configuring packages. N/A
AU-426: Improve branding error message.

When trying to install a branded upgrade patch on a Himalaya installation, the following message is shown: “Checking product branding ERROR: Unable to install package '<branded name>_3.2.1_j', because 'branding_amplidata' is installed”

This message should be “Cannot install <branded name> upgrade on Himalaya machines”.

N/A
AU-444: Kernel modules upgrade marker is not found after a manual reboot for a controller

When upgrading from a Lattus 3.1 environment, new networking drivers are installed. The storage nodes are rebooted automatically, but the controllers must be rebooted by the customer.

A warning message is displayed about this, but even when the upgrade script is rerun and the customer has rebooted the controllers already, the message is still displayed.

N/A
AU-462: Upgrade failed while moving logs because logrotate renamed them If the upgrade is started at the same time as log rotation, the upgrade will fail. N/A
AU-474: Upgrade fails with application server not running exception   Running the upgrade again solves this issue.
AU-476: Upgrade fails while configuring agent controller   Running the upgrade again solves this issue.
AU-477: Upgrade fails with a proxy error because application server was not ready   Running the upgrade again solves this issue.
AU-478: After installing the upgrade, events about core files for multipathd are seen

During the upgrade, the “multipathd” package is installed, but also immediately disabled.

During this small timeframe, the package can still be started and produce core files and dmesg errors. These errors are harmless and can be ignored, but they keep triggering events repeatedly as long as the core files exist.

Once the upgrade is complete, log into the management node and remove these core files. They are located in /var/crash/.
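The cleanup step can be sketched as follows. This is a hypothetical helper, not a Lattus tool: the /var/crash/ path comes from the note above, but the core-file naming pattern is an assumption, so inspect the directory before deleting anything.

```python
# Hypothetical cleanup helper for multipathd core files left by the upgrade.
# The '*multipathd*' name pattern is an assumption; verify it on your system.
import glob
import os

def cleanup_multipathd_cores(crash_dir="/var/crash"):
    """Remove multipathd core files from crash_dir; return the removed paths."""
    removed = []
    for path in glob.glob(os.path.join(crash_dir, "*multipathd*")):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```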
AU-556: The reporting namespace still exists after an upgrade from Lattus 3.0 From Lattus 3.1 onwards, the reporting function is no longer available, and therefore the reporting namespace is obsolete. When you upgrade from Lattus 3.0 to version 3.1 or later, the reporting namespace is not removed. N/A
AU-648: Lattus 3.4.0_q upgrade might fail Executing the upgrade patch 3.4.0_q might fail due to an issue with the OSIS component. The following lines are found in the log file: INFO:osis.model.thrift:Unknown field None (id 57) Restart the upgrade to complete the upgrade.
AU-661: Installation of patch version v might fail because the monitoring agent cannot be stopped The installation of the upgrade patch version “v” might fail because the monitoring agent cannot be stopped correctly. Restart the installation of this patch to continue.
AU-663: Upgrade to 3.4.0 continues when there are nodes queued for initialization When there are nodes in the initialization queue, the upgrade continues where it shouldn’t. These nodes will not receive the 3.4.0 kernel upgrade. N/A
AU-683: Lagging MetaStore events during upgrade to 3.3.x When upgrading from Lattus 3.2.x to 3.3.x, you might receive events about lagging MetaStore nodes, and the administrator might receive emails about this event. This is because a new version of Arakoon is installed but not running on all nodes at the same time. When the majority of MetaStore nodes is still running the old version, these events occur. Once the majority of nodes is upgraded to the new version, there are no more lagging events. To avoid encountering this issue, upgrade the majority of nodes to the new version of Arakoon.
AU-723: Restarting a node from the CMC might seem to fail After upgrading, you must restart all controller nodes. When you restart the controllers via the CMC, some restarts might fail, where other restarts succeed. N/A
AU-724: Message generated that storage node restart is required where it is not mandatory When upgrading to 3.6.0, a message is generated on the storage nodes saying that a restart is required. This message is misleading because the restart is not mandatory and does not affect the upgrade. N/A
AU-743: Alarm messages about storage daemon RRD During the upgrade to 3.5.0, many e-mails are generated describing issues with the update of space usage. This is because the monitoring agent is updated on the storage nodes, but not on the DSS backend. As a result, blockstore information cannot be retrieved and the RRD files cannot be updated. The mails are sent until DSS is fully upgraded. N/A
AU-821: Old kernel images are not removed after upgrade When a new kernel is installed during an upgrade, older kernel images are not removed. This can be seen in the upgrade log, but has no functionality impact on the system. N/A
(Quantum) 55832

To properly identify the controllers in Lattus, a procedure is required to import the proper HAL type so that the CMC can identify the controller type. When the procedure, hal_tools.identify_hardware(), runs, it produces errors.

Although errors are present, the procedure completes and the HAL type is imported into the model.

The runtime errors are harmless and can be ignored.

N/A

Miscellaneous Issues

Issue Description Workaround
AMIF-2985: When changing the admin password, Lattus also tries changing it on a FAILED machine If you change the admin password of your environment while it has a failed machine, Lattus tries (and fails) to change the password on the failed machine as well. The password change succeeds on your environment, but a pop-up window is displayed stating that the change failed on the failed machine. N/A
AMIF-3643: MetaStore does not start due to checksum error of TLOG After power cycling all controller nodes, a MetaStore might not start due to a checksum error of the TLOG. This issue can occur when using the REST VM, which should be used only for testing and development purposes. N/A
AMIF-3645: After a power cycle, Lattus doesn’t automatically resume operations because pid files are lingering When a controller node (more specifically, the management node) is power-cycled, that is, rebooted in an uncontrolled fashion, some of the pid files (used to prevent multiple instances of the same process from starting) are not cleaned up upon restart, preventing the restart. The workaround is to identify the process that refuses to start and to remove its pid file.
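The workaround can be sketched as follows. This is a generic helper, not part of Lattus; the pid-file location depends on the service, so the path is an assumption you pass in:

```python
# Generic sketch: remove a pid file if the recorded process no longer exists.
import os

def remove_stale_pidfile(pidfile):
    """Return True if a stale pid file was removed, False otherwise."""
    if not os.path.isfile(pidfile):
        return False
    pid = int(open(pidfile).read().strip())
    try:
        os.kill(pid, 0)      # signal 0: existence check only, sends nothing
    except ProcessLookupError:
        os.remove(pidfile)   # process is gone; the pid file is stale
        return True
    return False             # process is alive; leave the file alone
```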
AMIF-3689: Rebooting a data center fails to reboot the management node because of a timeout The time-out on the script is lower than the time it takes to stop all services on the management node. There is no real functionality impact, since the stop commands are executed in a different process than the process of this time-out. N/A
AMIF-3782: Log collector can return unsorted results when specifying a time window When you specify a time window, the log collector returns unsorted results. You might also lose log data when you specify time keywords. N/A
AMIF-3841: Security attributes remain on disk, but the secure erase job does not fail It can occur that a secure erasure of a disk fails, leaving the security attributes on the disk. Sometimes the framework does not detect this failure, which leaves the disk unusable. N/A
AMIF-3989: After changing the password of the cloudapi user, login failure events are generated It is possible to change the password of the system user “cloudapi” through the CMC. When you do so, you will receive many “Failed login attempt with username cloudapi” events. This is because the application server keeps the old password in its cache. Restart the application server after updating the password.
AMIF-4027: Realtime statistics in OSMI doesn’t show timeout range for the collection process When you want to collect realtime statistics in OSMI, you can only provide a timeout value in seconds, in the range between 120 and 100,000. If you enter a value outside this range, you get an error, but the error does not show the valid range. N/A
AMIF-4219: “Program lshw tried to access /dev/mem between ff000->101000” in kernel log It might occur that the program lshw tries to read information from memory contents. When doing so, a warning “Program lshw tried to access /dev/mem between ff000->101000” is added to the kernel log. This warning has no further impact on your system. N/A
AMIF-4273: Machine cleanup shows error because it cannot be removed from the initialization queue The cleanup of a machine tries to remove the machine from the initialization queue. When the machine is not in this queue, this step fails as expected, but the cleanup continues and completes successfully. N/A
AMIF-4277: When a MetaStore and a ReadCacheCluster are installed on the same physical SSD, events are shown When you create a ReadCacheCluster on a physical SSD which already has a MetaStore, you can see “lagging behind” events. Event type: Lattus-MON-ARAKOON-0008 Event message: Node <node name> on MetaStore env_metastore is lagging keys on machine <machine name> N/A
AMIF-4371: In a routed setup, the “Location LED” functionality doesn’t work In a routed setup, the “Location LED” functionality of the CMC doesn’t work. This function is used to easily identify a server in a rack by turning on an LED. N/A
AMIF-4384: Updating the network routes fails when there are configured nodes

When you have nodes in a “Configured” status, the job to update the network routes fails. However, this has no impact on the running nodes. These are updated correctly with the new network routes.

Clean up the configured machines in the CMC. The configured nodes appear under Dashboard > Administration > Hardware > Servers > Unmanaged Devices > Failed.

Select the nodes and click Cleanup Devices from the Commands pane.

AMIF-4448: Shutting down 3-site multi-geo site fails when shutting down one data center at a time When you shut down a 3-GEO Lattus by shutting down one data center at a time, the shutdown succeeds for the first data center, but the second fails, indicating that there is no longer an Arakoon master. N/A
AMIF-4515: Decommissioning a disk fails due to degraded RAID Decommissioning a disk might fail with an unclear “KeyError” instead of a message stating that a RAID is degraded and missing a healthy member. This is caused by bad parsing of the RAID member information. N/A
AMIF-4534: Arakoon recovery might leave .part files When an Arakoon recovery is interrupted between creating .part files and uploading them, the .part files remain on the file system instead of being removed. N/A
AMIF-4537: DNS update only works on first LAN When you have multiple public LANs, the DNS update works only on the first configured LAN. If you update the DNS of another public LAN via the CMC, the job succeeds, but the update is not persisted on the system. N/A
AMIF-4564: Possible to repurpose a disk while it is being initialized When you repurpose a disk, it is initialized. The disk remains in the list of unmanaged disks until the initialization completes. During the initialization, you are able to start another repurpose of the same disk. N/A
AMIF-4611: PDF export of disks might fail in Google Chrome browser When using Google Chrome as a browser, the export of decommissioned disk details might fail. This issue has been identified in Google Chrome 34.0.1847.116-1 but might not be limited to this version. N/A
AMIF-4628: Health check fails due to decommissioned storage daemons During the health check, you are asked to start a storage daemon in case the health check finds a storage daemon that is not running. If you answer No because you decommissioned the storage daemon, the health check fails unexpectedly. N/A
AMIF-4646: Kernel dmesg events with empty call trace after power cycle When restarting a controller, you might receive kernel dmesg events with an empty call trace. N/A
AMIF-4654: Under load, hot swap might fail due to unmounting disk failure When a system is under load, the hotswap of a disk might fail due to an unmount failure. This results in a disk with a failed disk initialization. The workaround for this issue is to manually unmount the disk.
AMIF-4655: Hotswap of SSD not possible if it contains a cache daemon It is not possible to hotswap an SSD that contains a cache daemon, because the daemon is not stopped. The cache daemon must be halted before you can hotswap the SSD. N/A
AMIF-4662: Healthcheck fails on TFTP When running the healthcheck, the procedure might fail when checking TFTP. This is due to writing multiple PIDs in the PID-file. N/A
AMIF-4752: Root password saved in Q-Shell cluster configuration file When you create a Q-Shell cluster, its configuration file stores the root password. You should remove the Q-Shell cluster after executing the necessary actions on the cluster.
AMIF-4753: Q-Shell clusters don’t work when enabling amplisys user and disabling root password When you enable the amplisys user and disable the root password, you are no longer able to use Q-Shell clusters. A Q-Shell cluster is used, for example in the log collector. N/A
AMIF-4756: Events are not branded All events have a code starting with "AMPLISTOR" instead of "LATTUS". N/A
AMIF-4771: Decommissioning a disk when agent is unavailable sets blockstore to decommissioned

When you decommission a disk on a node with no agent available, Lattus sets the blockstore to decommissioned, but the disk itself is not decommissioned.

This occurs because local updates are applied first and are not rolled back when the job fails.

Decommission the disk again when the agent is back up and running.
AMIF-4871: Rscript cannot be killed when node is unreachable When a node becomes unreachable while an rscript is being executed on it, the rscript hangs and cannot be killed. The rscript cannot be re-executed and might affect Lattus functionality. For example, aggregate graphics retrieved afterward will not include the unreachable node’s graphics. N/A
AMIF-5040: During startup of controllers or storage nodes there is a recommendation to use a different driver When starting a storage node or controller, dmesg may show the following recommendation: "ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver". This message can be safely ignored. N/A
AMIF-5058: Original serial number of disk not restored when initialize_new_disk fails When initializing a new disk fails, the serial number of the original disk must be restored, but it isn’t, which might be confusing for the administrator. The ID of the disk in the model is the ID of the old disk, but the serial number is the one from the new disk. N/A
AMIF-5075: ACPI error when starting an HP DL360 controller

When starting the HP DL360 controller, you may see the following ACPI error:

[11.100081] ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, 32 (20130517/exfield-299)
[11.100086] ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMM] (Node ffff8808544af9d8), AE_AML_BUFFER_LIMIT (20130517/psparse-536)
[11.100093] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20130517/power_meter-

This error may be safely ignored. Note that the HP DL360 is not yet fully certified for Lattus 3.6.0.

N/A
AMIF-5076: A failed restart of a node sets the status of the node to RUNNING When restarting a node fails, the node status is still set to RUNNING instead of the expected status HALTED. N/A
AMIF-5180: Health check might fail when the client daemon is under heavy load One of the last phases of the health check is the file upload. If a client daemon is under heavy load, it is possible that it doesn’t handle this upload request on time, resulting in a failed upload and hence a failed health check. N/A
AMIF-5539: Uploading log files might time out on systems with high load When your system is under high load, it is possible for uploading log files to time out. This issue is caused by a timeout set to 600 seconds. N/A
AMIF-5846: Services are not started on storage nodes if “lo” device is not available It is possible for the “lo” device to be unavailable when starting a storage node, due to an outdated “ifupdown” package. If this occurs, services on the storage node are not started, leaving the node unavailable. N/A
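One way to check for this condition before starting services is to inspect the loopback interface with standard iproute2 commands. This check is an assumption for illustration, not part of the Lattus tooling:

```shell
# Returns success when `ip link show lo` output (read from stdin)
# carries the UP flag, i.e. the loopback interface is configured.
lo_is_up() {
    grep -q "LOOPBACK,UP"
}

# On the node: if the check fails, bring the interface up (requires root):
# ip link show lo | lo_is_up || ip link set lo up
```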
DSS-1080: Abandoning a blockstore with force flag fails If a storage node is hanging and it still responds to a ping, but not to ssh connections, abandoning blockstores and storage daemons will fail on that node. N/A
DSS-1440: Possible to create a namespace on a full MetaStore When you mark a MetaStore as full in the Q-Shell, you can still create a namespace (also in the Q-Shell) with that MetaStore if you explicitly specify the MetaStore. However, you are not able to add any data to the new namespace. N/A
DSS-1570: When using a range that starts at the boundary of a superblock, a “bug” appears in the log files

When using a range that starts at the boundary of a superblock, an error with a “Bug” message is added to the log file:

"Bug: this should not happen: obtained an empty range...

This message will be rephrased in a later release.

N/A
DSS-1671: No difference between accessing objects by forbidden users and unsatisfied headers

If you want to bill anonymous users for failed requests, you are not able to differentiate between:

  • “Forbidden” users trying to access objects without the proper permissions on the bucket; and
  • Users whose request fails due to unsatisfied headers on anonymous buckets
N/A
DSS-1934: Dead Man’s Timer not implemented for S3 protocol The Dead Man’s Timer is available only for the AXR protocol, and not for the S3 protocol. N/A
QUANTUM-1378: Decommissioning and replacing more than one storage node at a time does not work properly Auto decommissioning fails, and decommissioning and replacing more than one storage node at a time can be problematic when storage capacity is over 70% full. When you decommission and replace a storage node, make sure the decommissioning of that node is fully complete before proceeding to the next storage node you want to decommission and replace.
Non-Critical Third-Party Vulnerabilities

Vulnerabilities have been detected in some of the third-party packages used by Lattus.

The third-party patches for these issues will be included in future Lattus releases.

N/A
QUANTUM-858, AMIF-4897, QUANTUM-1087: Page allocation failures with multiple kernel dmesg errors

Page allocation failure occurred along with multiple error messages:

“Kernel dmesg errors detected”

These messages can be safely ignored.

N/A
QUANTUM-1481 (Bugzilla 67002): DSET report not providing data.

In Lattus 3.6.6 and later releases, the DSET report might be missing data.

To work around this issue, load the IPMI drivers manually so you can do a full DSET collection without having to reboot or interrupt data traffic. Run the following commands:

modprobe ipmi_si
modprobe ipmi_devintf

Now run the following commands to add the drivers to /etc/modules so they are loaded automatically on subsequent reboots:

echo ipmi_si >> /etc/modules
echo ipmi_devintf >> /etc/modules

After loading the drivers, follow the instructions in "Collecting Hardware Information Using DSET" in the Lattus Service Reference Guide as you normally would.
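If you script the /etc/modules step above, it helps to make it idempotent so re-running it does not duplicate entries. A minimal sketch under that assumption (the helper name is illustrative, not a Lattus command):

```shell
# Append a module name to a modules file only if it is not already listed
# (exact-line match), so repeated runs leave a single entry per module.
persist_module() {
    grep -qx "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

# On the node:
# persist_module ipmi_si /etc/modules
# persist_module ipmi_devintf /etc/modules
```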

(No Issue Number) Halted Maintenance Agent events generated during upgrade

Halted Maintenance Agent events might be generated during an upgrade.

Example Error Event: Maintenance agent '419124544' has status halted

After a successful upgrade, you can ignore these false-positive events for up to three monitoring cycles. If an event persists beyond three cycles, treat it as an actual system event.

N/A

Contacting Quantum

More information about this product is available on the Service and Support website at http://www.quantum.com/ServiceandSupport/Index.aspx. The Service and Support website contains a collection of information, including answers to frequently asked questions (FAQs). You can also access software, firmware, and drivers through this site.

For further assistance, or if training is desired, contact the Quantum Customer Support Center:

United States

1-800-284-5101 (toll free)

+1-720-249-5700

EMEA

+800-7826-8888 (toll free)

+49-6131-3241-1164

APAC

+800-7826-8887 (toll free)

+603-7953-3010

For worldwide support:

http://www.quantum.com/ServiceandSupport/Index.aspx