Tools > Storage Manager > Alternate Store and Retrieval Location
| Task | Description |
|---|---|
| Alternate Retrieval Location | Allows you to specify a remote retrieval location to use in situations where files stored on tape or a storage disk cannot be accessed. |
| Alternate Store Location | Provides an automatic system for copying files from a main instance of StorNext to a remote instance of StorNext at the same time as copies are made to tertiary storage at the main site. |
The StorNext Alternate Store Location feature provides an automatic system for copying files from a main instance of StorNext to a remote instance of StorNext at the same time as copies are made to tertiary storage at the main site. The remote copies can serve as a copy of last resort with the Alternate Retrieval Location feature in StorNext. The feature also supports background copying of files that existed before the deployment of the feature. Background copy activity can be limited so that it does not crowd out StorNext processing of new files.
Note: This feature applies only to managed file systems that have at least one configured policy class.
The feature can be enabled per Storage Manager Class Policy per file system. After it is enabled, new files are automatically copied to the remote site. Pre-existing files can also be enabled for automatic copying by providing them as input to the `altstoreadd` command. When a copy completes, the completion is recorded per file so that the status can be displayed with the `fsfileinfo` command. Before the copy completes, the `fsfileinfo` command reports that the file is enabled for copying and that the copy action has not been performed.
After the copy action has been performed, the main-site instance of StorNext does not maintain a status of the remote-site copies. Ensuring that the remote-site copies are maintained as a read-only image of the main site is an administrative responsibility. Necessary actions to remedy any discrepancies that can accumulate at the remote site as a result of deletions or modifications caused by either user activities or equipment failure are also administrative responsibilities.
Note: Zero-length files are not stored by Storage Manager class policies; since the Alternate Store Location feature depends on policy processing of files, zero-length files are not copied to the remote location.
In situations where file retrieval fails because the normal file copies cannot be retrieved from the machine on which StorNext Storage Manager resides, the Alternate Retrieval Location feature enables you to retrieve a copy of the truncated file from a different machine. (Both machines must be using the same operating system.)
For example, if StorNext creates two copies of each file, when retrieving a truncated file StorNext tries to retrieve Copy One and then Copy Two. If neither of these copies can be retrieved and this feature is not enabled, the retrieval fails. However, if this feature is enabled for the file system, after the retrieval of Copy Two fails, Storage Manager tries to retrieve the file from the alternate machine you specified during feature setup. Because the file already exists in the StorNext file system, it retains the permissions it already has. No permissions are changed based on the file on the alternate machine.
Note: This feature applies only to managed file systems that have at least one configured policy class.
For this feature to work correctly, it is your responsibility to make sure all files you might want to retrieve are copied to the alternate machine. Otherwise retrieval will fail when StorNext attempts to retrieve the file from the alternate location and cannot find the file.
- Review and perform the procedures in the following topics:
- On the Tools menu, click Storage Manager, and then click Alternate Store Location or Alternate Retrieval Location.
- In the Remote Node IP/Host Name field, enter either the IP address or the host name of the remote server from which you would like to retrieve data.
- Select Enable to activate the Alternate Store Location or Alternate Retrieval Location feature.
- In the field under the Remote Path heading, enter the directory path for the remote node (server).
- Click Apply to save your changes, or Cancel to exit without saving.
- When the confirmation message appears, click Yes to proceed or No to abort.
- After a message informs you that the Alternate Store Location or Alternate Retrieval Location was successfully added, click OK.
The StorNext Alternate Store Location feature runs an agent on both the main site and the remote site, which must be at the same StorNext release level. Each site can serve as an Alternate Store Location main site, remote site, or both at the same time. One StorNext instance can serve as the remote site for multiple main-site StorNext instances. Each main-site file system must have a distinct remote-site file system that cannot be shared with any other main site. A main site can only send copies to a single remote site.
The `fs_altstore` resident process of TSM provides both the main site and the remote site processing. It starts and stops with TSM, runs in an idle mode when it is not configured as a main site, and is always ready to serve as a remote site. Communication between the resident processes on the main and remote sites is with a TCP/IP socket connection to a designated port. The `altstore` service can be controlled with the `fsschedlock` utility.
Files are transferred between the main site and the remote site by a user-modifiable script. The default script provided with StorNext uses the Secure Shell `scp` command for maximum security. Other transfer schemes, such as FTP or NFS, can be substituted according to the security and performance requirements at customer sites. Files are transferred to a staging location at the mount point for each remote file system that stores remote copies. The name of the staging folder is `.AltStoreStagedir` under the mount point of the file system. Files are moved or copied from there to each file’s target pathname. The per-file owner, group, and rwx permissions are set to match their values at the main site. Files are read on the main site with root permission. They are sent to the staging directory with the `altstore` user’s permissions. The remote agent does the final actions with root permission.
When a file is created under a policy with Alternate Store Location enabled, the file is added to a database table as a demand type request for remote copying. Files that were created before the feature was configured can be added to the table as a background type request for remote copying. All of the demand-type requests have priority and are processed before any background-type request is processed. There is a configurable limit on the number of simultaneous demand-copy transfer processes that can be started. There is a separate configurable limit on the subset of those transfer processes that can be used for background copies when they are not being used for demand copies. Experimenting with these limits can help to tune your system so that throughput is maximized while keeping system responsiveness at an appropriate level for other activities.
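The interaction between the two limits can be sketched in a few lines of shell. This is only an illustration of the rule described above, not StorNext's implementation: the function name `plan_streams` and its arguments are hypothetical, and it assumes that demand requests claim streams first while background requests use only idle streams, up to the background limit.

```shell
#!/bin/sh
# Hypothetical sketch of the stream-allocation rule; not StorNext code.
# Usage: plan_streams MAX_STREAMS MAX_BG_STREAMS DEMAND_PENDING BG_PENDING
# Prints "<demand_streams> <background_streams>".
plan_streams() {
    max_streams=$1 max_bg=$2 demand=$3 background=$4

    # Demand requests have priority: they take streams first.
    if [ "$demand" -lt "$max_streams" ]; then
        demand_streams=$demand
    else
        demand_streams=$max_streams
    fi

    # Background requests may use only idle streams, capped by the background limit.
    idle=$((max_streams - demand_streams))
    bg_streams=$background
    [ "$bg_streams" -gt "$max_bg" ] && bg_streams=$max_bg
    [ "$bg_streams" -gt "$idle" ] && bg_streams=$idle

    echo "$demand_streams $bg_streams"
}
```

With the default limits (5 transfer streams, 2 background streams), four pending demand requests leave a single idle stream for background work, and five or more leave none.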
Truncated files that are enabled for copying and are in the database table of requests are automatically retrieved. The truncation status is checked at the point of calling the transfer script for the file. These files are changed to the background request type and placed on an internal queue for retrieval. This queue is allowed to grow to a maximum size for a limited amount of time to allow the retrieve operation to optimize the order of retrievals within a single request. The queue-size parameter is described in ALTSTORE_RTRV_WAIT_COUNT, and the parameter for time is described in ALTSTORE_RTRV_WAIT_TIME. When a large number of new demand-type requests get created, these can cause the retrieve-queue to be emptied to give priority to the demand requests. Eventually, the truncated background-type requests will be reloaded and retrieved. When they have been retrieved, they are converted to demand-type requests for expedient copying to avoid being re-truncated before that can be completed.
When a file is enabled for copying, but has not been copied yet, manual requests for truncation with the `fsrmdiskcopy` command will state that “not all copies…are stored”, and will reject the truncation request. The retention policy for immediate file cleanup will not apply when the remote copy has not been made. However, truncation-policy processing that depends on file-system fill levels can truncate files that do not have a remote copy.
The intent of the Alternate Store Location feature is to create copies of file systems on a remote StorNext system with matching pathnames, user and group owners, and simple permissions (rwx). However, the feature is enabled in policy classes, which are applied to folders within file systems, and the mount points for the top-level directories of file systems can have different pathnames on the main and remote StorNext systems. Displayed file and directory owner and group names will be identical if the mappings of user and group names to UID and GID values are the same on both systems.
The indication in `fsfileinfo` output that a file copy exists is simply a record that a file copy completed successfully. Once a copy has been made, the Alternate Store Location feature does not perform any further tracking of the remote copy at the main site.
It is possible for a file to be marked for copying, but not to have a row in the database table to cause the copy to be performed. In that event, periodic rebuild processing will discover the discrepancy and will add a row to the table for the file to be copied.
The `fs_altstore` resident process is running whenever TSM is running. It is visible in process listings. There are no command-line parameters, and the process must not be run by hand. Its configurable options are controlled by the following editable settings as described in the `/usr/adic/TSM/config/fs_sysparm.README` file.
ALTSTORE_AGENT_PORT_NUMBER
Communication between the `altstore` processes running on the main and remote sites is through a TCP socket on the designated port number (12333 by default). This port must be allowed through any firewalls between the two sites. This value must be identical on both main and remote sites.
Default: 12333
ALTSTORE_MAX_CONNECTIONS
This specifies the maximum number of connections between a remote-site Altstore daemon and all of the main-site Altstore daemons it serves. For Altstore daemons acting as a remote site, this value should be equal to or greater than the sum of the values (`ALTSTORE_NUM_TXFR_STREAM` + 1) for each main site. For Altstore daemons acting as a main site only, the default value is recommended.
Default: (`ALTSTORE_NUM_TXFR_STREAM` + 1) or 31, whichever is larger.
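As a worked example of the sizing rule, the following sketch adds up (`ALTSTORE_NUM_TXFR_STREAM` + 1) for each main site that a remote site serves. The helper name `suggest_max_connections` is hypothetical, and using 31 as a lower bound is an assumption borrowed from the documented default rather than a stated requirement.

```shell
#!/bin/sh
# Hypothetical helper: suggest ALTSTORE_MAX_CONNECTIONS for a remote site.
# Arguments: the ALTSTORE_NUM_TXFR_STREAM value of each main site served.
suggest_max_connections() {
    total=0
    for streams in "$@"; do
        total=$((total + streams + 1))   # (ALTSTORE_NUM_TXFR_STREAM + 1) per main site
    done
    # Assumption: never suggest less than 31, the floor of the documented default.
    [ "$total" -lt 31 ] && total=31
    echo "$total"
}
```

For instance, a remote site serving three main sites configured with 30, 30, and 10 transfer streams would need at least 31 + 31 + 11 = 73 connections.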
ALTSTORE_NUM_BG_TXFR_STREAM
This specifies the number of simultaneous transfer streams for background transfers in the Alternate Store Location feature. This is the maximum number of transfer streams that can be used for transferring background files when the streams are not being used for on-demand transfers. This limit ensures that on-demand transfers can be responsive when new transfers are needed. This number must be less than or equal to `ALTSTORE_NUM_TXFR_STREAM`.
Default: 2. The valid range is 1 through 30.
ALTSTORE_NUM_SCRATCHPOOL
This is the number of internal working-queue elements. It should be about 40 times the number of simultaneous transfer streams specified by `ALTSTORE_NUM_TXFR_STREAM`.
Default: 200. The valid range is 10 through 3000.
ALTSTORE_NUM_TXFR_STREAM
This specifies the number of simultaneous transfer streams for on-demand transfers in the Alternate Store Location feature. The optimal number depends on the number of CPU cores in the server computer, the capacity of the network hardware, the ability of the destination computer to handle traffic, and the contention with other services using the network. Experimentation can help to characterize system throughput and determine when this number reaches diminishing returns of total throughput.
Default: 5. The valid range is 1 through 30.
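Before editing the override file, the three stream-related values can be checked against the ranges and relationships stated above. The sketch below is a hypothetical helper, not a StorNext tool; the function names are illustrative, and the values are passed as arguments rather than read from any configuration file.

```shell
#!/bin/sh
# Hypothetical sanity check for the stream-related settings; not a StorNext tool.

# in_range VALUE MIN MAX: succeeds when MIN <= VALUE <= MAX.
in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Usage: check_stream_settings NUM_TXFR_STREAM NUM_BG_TXFR_STREAM NUM_SCRATCHPOOL
# Prints one line per problem and returns non-zero if any problem was found.
check_stream_settings() {
    txfr=$1 bg=$2 pool=$3 problems=0

    if ! in_range "$txfr" 1 30; then
        echo "ALTSTORE_NUM_TXFR_STREAM must be 1 through 30"; problems=1
    fi
    if ! in_range "$bg" 1 30; then
        echo "ALTSTORE_NUM_BG_TXFR_STREAM must be 1 through 30"; problems=1
    fi
    if ! in_range "$pool" 10 3000; then
        echo "ALTSTORE_NUM_SCRATCHPOOL must be 10 through 3000"; problems=1
    fi
    # Background streams are a subset of the transfer streams.
    if [ "$bg" -gt "$txfr" ]; then
        echo "ALTSTORE_NUM_BG_TXFR_STREAM must not exceed ALTSTORE_NUM_TXFR_STREAM"; problems=1
    fi
    return "$problems"
}

# Rule of thumb from the text: about 40 queue elements per transfer stream.
suggest_scratchpool() {
    echo $(( $1 * 40 ))
}
```

The defaults are consistent with the rule of thumb: 5 transfer streams times 40 gives the default scratch pool size of 200.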
ALTSTORE_POLL_TIME
Time to wait (in seconds) before the `altstore` daemon wakes up to check for alternate store location candidates. The default value is 60 seconds. The valid range is 60 to 300 seconds. A longer value reduces the system resources consumed. A shorter value reduces the delay for the system to recognize the start or end of a lockout period as described in section `fsschedlock`.
Default: 60
ALTSTORE_RTRV_WAIT_COUNT
When retrieving truncated files, the Alternate Store Location feature will wait for this number of retrievable files to accumulate if the `ALTSTORE_RTRV_WAIT_TIME` limit does not expire first.
Default: 10. The valid range is 1 through 30.
ALTSTORE_RTRV_WAIT_TIME
When retrieving truncated files, the Alternate Store Location feature will wait up to this many seconds for a batch of retrievable files to accumulate if the `ALTSTORE_RTRV_WAIT_COUNT` is not reached first.
Default: 30. The valid range is 1 through 900.
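The way the two retrieval parameters work together (whichever limit is reached first releases the batch) can be written as a single predicate. This is a sketch of the documented rule only; the function `batch_ready` is hypothetical, and treating an empty queue as never ready is an assumption.

```shell
#!/bin/sh
# Hypothetical predicate for the count-or-time batching rule; not StorNext code.
# Usage: batch_ready QUEUED_FILES SECONDS_WAITED [WAIT_COUNT] [WAIT_TIME]
# Succeeds when the queued retrievals should be submitted as one batch.
batch_ready() {
    queued=$1 waited=$2
    wait_count=${3:-10}   # ALTSTORE_RTRV_WAIT_COUNT default
    wait_time=${4:-30}    # ALTSTORE_RTRV_WAIT_TIME default

    # Assumption: nothing is submitted while the queue is empty.
    [ "$queued" -gt 0 ] || return 1

    # Submit when either the count or the time limit has been reached.
    [ "$queued" -ge "$wait_count" ] || [ "$waited" -ge "$wait_time" ]
}
```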
ALTSTORE_TXFR_SCRIPT
Pathname of the customizable script for transferring files with the Alternate Store Location feature.
Default: /usr/adic/TSM/bin/fs_altstore_transfer
ALTSTORE_TXFR_USERNAME
The `altstore` user ID owns the staging directory on the remote site and is used as the target ID for file transfers. The value must be identical on both main and remote sites.
Default: altstore
ALTSTORE_VERSIONING
This value is configured on the source MDC and controls the behavior that occurs on the Alternate Store Location target MDC for existing files when they are being updated. The 'y' option causes the file to be overwritten, which creates another Storage Manager version of the file on the target MDC. This option incurs higher CPU overhead. The 'n' option causes the file to be re-created, which creates the first Storage Manager version of a new file on the target MDC. This option incurs lower CPU overhead.
Default: y
WARNING: Use of the `fsversion` command to retrieve a different version of a file on either the main or remote site can result in a discrepancy between the sites. Use the `fsversion` feature with caution. When it is necessary to restore a main-site file to an older version, the main- and remote-site files can be kept in sync by rewriting a portion of the file on the main site, which will cause it to be stored as a new version on both the main and remote sites.
ALTSTORE_VERSIONING_TIMEOUT_FACTOR
This value is configured on the main-site MDC and controls the amount of time that is allowed for file overwrites to occur on the remote-site MDC before being considered an error. This value is only applicable when `ALTSTORE_VERSIONING` is set to 'y'. The versioning timeout factor is specified in seconds per gigabyte of data. This value should be increased if file overwrites are timing out.
Default: 60. The valid range is 30 through 600.
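Because the factor is expressed in seconds per gigabyte, the time allowed for an overwrite scales with file size. The sketch below estimates that allowance; it is an illustration under stated assumptions, not the calculation StorNext performs. The function name `overwrite_timeout` is hypothetical, and it assumes that a gigabyte means 2^30 bytes, that partial gigabytes round up, and that at least one gigabyte is always counted.

```shell
#!/bin/sh
# Hypothetical estimate of the overwrite allowance; not StorNext code.
# Usage: overwrite_timeout FILE_SIZE_BYTES [FACTOR_SECONDS_PER_GB]
overwrite_timeout() {
    bytes=$1
    factor=${2:-60}                            # ALTSTORE_VERSIONING_TIMEOUT_FACTOR default
    gib=$((1024 * 1024 * 1024))                # assumption: 1 GB = 2^30 bytes
    gigabytes=$(( (bytes + gib - 1) / gib ))   # assumption: round up to whole gigabytes
    [ "$gigabytes" -lt 1 ] && gigabytes=1      # assumption: count at least one gigabyte
    echo $((gigabytes * factor))
}
```

Under these assumptions, a 5 GB file with the default factor is allowed 300 seconds, and raising the factor to 120 doubles that.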
Quantum provides a default transfer script that uses SCP for the highest level of security. The choice of file-transfer technologies is dependent on local considerations for performance and security.
When the Alternate Store Location feature is configured, the same script is used for the Alternate Retrieval Location feature. Its two modes (store and retrieve) are distinguished by the value of the fourth parameter, which is set to `ALTSTORE` for the Alternate Store Location feature.
When the transfer script is in `ALTSTORE` mode, it uses the `root` user ID at the main site for sending files since it must be able to read all users’ files, and uses the `altstore` user ID to write files into the remote site’s staging directory. When the transfer script is also used for the Alternate Retrieval Location feature, it must use `root` privileges on the remote site as well to be able to read any file.
Instructions are provided inside the script for creating the `altstore` user and setting up Secure Shell (SSH) credentials for running the script. To test the credentials, while running as root on the main site, you should be able to use the `scp` command to copy a file to the `altstore` user’s home directory on the remote site.
When both the Alternate Store Location feature and the Alternate Retrieval Location feature are configured, do not create the `altstore` user on the remote site. Instead, set up SSH credentials for the root user to access the remote site as `root`, and change the value of `ALTSTORE_TXFR_USERNAME` to root in the `/usr/adic/TSM/config/fs_sysparm_override` file.
If you prefer to use a different script, create a new executable file under a new name so your custom script is not overwritten when StorNext is upgraded. Set the `ALTSTORE_TXFR_SCRIPT` parameter in the `/usr/adic/TSM/config/fs_sysparm_override` file to the pathname of the executable. The Alternate Store Location parameters for the executable are as follows:
- Main-site full pathname of the file to transfer.
- Remote-site IP address or host name (as configured with `fsaltnode`).
- Remote-site full pathname of the file in the staging area.
- The word “ALTSTORE”, which indicates that the executable is being invoked for stores. When the parameter is missing, the executable is being invoked for retrieves.
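The parameter list above suggests the following skeleton for a custom executable. It is a minimal sketch and not the script shipped with StorNext: the function names and the `REMOTE_USER` variable are hypothetical, only the store-mode parameter order comes from the list above, and retrieve mode is deliberately left unimplemented because its parameters are not described here.

```shell
#!/bin/sh
# Hypothetical skeleton for a custom transfer script; not the StorNext default script.
# Store-mode parameters, per the list above:
#   $1 main-site pathname, $2 remote host, $3 remote staging pathname, $4 "ALTSTORE"

# Decide the mode from the fourth parameter.
transfer_mode() {
    if [ "${4:-}" = "ALTSTORE" ]; then
        echo store
    else
        echo retrieve
    fi
}

# Copy one file into the remote staging area.
# Assumption: scp to the user named by ALTSTORE_TXFR_USERNAME (default: altstore).
store_file() {
    src=$1 host=$2 staged=$3
    user=${REMOTE_USER:-altstore}   # REMOTE_USER is a hypothetical variable
    scp -q -p "$src" "$user@$host:$staged"
}

main() {
    case $(transfer_mode "$@") in
        store)
            store_file "$1" "$2" "$3"
            ;;
        retrieve)
            echo "retrieve mode is not covered by this sketch" >&2
            return 1
            ;;
    esac
}

# A real script would end with:  main "$@"
```

Keeping the mode test in its own function makes it easy to substitute a different transport, such as FTP or NFS, in `store_file` without touching the dispatch logic.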
The following commands are for operating and administering the Alternate Store Location feature. These are described in greater detail in their respective man pages.
altstoreadd
Enables the Alternate Store Location remote-copy feature on files and adds them to the alternate store candidate list. It can also be used to verify that a remote copy has been made for a file or list of files.
altstoremod
Display or manipulate the Alternate Store Location feature’s alternate store candidate list.
fsaltnode
Add, modify, or delete Alternate Retrieval Location information in the Tertiary Manager database. The settings made with this command also apply to Alternate Store Location. The information from this command can also be set and displayed by the StorNext GUI as demonstrated in Alternate Retrieval Location Configuration.
fsfileinfo
Generate a report about files known to the Tertiary Manager. A field in the output describes whether the Alternate Store Location feature is enabled and whether the copy action is pending or has been completed. Information from this command can also be displayed by the StorNext GUI as demonstrated in Sample File Report.
fschfiat
Modify the class attributes of a file. This can be used to disable or enable copying on a per-file basis.
fsclassinfo
Report policy-class processing parameters, associated directory paths, and affinity lists. This command can display the alternate store location state per policy class. Information from this command can also be displayed by the StorNext GUI.
fsschedlock
Command used for locking and unlocking some automated features. The `altstore` feature type controls the alternate store location feature. There may be a delay of a few minutes after the lockout schedule is reached before the `fs_altstore` process receives the command, completes in-process transfers, and stops processing new transfers.
fsmodclass
Modify the processing parameters of a policy class. This command can change the alternate store location state per class policy. The information from this command can also be set and displayed by the StorNext GUI.
fsrmcopy
The `-r` option to `fsrmcopy` invalidates the remote copy, if it exists, and a new copy is created.
Log, trace, and history information from the `fs_altstore` resident process is recorded under the `/usr/adic/TSM/logs` directory in the `trace/trace_06` and `tac/tac_00` files, and in the `/usr/adic/TSM/history/hist_06` file.
The Alternate Store Location feature is part of the StorNext system and does not require any extra steps to install. When it is not configured, it runs with negligible impact on system resources. After configuration, it is ready to provide both main- and remote-side functionality of the Alternate Store Location feature.
Following are the steps for configuring both sides of the Alternate Store Location feature:
- Set up Secure Shell credentials for the transfer script or provide an alternative local transfer script as described in section User Customizable Transfer Script.
  Note: It may be necessary to configure network firewalls as described in section ALTSTORE_AGENT_PORT_NUMBER.
- Configure the Alternate Retrieval Location parameters using the StorNext GUI, as demonstrated in section Alternate Retrieval Location Configuration, or by using the `fsaltnode` command as described in section `fsaltnode`.
- Enable the Alternate Store Location feature in Class Policies using the StorNext GUI, as demonstrated in section Storage Manager Policy Configuration, or with the `fsmodclass` command as described in section `fsmodclass`.
There are no special steps required for upgrading StorNext when the Alternate Store Location feature is configured.
Operation of the Alternate Store Location feature requires that the main site, the remote site, and the network between them are up and running. Whenever all of these are brought on-line, file transfers will resume automatically after a brief pause, typically 20 seconds.
Operation can be suspended during periods as needed for system performance or other reasons by use of the `fsschedlock` command as described in section `fsschedlock`. When the lockout period ends, there may be a delay of a few minutes before file transfers resume. The delay is affected by the setting described in `ALTSTORE_POLL_TIME`.
The backlog of pending file copies can be viewed as described in section `altstoremod`.
The status of copy operations per file can be viewed as described in section `altstoreadd`.
Files that existed before their policies had the Alternate Store Location feature enabled are not copied without taking additional steps. It is recommended that the set of these files be added in subsets of the total list, rather than all at once. This allows for optimizing the rate of copy completions by organizing the retrieval of files for a minimum amount of tape handling. When the Alternate Store Location feature encounters a file that is truncated, it will retrieve the file automatically with a small amount of retrieval optimization, but it can end up bottle-necking file copies behind inefficient retrievals when there is a large set of truncated files in the list of files to be copied.
The best practice for copying pre-existing files is to:
- Identify the set of all files you wish to copy that are not already enabled for Alternate Store Location copying.
- Organize the full set into subsets that can each be retrieved efficiently.
- Retrieve one complete subset of files to disk.
- After the subset retrieval has completed, submit the subset to the input of the `altstoreadd` command as described in section `altstoreadd`, and start the retrieval of another subset of files.
- Monitor the completion of background copies as described in section `altstoremod`. When a subset has been copied, truncate the files listed in the subset.
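The subset step can be approximated with the standard `split` utility. The sketch below simply chunks a list of full pathnames into fixed-size pieces; grouping files by the tape they reside on, which is what makes retrieval efficient, needs media information that this sketch does not have. The function name `make_subsets` and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a list of full pathnames into fixed-size subsets.
# Usage: make_subsets LIST_FILE LINES_PER_SUBSET OUTPUT_PREFIX
# Creates OUTPUT_PREFIXaa, OUTPUT_PREFIXab, ... with at most LINES_PER_SUBSET lines each.
make_subsets() {
    split -l "$2" "$1" "$3"
}

# Possible use, one subset at a time (commented out; requires a StorNext system):
#   make_subsets all_files.txt 1000 subset.
#   for s in subset.*; do
#       # ...retrieve the files listed in "$s" to disk first...
#       altstoreadd < "$s"    # pathnames on standard input, as in the examples in this document
#   done
```

A fixed chunk size keeps each `altstoreadd` submission small enough that truncated files do not pile up behind slow retrievals; the right size depends on available primary-disk space.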
The Alternate Retrieval Location feature works well with the Alternate Store Location feature. They share the configuration of the remote node name and remote paths per file system. When retrieving a truncated file that is missing all of its local copies, StorNext will automatically use the Alternate Retrieval Location feature to access the remote copy as a last resort. StorNext compares the expected and actual file sizes of the remote copy as a simple integrity check, and aborts the retrieval if they do not match. StorNext cannot guarantee that the remote copy is unmodified or current, but the remote copy should be current in most cases if the main and remote systems have been operating successfully up to the point of the disaster.
When both Alternate Retrieval Location and Alternate Store Location features are configured, a further integrity check is made for alternate retrieves. The retrieve will not proceed for a file when remote copying is enabled for the file and the remote copy has not been stored.
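The size comparison described above is easy to reproduce when auditing remote copies by hand. The following sketch shows the idea only; it is not how StorNext performs the check, and the function name `sizes_match` is hypothetical. A matching size does not prove that the content is unchanged.

```shell
#!/bin/sh
# Hypothetical size check; not StorNext code.
# Usage: sizes_match FILE EXPECTED_BYTES
# Returns 0 when the sizes match, 1 when they differ, 2 when FILE is missing.
sizes_match() {
    [ -f "$1" ] || return 2
    actual=$(wc -c < "$1" | tr -d ' ')   # strip padding that some wc versions emit
    [ "$actual" -eq "$2" ]
}
```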
The Alternate Store Location feature is useful for efficiently creating remote copies when the files are relatively large and change relatively infrequently. It is more efficient than, but not as complete as, rsync and other publicly available mirroring software. The following list includes some of the limitations of the Alternate Store Location feature.
- Zero-length files are not stored to the remote site. Neither are symbolic links nor hard links.
- File deletion at the main site does not delete the remote copy.
- File rename at the main site results in creation of a new file at the remote site while a file remains under the old pathname.
- Empty directories are not copied to the remote site.
- Directory deletes and directory renames are not reflected on the remote.
- Changes to file and directory modes and owners are not reflected on the remote.
- Changes to remote copies are not detected except when they result in errors on retrieval.
- The set of versions of a file at the remote site may differ from the set of versions at the main site because of the timing of making versions.
Within these limits, the Alternate Store Location feature is useful for providing some protection against physical disasters by locating the remote site an unlimited distance from the main site, by allowing read-only access at the remote site, and by potentially providing a source for low-latency restores if the remote site does not truncate its on-disk copies.
Log messages are written as described in section Log Files. These may provide clues for the diagnosis of operational problems.
The most common problems preventing copies from being made are in the transfer operation. The user-customizable transfer script, as described in section User Customizable Transfer Script, depends on the proper setup of authentication credentials and the ability to make a connection through any firewalls. Communication between the `fs_altstore` resident processes at the main and remote sites also depends on TCP traffic passing through firewalls, as described in section ALTSTORE_AGENT_PORT_NUMBER.
The image below displays the configuration that is shared between the Alternate Retrieval Location and the Alternate Store Location features. It is not possible to configure Alternate Store Location without Alternate Retrieval Location, but it is not necessary to enable the per-file-system options for retrievals.
Configuring the Remote Node, enabling local file systems, and specifying the remote mount-point path per file system are the first steps in configuring both features. The Remote Node (there can only be one) applies to all of the enabled file systems. This information can also be specified on the command line with the `fsaltnode` command.
The image below displays the Alternate Store Location configuration that is per policy class. Selecting the Alternate Store Location check box enables the feature for future files under the policy. Pre-existing files must be enabled and added to the candidate list individually as described in `altstoreadd`.
The following examples show typical uses of the `altstoreadd` command. Additional information about the command is described in `altstoreadd`. The first script recursively finds all the files in a sub-directory hierarchy and submits them as candidates for background copying.
Note: Full pathnames are required. Files that have already been copied will be noted with an error message.

    cd /stornext/snfs1/policy1/subdir    # example directory
    find "$(pwd)" -type f | altstoreadd

The next script finds all the files in the current directory (non-recursive) and asks `altstoreadd` to provide a report on the status of remote copies for those files.

    cd /stornext/snfs1/policy1/subdir
    find "$(pwd)" -maxdepth 1 -type f | altstoreadd -e

These two uses of the `altstoreadd` command can be combined in a script that submits pre-existing files (those that existed before the Alternate Store Location feature was configured) in a way that is optimal for retrievals and for the management of primary-disk space.
In the event that remote-copy files have been modified or corrupted in some way, new copies can be made in two steps. The following command turns off the Alternate Store Location feature for all the files under a directory, which causes StorNext to discard its information about the remote copies.
Following that, use the `fschfiat` command to turn on the feature and cause new copies to be made for the files.