Overview
In this scenario, the customer has called with this problem: Reclamation is not finding any files to delete on the target, and the target is filling up.
Problem Description: The customer deleted partitions that were configured for trigger-based replication on both the source and target systems. Even after several partitions and their snapshots were deleted on the target, space reclamation (GC) still did not find any data to reclaim. The tags to be reclaimed were still being referenced somewhere. A sync ended up releasing those tags, but where were they referenced?
Available Information: The information at Status -> System -> Disk Usage shows that there is deduplicated data still in the blockpool. The Replication taglist (JM: Be more specific. Probably gc or hc referenced tags) shows that the target is aware of the state of its own tags.
Tech Support’s Initial Thoughts: Tech Support walks the customer through running a sync to see what happens. After the sync, the target is able to find the files to be deleted. Why is this? Knowing the process flow for trigger-based replication helps in understanding all of the processes and files that may be referencing the tags.
Questions from Tech Support:
- Why isn’t reclamation finding any files to delete?
- What would have been the most appropriate way to help the customer?
- What is the sequence of what happens and why?
------------------------------------------------------------------------------------------------------
Next Steps: Responses from Engineering
- Continuous replication creates a file named ‘continuous’ in the /snfs/replication/target/<fully qualified source host>/partitions/<vtl name> directory for each share or vtl.
- This file protects all tags replicated via continuous replication.
- This file is removed only when a Successful or Partial Namespace Replication or Sync, whether On Demand or Scheduled, is successfully posted to the target system.
- It is most likely that a successful Namespace Replication has not been run for the partitions and a ‘continuous’ file is present preserving tags that are no longer pertinent to the vtl. You need to run a Namespace Replication for the partition and start Space Reclamation.
- It is possible that there is previously replicated data that is referencing the tags, even though the latest replication does not refer to these tags (up to 10 sets of replicated data can be maintained for each partition).
- Remove previously replicated sets of data using the GUI (be sure to remove the sets with the oldest dates first).
- The maximum number of replicated snapshots can be adjusted in the GUI from the TargetRole General page in the Replicated Snapshots area.
- The amount of new unique data a set contains depends on how frequently a successful Namespace Replication / Sync is performed. If one runs daily, the retained sets span 10 days by default. If the customer instead runs a Sync weekly, the retained sets span 10 weeks of unique data, resulting in a much larger footprint on the target.
- In the ‘/hurricane/tsunami.log’ file, go to the bottom of the file and search from bottom to top for the first occurrence of ‘bpgc ‘. Follow the log upwards from that point to determine whether space reclamation was interrupted or resulted in an error.
- JM: Consider detailing the case where all partitions and shares have been removed and the system has not yet reclaimed all of the space, even though GC has run multiple times and is no longer freeing up disk space. In this case, search for continuous files (as listed in the first bullet) and see if any are large. If so, escalate to SES, as this file is probably the reason replication is still holding on to the data. SES should determine whether this is viable and take the appropriate steps.
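
The search for leftover ‘continuous’ files described in the bullets above could be sketched as follows. This is a minimal sketch in Python, assuming the directory layout given in the first bullet; the function name find_continuous_files is hypothetical and not part of the product, and whether Python is available on a given system is also an assumption. The script only reports files and their sizes. It does not remove anything, since removal is a decision for SES.

```python
import os

# Assumed root, taken from the path in the first bullet above.
REPLICATION_TARGET_ROOT = "/snfs/replication/target"


def find_continuous_files(root=REPLICATION_TARGET_ROOT):
    """Return (path, size_in_bytes) for every file named 'continuous'
    found under root, sorted with the largest file first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "continuous" in filenames:
            path = os.path.join(dirpath, "continuous")
            found.append((path, os.path.getsize(path)))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Report only; nothing is modified or deleted.
    for path, size in find_continuous_files():
        print(f"{size:>15,d} bytes  {path}")
```

A large ‘continuous’ file for a partition or share that no longer exists is the case the notes above suggest escalating.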
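
The bottom-to-top log search for ‘bpgc ‘ described above could be sketched as follows. This is a minimal sketch in Python, assuming only the log path and the search string given in the bullet; the function name last_bpgc_context and the context parameter are hypothetical, and no particular log line format is assumed. It reads the whole file into memory, which may not be suitable for a very large log.

```python
def last_bpgc_context(log_path="/hurricane/tsunami.log", marker="bpgc ", context=20):
    """Find the last line in the log that contains marker and return it
    together with up to `context` lines that precede it."""
    with open(log_path, "r", errors="replace") as handle:
        lines = handle.readlines()
    # Walk from the bottom of the file towards the top.
    for index in range(len(lines) - 1, -1, -1):
        if marker in lines[index]:
            start = max(0, index - context)
            return [line.rstrip("\n") for line in lines[start:index + 1]]
    return []  # marker never appears in the log


if __name__ == "__main__":
    for line in last_bpgc_context():
        print(line)
```

The returned lines still have to be read by a person to judge whether reclamation was interrupted or ended in an error; the sketch only locates the relevant part of the log.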
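
The effect of sync frequency on retained data, as described in the bullet about replicated sets, comes down to simple arithmetic. The sketch below is a rough model in Python, assuming each retained set corresponds to one successful Namespace Replication / Sync and that the default of 10 sets is in effect; the function name retained_window_days is hypothetical.

```python
def retained_window_days(max_sets=10, days_between_syncs=1):
    """Approximate number of days of unique data held on the target,
    assuming one retained set per successful sync."""
    return max_sets * days_between_syncs


if __name__ == "__main__":
    print(retained_window_days(10, 1))  # daily sync: 10 days
    print(retained_window_days(10, 7))  # weekly sync: 70 days (10 weeks)
```

Lowering the maximum number of replicated snapshots, or syncing more often, shortens this window.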