Overview
The DXi7500 platform is not very resilient once a customer pushes it past its intended physical and performance capacity. This playbook helps Service efficiently identify the root cause of performance issues.
What is reclamation/garbage collection?
Reclamation and garbage collection are the terms used to describe the deletion of data on a DXi that is no longer referenced. There are four stages to reclamation:
Stage 1 – Compaction:
Compaction is the term used to describe the stage where data is actually deleted. This stage is in place to delete data that was identified for deletion during a previous reclamation operation. This allows customers to cancel reclamation and recover data without running through the entire operation cycle.
Stage 2 – Reconciliation:
During this stage, two lists are created and then compared: a list of tags that are still referenced and a list of tags that are still in the blockpool. The reconciliation thread does the following:
- Uses the bfst_query API commands to get a list of all tags in the blockpool and dumps them to /data/hurricane/bpgc/blockpooltags
- Syncs with dedupd to see if there are any files in the middle of a dedup operation
- Dumps tags referenced in the file system and replication namespaces to /data/hurricane/bpgc/referencedtags, including:
  - Tags for replication jobs in progress
  - Tags preposted for namespace/continuous/trigger replication
  - Tags in target-side namespace bundles
  - Tags in source-side namespace bundles
  - Tags in the filesystem
- Compares the two files created previously, blockpooltags and referencedtags. Tags that are in blockpooltags but not in referencedtags are now deletion candidates and are used to create the candidate list in /data/hurricane/bpgc/deleteCandidates.
- The deleteCandidates file is then broken up into 4 separate files so that bpgc can use 4 deletion threads, one for each file.
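Conceptually, the reconciliation stage is a set difference followed by a split into per-thread work lists. A minimal sketch of that logic (the function name and the round-robin split are illustrative assumptions, not the actual bpgc implementation):

```python
def build_delete_candidates(blockpool_tags, referenced_tags, num_streams=4):
    """Tags present in the blockpool but absent from the referenced list
    become deletion candidates; the candidate list is then split into
    num_streams work lists, one per deletion thread."""
    candidates = sorted(set(blockpool_tags) - set(referenced_tags))
    # Hypothetical round-robin split into one list per deletion thread.
    streams = [candidates[i::num_streams] for i in range(num_streams)]
    return candidates, streams

# Example: 5 tags in the blockpool, 3 of them still referenced.
blockpool = ["t1", "t2", "t3", "t4", "t5"]
referenced = ["t1", "t3", "t5"]
cands, streams = build_delete_candidates(blockpool, referenced)
print(cands)  # ['t2', 't4']
```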
Stage 3 – Deletion
- There are 4 Deletion Threads (controlled by seer variable MaxStreams)
- Each worker thread copies its candidate file to a working file, e.g. deleteCandidates_Stream0.inprogress
- When a worker thread makes a single pass through its deletion candidate file, it pauses
- The bfst_delete_count API is called for each tag in the candidate list, providing the number of times the tag was referenced at the time the candidate list was generated (this ensures the tag is not deleted if it has since been re-stored)
- The reference count is then decremented for each tag and all of its unique clusters. This is where the bottleneck most likely exists. Say a customer has 100k tags to be deleted with 128 unique clusters in each tag. The reference count for each cluster would have to be read and then written back as the decrement operation. That is 12.8 million small I/O operations for reads and another 12.8 million for writes. Throw other operations such as replication, truncation and ingest into the mix, and performance for everything on the system drops significantly.
Stage 4 – Compaction
Analysis
For this playbook we will use the Harris IT system that was used in the other DXi7500 playbooks. GC tells the story and is a good starting point when looking at a performance problem on a DXi7500. Using the timeline created earlier in the DXi7500 series, we will look at the GC Details menu option.
http://dart.quantum.com/cgi-bin/stats?DB=CX0929BVA00446-20110228-123022&pr=18&start=1293048543&end=1298827821&base=1000

First we’ll look at the Space Reclamation graph compared with the Data Ingest Volume per day Graph. From this graph we can learn the following:
- After week 52, Stage 3 never completes and Stage 4 never runs. This means someone is restarting GC in Stage 3 so that Stage 1 can reclaim some actual space.
- The above bullet point would imply a capacity problem
- GC has pretty much been running constantly for the past 5 weeks
- Week 05 shows a very long time period for stage 2. The Data Ingest Volume per day shows a 16 TB ingest right before the large stage 2 period and a 20 TB ingest in Week 05.
- The activity from those two large daily ingest periods resulted in a Stage 2 that ran for over a week until it was canceled and restarted just before week 6
- Looking at these graphs without logs, one might conclude that GC was hung in Stage 2 for that entire week

Next we will look at the Blockpool Tags graph. From this graph we learn the following:
- The system has a total of 2.397 million tags
- 1.337 million of the 2.397 million tags are still referenced
- 1.060 million tags are ready to be deleted (Delete Candidates)
- The number of Delete Candidates grew from 384,323 to 1.060 million over the previous 10 weeks
- The long/hung Stage 2 between weeks 4 and 5 appears to have prevented DART from reporting tag information properly

Last, we will use susrepo to determine how many tags can be deleted in a day. Looking at the above graph for the 24-hour period from 00:00:01 12/18/2010 to 00:00:01 12/19/2010, we learn the following:
- Only 144 tags were stored during this time period. Looking at other graphs we can determine that the system was relatively quiet during this 24 hours.
- 74,551 tags were deleted on 12/18/2010 with GC running all day
- If the system could sustain 74,551 tags per day it would take 13.4 days to delete the 1 million tags that are waiting to be deleted.

- However, if we look at a different 24-hour period, 12/30/2010 to 12/31/2010, while ingest is running with a small amount of retrieves, we see that only 38,037 tags were deleted
- At that rate we’re looking at 26 days for all data to be deleted
- If we use a number somewhere in the middle of the two, say 50k per day, we’re looking at 20 days
Conclusion
This system currently has 1.060 million tags that need to be deleted, and the number keeps rising because of the ingest from backups and the reads from tape-outs that are occurring. This system requires a significant decrease in ingest, or multiple DXi systems, in order to meet the demands the customer is placing on it.
Next: Ingest and Reads
Previous: Timeline