Blockpool: Corrupt Clusters But No Corrupt Blobs

Overview

This topic describes what to do when a DXi system has corrupt clusters but no corrupt blobs.


 


Problem

The verifies have run, including the cluster fast and blob data fast verifies. Bad clusters were detected, but no bad blobs were found.

 

Aug  1 09:59:23 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86179 contains corrupt data.
Aug  1 09:59:23 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86178 contains corrupt data.
Aug  1 09:59:23 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86177 contains corrupt data.
Aug  1 09:59:29 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86180 contains corrupt data.
Aug  1 09:59:33 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86181 contains corrupt data.
Aug  1 10:35:50 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86182 contains corrupt data.
Aug  1 10:35:56 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86183 contains corrupt data.
Aug  1 11:03:34 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 86213 contains corrupt data.
Aug  1 11:23:29 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 85502 contains corrupt data.
Aug  1 11:29:41 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 85503 contains corrupt data.
Aug  1 11:35:45 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 85504 contains corrupt data.
Aug  1 11:38:35 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 85714 contains corrupt data.
Aug  1 12:39:17 PXFCI-BACKUP03 Blockpool[22549]: W: [1580] (Verify2) Cluster 85501 contains corrupt data.
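
When many of these warnings are logged, it can help to reduce them to a sorted list of unique cluster numbers before investigating. The following is a minimal sketch that relies only on the message format shown above; the function name is hypothetical, and the log file to read is an assumption that depends on where your system writes Blockpool messages.

```shell
# Hypothetical helper: print the unique cluster numbers reported as corrupt.
# $1 = a log file containing "(Verify2) Cluster N contains corrupt data." lines
list_corrupt_clusters() {
    # Keep only the cluster number from matching lines, then sort numerically
    # and drop duplicates.
    sed -n 's/.*(Verify2) Cluster \([0-9][0-9]*\) contains corrupt data\..*/\1/p' "$1" \
        | sort -n -u
}
```

For example, `list_corrupt_clusters /var/log/messages` would print one cluster number per line, assuming the messages are written to that file.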


 

 


Solution

The following provides one example of how to find out what is wrong with the clusters. 

 

  1. Create the blob list and run it in the background, because it takes a while. The list identifies each blob that has a pointer to this cluster. You can track its progress through the messages log.

nohup /hurricane/blockpool/bin/blockpool dump blob-list +C3380147 +O3380147_related_blobs.txt +Nlocal@localhost 2>&1 &

 

  2. While the blob list is being created, perform a cluster header dump:

/hurricane/blockpool/bin/blockpool dump cluster 3380147 +Nlocal@localhost 2>&1 | cat > cluster_3380147.dump

 

Based on the information below, I would suspect that the cluster header is bad. This can happen if in-flight data is interrupted. Look for "Header is corrupt" messages.
 
   Subset of the dump output:
 
 H:          1   b3c9f   2e34       4   2e30    2e30 CO,IC                   B9C44653FD6E098E786F164CB49A2F7D
 B:              b3c9f   2e34       4   2e30                                 65585DF4FE65CB5F9C335F13A86EDD2B
 W: Blocklet is marked as corrupt.
 
  B: Compressed run @ b6ad3
 W: Error decompressing run header.
     
Header is corrupt or of an unrecognized version.

 

 Reference Count

             |
             V
 H:          1   b6ad3   55c4       4   594d    594d CO,CM,SR,SI             C4B4434F1DACC07B3284DE5BC4A75855
 
 H:          1   bc097   2094       4   2090    2090 CO,IC                   D2CDF86A4D4572E54FE26587DF7399FD
 B:              bc097   2094       4   2090                                 7387D4818514DE7F8D12F27812628380
 W: Blocklet is marked as corrupt.

 

  3. Format the blob list produced by the dump blob-list command so that it can be used with the +B option.

Before the change, the list looks like this:

 

B9C44653FD6E098E786F164CB49A2F7D
65585DF4FE65CB5F9C335F13A86EDD2B

 

After the change, it looks like this:

 

B9C44653FD6E098E786F164CB49A2F7D "B9C44653FD6E098E786F164CB49A2F7D"
65585DF4FE65CB5F9C335F13A86EDD2B "65585DF4FE65CB5F9C335F13A86EDD2B"

 

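I use a for loop that runs    echo "$i \"$i\""    for each tag. The inner quotes must be escaped; otherwise the shell removes them and they do not appear in the output.

 

Redirect the output of the entire for loop to a file.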
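
The formatting step can be sketched as a small shell function. This is a minimal sketch that assumes the blob list contains one tag per line, as in the example above; the function name and the output file name are hypothetical. It also strips carriage returns and skips blank lines, so that the closing quote does not end up on a line of its own.

```shell
# Hypothetical helper: turn a list of tags into lines of the form  TAG "TAG".
# $1 = input blob list (one tag per line), $2 = formatted output file
format_blob_list() {
    tr -d '\r' < "$1" | while IFS= read -r tag || [ -n "$tag" ]; do
        # Skip empty lines so no stray  ""  entries are written.
        [ -n "$tag" ] || continue
        printf '%s "%s"\n' "$tag" "$tag"
    done > "$2"
}
```

For example, `format_blob_list 3380147_related_blobs.txt FormattedBlobList` would write the formatted list to a file named FormattedBlobList.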

 

  4. Verify the blocklets in the list that was just formatted. If the list is long, you may want to run the verify in the background.

/hurricane/blockpool/bin/blockpool verify +BFormattedBlobList +Nlocal@localhost
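
To run the verify in the background, the nohup pattern from the blob-list step can be reused. The following is a sketch only: the wrapper function, the BLOCKPOOL variable, and the default log file name are hypothetical and exist just to keep the output in one place; the verify arguments are the same as in the command above.

```shell
# Path to the blockpool binary as used in this article; overridable (assumption).
BLOCKPOOL=${BLOCKPOOL:-/hurricane/blockpool/bin/blockpool}

# Hypothetical wrapper: run the verify in the background and capture its output.
# $1 = formatted blob list, $2 = log file (optional, hypothetical default name)
verify_in_background() {
    list=$1
    log=${2:-verify_blob_list.log}
    # nohup keeps the verify running if the login session ends.
    nohup "$BLOCKPOOL" verify "+B$list" +Nlocal@localhost > "$log" 2>&1 &
    echo "verify started with PID $!; output in $log"
}
```

For example, `verify_in_background FormattedBlobList` starts the verify and returns to the prompt; the results can then be read from the log file once the job finishes.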

 

  5. For each bad blocklet found, get the blob from the replication partner. Once the corrupt blobs are replicated or removed, the cluster header/body is rewritten, which fixes the cluster.

     

  6. If the cluster header dump shows a bad header and the blob list created in the first step is empty, a reference verify needs to be performed to rewrite the cluster header.

     

  7. If all of the tags that are marked as corrupt have a reference value of 0, and the tags with non-zero reference values are not marked as corrupt, perform a compaction.
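
To check the compaction condition against a saved cluster dump, the corrupt tags can be listed together with their reference counts. The following is a rough sketch under two assumptions drawn only from the dump subset shown earlier: each H: line carries the reference count in its second field and the tag in its last field, and a "Blocklet is marked as corrupt" warning refers to the most recent H: line above it. The function name is hypothetical, and the real dump layout may differ.

```shell
# Hypothetical helper: print "<reference count> <tag>" for each blocklet
# that the dump marks as corrupt.
# $1 = saved output of the cluster dump command
list_corrupt_refcounts() {
    awk '
        # Remember the reference count and tag of the latest header line.
        $1 == "H:" { ref = $2; tag = $NF; next }
        # A corrupt warning is attributed to that latest header line.
        $1 == "W:" && /Blocklet is marked as corrupt/ {
            if (tag != "") print ref, tag
        }
    ' "$1"
}
```

For example, `list_corrupt_refcounts cluster_3380147.dump` prints one line per corrupt blocklet. If every line starts with 0, the corrupt tags are unreferenced; the tags that are not listed still need to be checked separately.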

 



This page was generated by the BrainKeeper Enterprise Wiki, © 2018