SR3567598 Journal Move - Failed to locate space on sg 0
SR Information: 3567598 GWDG / Max Planck
Problem Description: Journal Move - Failed to locate space on sg 0
Product / Software Version:
MDC: SNMS 4.7.0.1 SLES 10.3
Overview
The customer experienced Journal Waits, and it was determined that the Journal Size was insufficient. Because MetaData StripeGroup 0, which holds the Journal, had no free space left, a Journal Migration was combined with the resize in the same step. This caused cvupdatefs to fail.
Symptoms & Identifying the problem
## 1 ## Log Review:
cvupdatefs-07_09_2015-13_04_05
Stripe Group Name Stripe Status MetaData Journal
================= ============= ======== =======
sg0 No Change No Change Delete
sg1 No Change
..
sg9 No Change No Change Create
This will modify the file system "ARCHIV".
Are you sure you want to continue? [y/N]
Flushing journal entries... done
Freeing old journal space...
Allocating new journal space...
*Fatal*: Failed to locate space on sg 0
The log doesn’t mention the Journal Resize; it only shows a Journal Move from SG0 -> SG9 that fails to allocate space on SG0.
cvadmin show
Stripe Group 0 [sg0] Status:Up,MetaData,Journal,Exclusive
Total Blocks:3199711 (97.65 GB) Reserved:0 (0.00 B) Free:50 (1.56 MB) (0%)
cvadmin reveals that there is no space left on SG0. However, it should be possible to migrate the Journal to another StripeGroup even though the source SG is full.
echo "show journal" | cvfsdb ARCHIV
Journal Descriptor Block
journal_marker = 0x4a6f55724e243231 [JoUrN$21]
journal_root = 0x0 -> SG Index where Journal Resides
journal_size = 0x2000 -> Journal Size
journal_slop =
journal_base =
journal_end =
journal_blksize = 0x200 -> Journal BlockSize
journal_sum = (okay)
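journal_size * journal_blksize = 0x2000 * 0x200 = 0x400000 = 4194304 (4 MiB) -> that is the value we would expect to find in the config file. The config file, however, requests journalSize="67108864" (64 MiB), so a Journal Resize is pending: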
head ARCHIV.cfgx
<snfs:config configVersion="8" name="ARCHIV" fsBlockSize="32768" journalSize="67108864">
<snfs:global>
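The on-disk value and the configured value can be cross-checked with a few lines of C. This is only a sketch: the helper names are hypothetical, and it assumes that journal space is allocated in file system blocks of fsBlockSize bytes, which the outputs above do not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk journal size in bytes: journal_size * journal_blksize,
 * both taken from the cvfsdb "show journal" output. */
static uint64_t journal_bytes(uint32_t journal_size, uint32_t journal_blksize)
{
    return (uint64_t)journal_size * journal_blksize;
}

/* Number of file system blocks needed to hold `bytes`, rounded up.
 * ASSUMPTION: the journal is allocated in units of fsBlockSize. */
static uint64_t blocks_needed(uint64_t bytes, uint32_t fs_block_size)
{
    assert(fs_block_size > 0);
    return (bytes + fs_block_size - 1) / fs_block_size;
}
```

With the values from this case, journal_bytes(0x2000, 0x200) is 4194304, while the config requests 67108864. Under the stated assumption, blocks_needed(67108864, 32768) is 2048 blocks, whereas sg0 reports 50 free blocks; even after the old journal's 128 blocks are given back, only 178 blocks would be free.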
Although it might already be obvious how to resolve the situation, it is worth looking into cvupdatefs and how it processes the change.
## 2 ## Troubleshooting:
Note: this is a very simplistic view focused on Journal Migration & Journal Resize within cvupdatefs
Routine Calls made in main() of cvupdatefs.c
1.) Parse Config and Apply Globals read from Config File
main() -> Configuration_init() --> InitParserGlobals(), Configuration_init0() --> call_parser() -> parse_config_file() --> apply_config() --> apply_globals(),apply_stripegroups()
JournalSize and FsBlockSize are read from the Configuration File
Excerpt:
apply_globals()
if (cfgdata->cfg_globals.gl_FsBlockSize.ctx) {
FsBlockSize = (int) cfgdata->cfg_globals.gl_FsBlockSize.value;
if (cfgdata->cfg_globals.gl_JournalSize.ctx) {
JournalSize = (int) cfgdata->cfg_globals.gl_JournalSize.value;
As well as StripeGroup Information
Excerpt:
apply_stripegroups()
/* Setup each partition table entry with a parttab_t struct */
for (i=0; i< NumParts; i++) {
PartTable[i].parttab = (void *)&(ptab[i]);
2.) Here is where the magic happens. Let’s figure out what changes need to be made to the file system, display these changes and set Flags
main() --> display_confirm_changes()
We loop through the on-disk partition table first
while(found_parts < d_numparts && part_index < MAX_PARTS){
..
sg_list_head = (sglist_t*)sg_list_add(this_sg, sg_list_head, &error);
}
Then we loop through the in-core partition table
while(found_parts < d_numparts && part_index < MAX_PARTS){
..
sg_list_head = (sglist_t*)sg_list_add(this_sg, sg_list_head, &error);
}
and look for the Journal SG once a MetaData SG has been found. If an SG has been configured for Journal, we set the SG_IS_JOURNAL flag
if(PartTable[i].flags & PARTGROUP_STATUS_METADATA) {
this_sg->flag |= SG_IS_METADATA; }
if(PartTable[i].flags & PARTGROUP_STATUS_JOURNAL) {
this_sg->flag |= SG_IS_JOURNAL; journal_found++; }
We now have our StripeGroup list “sg_list_head”, including the changes, which are taken care of by the sg_list_add() routine
sg_list_add() -> Add a node to the stripe group list along with a flag.
If the name already exists in the stripe group list, simply update the values on disk
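The add-or-update behaviour described for sg_list_add() can be pictured with a small linked-list sketch. Everything here is hypothetical: the node type, the function name sg_list_add_sketch() and its signature are simplified stand-ins (the real sglist_t is not shown in the excerpts), and merging the flags with a bitwise OR is an assumption about what "update the values" means.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node; not the real sglist_t. */
typedef struct sg_node {
    char            name[64];
    uint32_t        flag;
    struct sg_node *next;
} sg_node_t;

/* Add-or-update: if `name` is already in the list, merge the new flag bits
 * into the existing node; otherwise append a new node. Returns the list head.
 * On allocation failure *error is set to 1 and the list is left unchanged. */
static sg_node_t *sg_list_add_sketch(sg_node_t *head, const char *name,
                                     uint32_t flag, int *error)
{
    sg_node_t *cur, *tail = NULL;

    assert(name != NULL && error != NULL);
    *error = 0;

    for (cur = head; cur != NULL; cur = cur->next) {
        if (strcmp(cur->name, name) == 0) {
            cur->flag |= flag;          /* known stripe group: merge flags */
            return head;
        }
        tail = cur;
    }

    cur = calloc(1, sizeof(*cur));      /* zeroed, so name stays terminated */
    if (cur == NULL) {
        *error = 1;
        return head;
    }
    strncpy(cur->name, name, sizeof(cur->name) - 1);
    cur->flag = flag;

    if (tail == NULL)
        return cur;                     /* list was empty */
    tail->next = cur;
    return head;
}
```

Calling the sketch once for the on-disk view and once for the in-core view of the same stripe group leaves a single node that carries the flags from both passes, which is the property the later steps rely on.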
Find the journal root according to the ICB, if found set ICB_JOURNAL_ROOT Flag
main() --> display_confirm_changes() --> find_icb_journal_root()
The pointer pj is a pointer to a Journal Descriptor.
find_icb_journal_root(sglist_t *sglist_ptr)
{
int journal_desc_size;
journal_desc_t* pj;
Note: The same information is available from the cvfsdb “show journal” command.
/* we found the config for the journal entry mark it as such */
sglist_ptr->flag |= ICB_JOURNAL_ROOT;
We compare the on-disk JournalSize with the value in the config file; if they don’t match, the SG_JOURNAL_RESIZE flag is set
if((ntohl(pj->journal_size) * jblksize) != (uint32_t)(JRNFILESIZE * FsBlockSize))
{
if(HaveValidJournal)
sglist_ptr->flag |= SG_JOURNAL_RESIZE;
..
}
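The size comparison can be tried in isolation. The sketch below is not the cvupdatefs code: the function name is hypothetical, the configured size is passed in directly as a byte count (assumed to correspond to JRNFILESIZE * FsBlockSize in the excerpt), and jblksize is assumed to be in host byte order already. ntohl() and htonl() are the standard POSIX byte-order helpers.

```c
#include <arpa/inet.h>   /* ntohl(), htonl() (POSIX) */
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the on-disk journal size differs from the configured size.
 * journal_size_be: journal_size as stored on disk (network byte order)
 * jblksize:        journal block size (ASSUMED host byte order)
 * cfg_bytes:       configured journal size in bytes (ASSUMED equivalent of
 *                  JRNFILESIZE * FsBlockSize)
 * The multiplication is done in 32 bits, as in the excerpt. */
static int journal_size_mismatch(uint32_t journal_size_be, uint32_t jblksize,
                                 uint32_t cfg_bytes)
{
    uint32_t on_disk_bytes;

    assert(jblksize != 0);   /* zero would point to a damaged descriptor */
    on_disk_bytes = ntohl(journal_size_be) * jblksize;
    return on_disk_bytes != cfg_bytes;
}
```

For this case, journal_size_mismatch(htonl(0x2000), 0x200, 67108864) returns 1, which corresponds to SG_JOURNAL_RESIZE being set; with a configured size of 4194304 it returns 0.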
3.) Check whether the journal space needs to be re-created; if so, remove it first
main() --> remove_journal()
* this will look at the config change structure and detect a remove
* journal configuration. If the journal is to be moved this routine
* will update the metadata to reflect this
As we are going through the StripeGroup List, check each SG for the following condition to verify if we need to call free_journal_space()
if ((((my_list->flag & SG_JOURNAL_MASK) == ICB_JOURNAL_ROOT) && ((my_list->flag & SG_JOURNAL_MASK) != SG_IS_JOURNAL)) ||
(my_list->flag & SG_JOURNAL_RESIZE))
Condition: The StripeGroup holds the JournalRoot but has Journal=”false” in the config file, OR it is a Journal Resize
CaseSpecific: sg0 has ICB_JOURNAL_ROOT and the config file for sg0 has Journal=”false”. SG_JOURNAL_RESIZE has also been set by find_icb_journal_root(). So both conditions are true and we call free_journal_space()
main() --> remove_journal() --> free_journal_space() --> bm_free_space()
* give back the allocation for the existing journal
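The remove_journal() condition can be evaluated for the two stripe groups involved. In this sketch the numeric flag values and the composition of SG_JOURNAL_MASK are hypothetical (the real definitions are not part of the excerpts); only the shape of the if-clause is taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL bit values; the real definitions are not shown. */
#define SG_IS_JOURNAL     0x01u  /* config: Journal="true" for this SG    */
#define ICB_JOURNAL_ROOT  0x02u  /* on disk: journal currently lives here */
#define SG_JOURNAL_MASK   (SG_IS_JOURNAL | ICB_JOURNAL_ROOT)  /* assumed  */
#define SG_JOURNAL_RESIZE 0x04u

/* Mirrors the if-clause in remove_journal(). */
static int wants_free_journal_space(uint32_t flag)
{
    return (((flag & SG_JOURNAL_MASK) == ICB_JOURNAL_ROOT) &&
            ((flag & SG_JOURNAL_MASK) != SG_IS_JOURNAL)) ||
           (flag & SG_JOURNAL_RESIZE);
}

/* The failing run: journal on sg0, config moves it to sg9 and resizes it. */
static void check_remove_case(void)
{
    uint32_t sg0 = ICB_JOURNAL_ROOT | SG_JOURNAL_RESIZE;
    uint32_t sg9 = SG_IS_JOURNAL;

    assert(wants_free_journal_space(sg0));   /* old journal is freed */
    assert(!wants_free_journal_space(sg9));  /* nothing to free here */
}
```

Under these assumptions only sg0 qualifies, so the old journal allocation is returned to sg0 before any new space is requested.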
4.) Create Journal if being Resized or Migrated
main() --> create_journal()
* this will look at the config change structure and detect a removed
* journal configuration. If the journal is to be re-created this routine
* will update the metadata to reflect this.
As we are going through the StripeGroup List, check each SG for the following condition to verify if we need to call alloc_journal_space()
if((((my_list->flag & SG_JOURNAL_MASK) == SG_IS_JOURNAL) && ((my_list->flag & SG_JOURNAL_MASK) != ICB_JOURNAL_ROOT)) ||
(my_list->flag & SG_JOURNAL_RESIZE) ||
(my_list->flag & SG_JOURNAL_REBUILD)) {
/* get the location of the current journal */
mylog(LOG_INFO, "Allocating new journal space...\n");
if (alloc_journal_space(my_list->ordinal) != 0)
Condition: The config file reflects Journal=”true” for the given StripeGroup, which must not be the journal root (basically a Journal Migration),
or the Journal is flagged Resize / Rebuild
CaseSpecific: sg0 doesn’t meet the first condition since it is the JournalRoot. However, the SG_JOURNAL_RESIZE flag is set on it, so alloc_journal_space() is called for sg0 when it must not be! This causes cvupdatefs to fail, since there is no free space left on sg0
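The create_journal() condition can be replayed the same way. As before, the flag values and the mask are hypothetical. The second predicate, wants_alloc_guarded(), is purely illustrative: it shows one conceivable way to honour the resize flag only on a stripe group that is configured to hold the journal, and it is not the vendor's fix.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL bit values; the real definitions are not shown. */
#define SG_IS_JOURNAL      0x01u
#define ICB_JOURNAL_ROOT   0x02u
#define SG_JOURNAL_MASK    (SG_IS_JOURNAL | ICB_JOURNAL_ROOT)  /* assumed */
#define SG_JOURNAL_RESIZE  0x04u
#define SG_JOURNAL_REBUILD 0x08u

/* Mirrors the if-clause in create_journal(). */
static int wants_alloc_journal_space(uint32_t flag)
{
    return (((flag & SG_JOURNAL_MASK) == SG_IS_JOURNAL) &&
            ((flag & SG_JOURNAL_MASK) != ICB_JOURNAL_ROOT)) ||
           (flag & SG_JOURNAL_RESIZE) ||
           (flag & SG_JOURNAL_REBUILD);
}

/* ILLUSTRATIVE ONLY: honour RESIZE only on an SG configured as journal. */
static int wants_alloc_guarded(uint32_t flag)
{
    if (flag & SG_JOURNAL_REBUILD)
        return 1;
    if (!(flag & SG_IS_JOURNAL))
        return 0;                       /* SG is not a journal target */
    return !(flag & ICB_JOURNAL_ROOT) || (flag & SG_JOURNAL_RESIZE) != 0;
}

static void check_create_case(void)
{
    uint32_t sg0 = ICB_JOURNAL_ROOT | SG_JOURNAL_RESIZE;  /* full source SG */
    uint32_t sg9 = SG_IS_JOURNAL;                         /* move target    */

    assert(wants_alloc_journal_space(sg0));  /* the unwanted allocation */
    assert(wants_alloc_journal_space(sg9));
    assert(!wants_alloc_guarded(sg0));
    assert(wants_alloc_guarded(sg9));
}
```

With the original clause both sg0 and sg9 request new journal space, and the request on the full sg0 is the one that fails. A stripe group that is both journal root and journal target with the resize flag set, as sg9 is in a later resize-only run, still qualifies under the guarded variant.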
Resolutions/workarounds/fixes:
Migrate the Journal first, then resize it in a separate cvupdatefs run
cvupdatefs-07_09_2015-13_05_59
The following changes have been detected in the configuration
Please review these changes carefully.
Stripe Group Name Stripe Status MetaData Journal
================= ============= ======== =======
sg0 No Change No Change Delete
sg1 No Change
..
sg9 No Change No Change Create
This will modify the file system "ARCHIV".
Are you sure you want to continue? [y/N]
Flushing journal entries... done
Freeing old journal space...
Allocating new journal space...
Flushing buffers...
Updating ICB information...
Updating SuperBlock information...
*Warning*: File system 'ARCHIV' was modified.
cvupdatefs-07_09_2015-13_06_58
The following changes have been detected in the configuration
Please review these changes carefully.
Stripe Group Name Stripe Status MetaData Journal
================= ============= ======== =======
sg0 No Change No Change
sg1 No Change
..
sg9 No Change No Change Resize
This will modify the file system "ARCHIV".
Are you sure you want to continue? [y/N]
Flushing journal entries... done
Freeing old journal space...
Allocating new journal space...
Flushing buffers...
Updating ICB information...
Updating SuperBlock information...
*Warning*: File system 'ARCHIV' was modified.
echo "show journal" | cvfsdb ARCHIV
Journal Descriptor Block
journal_marker = 0x4a6f55724e243231 [JoUrN$21]
journal_root = 0x9 <- Migrated to SG9
journal_size = 0x20000 <- JournalSize increased
journal_blksize = 0x200
journal_sum = 0x64460c55 (okay)
The last step is to file a Bug Report as follows
Bug 57575 - cvupdatefs - create_journal() routine needs better handling of the SG_JOURNAL_RESIZE if-clause
What we learn from this case:
- Change One Thing at a Time
- A simplistic view of how cvupdatefs works internally helps to explain the failure
- Journal Resizing or Migration is technically a Journal-Remove followed by a Journal-Create
- If You Don’t Fix It, It Ain’t Fixed -> file a bug report or escalate to Sustaining
- cvupdatefs has a debug flag, use it!