SR3567598 Journal Move - Failed to locate space on sg 0

SR Information: 3567598 GWDG / Max Planck

 

Problem Description: Journal Move - Failed to locate space on sg 0

 

Product / Software Version:

 

MDC:

SNMS 4.7.0.1

SLES 10.3

 

 Overview

The customer experienced journal waits, and the journal size was found to be insufficient. Because metadata stripe group 0 (sg0), which holds the journal, had no free space left, a journal migration was attempted in the same cvupdatefs run as the journal resize. That combined run caused cvupdatefs to fail.

 

Symptoms & Identifying the problem

 

## 1 ## Log Review:

 

cvupdatefs-07_09_2015-13_04_05

 

Stripe Group Name  Stripe Status  MetaData   Journal

=================  =============  ========   =======

sg0                No Change      No Change  Delete

sg1                No Change

..

sg9                No Change      No Change  Create

 

This will modify the file system "ARCHIV".

Are you sure you want to continue? [y/N]

Flushing journal entries...  done

Freeing old journal space...

Allocating new journal space...

*Fatal*: Failed to locate space on sg 0

 

 

The log does not mention the journal resize. It only shows a journal move from SG0 to SG9, which fails while allocating space on SG0.

 

cvadmin show

Stripe Group 0 [sg0]  Status:Up,MetaData,Journal,Exclusive

Total Blocks:3199711 (97.65 GB)  Reserved:0 (0.00 B) Free:50 (1.56 MB) (0%)

 

cvadmin shows that there is no space left on SG0. Even so, it should be possible to migrate the journal to another stripe group when the source SG is full.

 

echo "show journal" | cvfsdb ARCHIV

 

Journal Descriptor Block

     journal_marker  = 0x4a6f55724e243231 [JoUrN$21]

     journal_root    = 0x0 <- SG index where the journal resides

     journal_size    = 0x2000 <- journal size (in journal blocks)

     journal_slop    =

     journal_base    =

     journal_end     =

     journal_blksize = 0x200 <- journal block size

     journal_sum     = (okay)

 

journal_size * journal_blksize = 0x2000 * 0x200 = 0x400000 = 4194304 bytes (4 MiB). That is the value we would expect to find as journalSize in the config file.

 

head ARCHIV.cfgx

  <snfs:config configVersion="8" name="ARCHIV" fsBlockSize="32768" journalSize="67108864">

    <snfs:global>

 

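The config file, however, already requests a journalSize of 67108864 bytes (64 MiB), so a journal resize is pending in addition to the move. Although the way out of the situation may already be obvious, it is worth looking at how cvupdatefs processes these changes.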

 

 

## 2 ## Troubleshooting:

 

Note: this is a very simplistic view, focused on journal migration and journal resize within cvupdatefs.

 

Routine Calls made in main() of cvupdatefs.c

 

1.) Parse Config and Apply Globals read from Config File

 

main() -> Configuration_init() --> InitParserGlobals(), Configuration_init0()  --> call_parser() -> parse_config_file() --> apply_config() --> apply_globals(),apply_stripegroups()

 

JournalSize and FsBlockSize are read from the configuration file

 

Excerpt:

apply_globals()

if (cfgdata->cfg_globals.gl_FsBlockSize.ctx) {
        FsBlockSize = (int) cfgdata->cfg_globals.gl_FsBlockSize.value;
}
..
if (cfgdata->cfg_globals.gl_JournalSize.ctx) {
        JournalSize = (int) cfgdata->cfg_globals.gl_JournalSize.value;
}

 

As well as StripeGroup Information

 

Excerpt:

apply_stripegroups()

/* Setup each partition table entry with a parttab_t struct */

        for (i=0; i< NumParts; i++) {

                  PartTable[i].parttab = (void *)&(ptab[i]);

 

 

             

2.) This is where the magic happens: work out which changes need to be made to the file system, display these changes and set the corresponding flags

 

main() --> display_confirm_changes()

 

We loop through the on-disk partition table first

while(found_parts < d_numparts && part_index < MAX_PARTS){

..

sg_list_head = (sglist_t*)sg_list_add(this_sg, sg_list_head, &error);

}

 

We then loop through the in-core partition table

  while(found_parts < d_numparts && part_index < MAX_PARTS){

..

sg_list_head = (sglist_t*)sg_list_add(this_sg, sg_list_head, &error);

}

 

While doing so we check each stripe group for the metadata and journal flags. If an SG has been configured for metadata, SG_IS_METADATA is set. If an SG has been configured for the journal, SG_IS_JOURNAL is set.

if (PartTable[i].flags & PARTGROUP_STATUS_METADATA) {
        this_sg->flag |= SG_IS_METADATA;
}
if (PartTable[i].flags & PARTGROUP_STATUS_JOURNAL) {
        this_sg->flag |= SG_IS_JOURNAL;
        journal_found++;
}

 

We now have our stripe group list "sg_list_head", including the detected changes, which are taken care of by the sg_list_add() routine

sg_list_add() -> Add a node to the stripe group list along with a flag.

If the name already exists in the stripe group list, simply update the values on disk

 

Find the journal root according to the ICB, if found set ICB_JOURNAL_ROOT Flag

main() --> display_confirm_changes() --> find_icb_journal_root()

 

The pointer pj is a pointer to a Journal Descriptor.

find_icb_journal_root(sglist_t *sglist_ptr)

{

    int                 journal_desc_size;

    journal_desc_t*     pj;

Note: We can get this same information from using cvfsdb “show journal” command.

 

/*   we found the config for the journal entry  mark it as such */

sglist_ptr->flag |= ICB_JOURNAL_ROOT;

 

We compare the on-disk journal size with what is in the config file. If they do not match, the SG_JOURNAL_RESIZE flag is set

if ((ntohl(pj->journal_size) * jblksize) != (uint32_t)(JRNFILESIZE * FsBlockSize))
{
        if (HaveValidJournal)
                sglist_ptr->flag |= SG_JOURNAL_RESIZE;
        ..
}

 

 

 

3.) Check whether the journal space needs to be re-created; if so, remove it first

 

main() --> remove_journal()

 

*  this will look at the config change structure and detect a remove

*  journal configuration. If the journal is to be moved this routine

*  will update the metadata to reflect this

 

As we are going through the StripeGroup List, check each SG for the following condition to verify if we need to call free_journal_space()

 

if ((((my_list->flag & SG_JOURNAL_MASK) == ICB_JOURNAL_ROOT) &&
     ((my_list->flag & SG_JOURNAL_MASK) != SG_IS_JOURNAL)) ||
    (my_list->flag & SG_JOURNAL_RESIZE))

 

Condition: the stripe group holds the journal root but has Journal="false" in the config file, OR the journal is flagged for resize

 

CaseSpecific: sg0 has ICB_JOURNAL_ROOT set, and the config file has Journal="false" for sg0. SG_JOURNAL_RESIZE is also set on sg0 by find_icb_journal_root(). Both conditions are therefore true, and free_journal_space() is called.

 

main() --> remove_journal() --> free_journal_space() --> bm_free_space()

* give back the allocation for the existing journal

 

 

4.) Create Journal if being Resized or Migrated

 

main() --> create_journal()

 

*  this will look at the config change structure and detect a removed

*  journal configuration. If the journal is to be re-created this routine

*  will update the metadata to reflect this.

 

As we are going through the StripeGroup List, check each SG for the following condition to verify if we need to call alloc_journal_space()

 

if ((((my_list->flag & SG_JOURNAL_MASK) == SG_IS_JOURNAL) &&
     ((my_list->flag & SG_JOURNAL_MASK) != ICB_JOURNAL_ROOT)) ||
    (my_list->flag & SG_JOURNAL_RESIZE) ||
    (my_list->flag & SG_JOURNAL_REBUILD)) {
        /* get the location of the current journal */
        mylog(LOG_INFO, "Allocating new journal space...\n");
        if (alloc_journal_space(my_list->ordinal) != 0)
        ..

 

Condition: the config file has Journal="true" for the given stripe group and that stripe group is not the current journal root (basically a journal migration), OR the journal is flagged for resize or rebuild

 

CaseSpecific: sg0 does not meet the first condition, since it is the journal root. However, SG_JOURNAL_RESIZE is set on sg0, so alloc_journal_space() is called for sg0 when it must not be. This causes cvupdatefs to fail, because sg0 has no free space left for the new journal.

 

 

 

 

Resolutions/workarounds/fixes:

 

 

Migrate the journal first, then resize it in a separate cvupdatefs run

 

cvupdatefs-07_09_2015-13_05_59

 

The following changes have been detected in the configuration

Please review these changes carefully.

 

Stripe Group Name  Stripe Status  MetaData   Journal

=================  =============  ========   =======

sg0                No Change      No Change  Delete

sg1                No Change

..

sg9                No Change      No Change  Create

 

This will modify the file system "ARCHIV".

Are you sure you want to continue? [y/N]

Flushing journal entries...  done

Freeing old journal space...

Allocating new journal space...

Flushing buffers...

Updating ICB information...

Updating SuperBlock information...

*Warning*: File system 'ARCHIV' was modified.

 

 

 

cvupdatefs-07_09_2015-13_06_58

 

The following changes have been detected in the configuration

Please review these changes carefully.

 

Stripe Group Name  Stripe Status  MetaData   Journal

=================  =============  ========   =======

sg0                No Change      No Change

sg1                No Change

..

sg9                No Change      No Change  Resize

 

This will modify the file system "ARCHIV".

Are you sure you want to continue? [y/N]

Flushing journal entries...  done

Freeing old journal space...

Allocating new journal space...

Flushing buffers...

Updating ICB information...

Updating SuperBlock information...

*Warning*: File system 'ARCHIV' was modified.

 

echo "show journal" | cvfsdb ARCHIV

 

Journal Descriptor Block

     journal_marker  = 0x4a6f55724e243231 [JoUrN$21]

     journal_root    = 0x9 <- migrated to SG9

     journal_size    = 0x20000 <- journal size increased

     journal_blksize = 0x200

     journal_sum     = 0x64460c55 (okay)

 

 

The last step is to file a bug report:

Bug 57575 - cvupdatefs - create_journal() routine needs better handling of the SG_JOURNAL_RESIZE if-clause

 

 

 

What we learn from this case:

- Change One Thing at a Time

- Even a simplistic view of how cvupdatefs works (and of its code) helps to explain a failure

- Journal Resizing or Migration is technically a Journal-Remove followed by Journal-Create

- If you don't fix it, it ain't fixed -> file a bug report or escalate to Sustaining

- cvupdatefs has a debug flag, use it!

 


