StorNext Database 4.x to 5.x Conversion Monitoring
During the upgrade to StorNext 5, the internal MySQL database has new columns added to some of the managed filesystem tables. On large filesystems this can take some time to complete, during which the managed filesystems are available but no files can be stored or retrieved. The new columns are also subsequently indexed, adding more time to the upgrade. This article provides additional information to assist in planning for and monitoring the database conversion process.
Whilst the Release Notes advise it will take “approximately one hour for every hundred million entries in the filecomp tables”, this figure is highly dependent on the system being upgraded (the figure quoted was for an M660 appliance). Low-powered systems with reduced memory may take considerably longer. One Software Only customer with 760 million entries spread across six filecomp tables took over 64 hours (a four-socket, quad-core 800 MHz AMD Sun system with 64 GB RAM and the database memory parameter innodb_buffer_pool_size set to 8 GB).
Whilst it is extremely difficult to predict how long the conversion will take, and no progress is written to any logs, we can determine what is going on from the database process table and journal.
Note that increasing the innodb_buffer_pool_size for the duration of the database upgrade can reduce the processing time.
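If memory allows, the buffer pool can be raised before starting the upgrade and restored afterwards. The fragment below is a hypothetical sketch: the location of the my.cnf used by the StorNext database and an appropriate pool size depend on the installation, so treat both the path and the value as assumptions to verify locally.

```ini
# Hypothetical fragment, e.g. in /usr/adic/mysql/my.cnf (path is an
# assumption -- check your installation). Revert after the upgrade.
[mysqld]
innodb_buffer_pool_size = 8G
```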
Determining FILECOMP Sizes
As the majority of the upgrade time is spent operating on the filecomp tables in the database, it is useful to determine how many such tables the database contains and how many entries are in each. In the following example there are four managed filesystems and therefore four filecomp tables.
The devdb_v output shows which filesystems the index numbers correspond to:
# mysql -e "select Device_key, Path from devdb_v;" tmdb
+------------+----------------+
| Device_key | Path |
+------------+----------------+
| 1 | /stornext/test |
| 2 | /stornext/tape |
| 3 | /stornext/prod |
| 4 | /stornext/new |
+------------+----------------+
And the following filecomp tables:
# mysql -e "show tables;" tmdb | egrep "filecomp.*_v"
filecomp1_v
filecomp2_v
filecomp3_v
filecomp4_v
Now we can get the number of entries in each filecomp table:
# mysql -ss -e "select count(*) from filecomp1_v;" tmdb
544
# mysql -ss -e "select count(*) from filecomp2_v;" tmdb
6
# mysql -ss -e "select count(*) from filecomp3_v;" tmdb
206838598
# mysql -ss -e "select count(*) from filecomp4_v;" tmdb
10
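Counting each table by hand scales poorly when there are many filesystems. The loop below is a dry-run sketch that only echoes the commands it would run, with the table names hard-coded from the example above (on a live system they could be taken from the "show tables" output); remove the echo to run the queries for real.

```shell
# Dry run: print a row-count query for each filecomp table in the example.
# Remove the leading "echo" to actually run the queries against tmdb.
for t in filecomp1_v filecomp2_v filecomp3_v filecomp4_v; do
  echo mysql -ss -e "select count(*) from $t;" tmdb
done
```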
In the above example it looks like the updates for filecomp3 for /stornext/prod will take the longest.
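As a rough planning aid, the Release Notes rate (one hour per hundred million filecomp entries) can be applied to the largest table. Treat the result as a best case only; as noted above, the 64-hour Software Only example ran far slower than this rate.

```shell
# Best-case estimate for filecomp3 (206,838,598 entries) using the
# Release Notes rate of ~100 million entries/hour; real systems can
# be considerably slower.
awk 'BEGIN { entries = 206838598; rate = 100000000
             printf "%.1f hours (best case)\n", entries / rate }'
```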
Running Process
During an upgrade, the Perl script /usr/adic/TSM/util/install/dbupgrade is invoked to apply any schema updates required between versions.
The following command shows the list of database actions currently in progress. It can be used at any time to see what the database is doing during normal StorNext operation, but is particularly useful during the upgrade:
# mysql --table -e "select * from information_schema.processlist;" mysql
This will show a table of processes and the databases being operated on.
Past Operations
The database journal records the dates and times of the COMPLETED database operations. From this you should be able to determine which tables have been completed and how long it took to update them. The journal is a binary file and must be read using the MySQL utility mysqlbinlog. First determine the current MySQL journal file and convert it to a text file:
# ls -lL /usr/adic/mysql/journal
total 408652
-rw-rw---- 1 quantumdb adic 149228 Jul 20 08:25 qtm_binlog.000016
-rw-rw---- 1 quantumdb adic 418298649 Sep 28 15:29 qtm_binlog.000017
-rw-rw---- 1 quantumdb adic 84 Jul 20 08:27 qtm_binlog.index
# mysqlbinlog /usr/adic/mysql/journal/qtm_binlog.000017 > /tmp/qtm_binlog.000017.txt
The resulting file contains a vast amount of data, with most operations completing within a second. For the upgrade we can search for the filecomp operations that we're interested in:
# egrep "^#|alter table|create index" /tmp/qtm_binlog.000017.txt | egrep -v "^# at " | egrep -B1 -A1 filecomp
This outputs the commands that we expect to take a significant amount of time, flanked by their date and time stamps, e.g. (specifically for filecomp3):
#150916 20:07:25 server id 1 end_log_pos 23564221 CRC32 0x89c9f711 Query thread_id=70 exec_time=119438 error_code=0
alter table filecomp3 add column Objid varchar(4096) default ''
#150918 5:18:56 server id 1 end_log_pos 23564302 CRC32 0xb55e8426 Query thread_id=25 exec_time=0 error_code=0
--
#150918 5:18:03 server id 1 end_log_pos 25608980 CRC32 0x12d54c6d Query thread_id=70 exec_time=10275 error_code=0
create index objid on filecomp3 (objid)
#150918 8:09:51 server id 1 end_log_pos 25609061 CRC32 0xcf907d16 Query thread_id=25 exec_time=0 error_code=0
--
#150918 8:09:18 server id 1 end_log_pos 27442313 CRC32 0x9b2a8a88 Query thread_id=70 exec_time=9253 error_code=0
create index adddate on filecomp3 (adddate)
#150918 10:43:32 server id 1 end_log_pos 27442450 CRC32 0x1f9876b2 Query thread_id=70 exec_time=1 error_code=0
--
#150919 12:17:23 server id 1 end_log_pos 45660440 CRC32 0x8a225c07 Query thread_id=70 exec_time=0 error_code=0
alter table filecomp3 change column Mpart Entype int(11)
#150919 12:17:23 server id 1 end_log_pos 45660576 CRC32 0x31e986c6 Query thread_id=964 exec_time=0 error_code=0
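The timestamp/statement pairs above can be summarized with a short awk filter over the text dump, pairing each statement with the exec_time from the header line that precedes it. The here-document below simply replays two sample lines from the output above so the sketch is self-contained; in practice, feed it the converted journal text file instead.

```shell
# Pair each filecomp alter/create statement with the exec_time value
# from the "#YYMMDD hh:mm:ss ..." header line that precedes it.
awk '
  /^#1/ && /exec_time=/ {            # header line: remember its exec_time
      split($0, a, "exec_time=")
      split(a[2], b, " ")
      t = b[1]
      next
  }
  /filecomp/ && (/alter table/ || /create index/) {
      printf "%7d sec  %s\n", t, $0  # statement line: print stored duration
  }
' <<'EOF'
#150916 20:07:25 server id 1 end_log_pos 23564221 CRC32 0x89c9f711 Query thread_id=70 exec_time=119438 error_code=0
alter table filecomp3 add column Objid varchar(4096) default ''
#150918 5:18:03 server id 1 end_log_pos 25608980 CRC32 0x12d54c6d Query thread_id=70 exec_time=10275 error_code=0
create index objid on filecomp3 (objid)
EOF
```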
Note that not only does this give the start and end timestamps, but the "exec_time=" field also records the number of seconds the command took. This can be particularly useful when estimating the time it will take for subsequent tables to be processed. Note, however, that the number of entries per second varies from table to table depending on the content and available memory (i.e. some filecomp tables within the same database will process faster than others).
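For example, the alter on filecomp3 (206,838,598 entries) ran for exec_time=119438 seconds, roughly 1,700 entries per second on that system. Dividing a remaining table's entry count by an observed per-entry rate gives a rough forecast; the 50-million-entry table below is purely a hypothetical illustration.

```shell
# Entries/second for the filecomp3 alter above, then a forecast for a
# hypothetical remaining table of 50 million entries at the same rate.
awk 'BEGIN {
    rate = 206838598 / 119438                          # ~1732 entries/sec
    printf "rate: %.0f entries/sec\n", rate
    printf "50M-entry table: ~%.1f hours\n", 50000000 / rate / 3600
}'
```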
Sadly there’s no way to show the progress of the commands as they are running.
This page was generated by the BrainKeeper Enterprise Wiki, © 2018