SR3523206, Customer:ZUFFA, LLC dba ULTIMATE FIGHTING CHAMPIONSHIP (UFC) -- M662XL/policy manager will not launch

While working this case, which was escalated to SES/SUS, we identified 2 problems:

 

1. The initial problem was that Storage Manager would not start when the customer tried to launch it. It would die right away with a core file message:

Mar 30 09:01:01.681617 2015 12 trimcores 12 UNKNOWN UNKNOWN 18428729 51 New core file:-rw------- 1 root root 231059456 Mar 30 08:59 /scratch/core/fs_fcopyman-1427731199-3718  Ticket creation time: 03/30 09:01:01 PDT
And also:

Mar 30 09:04:17.665634 2015 12 TSM 12 TSM UNKNOWN 18428977 1 TSM has terminated ABNORMALLY
Mar 30 09:05:34.526159 2015 12 TSM 12 TSM UNKNOWN 18429204 1 TSM has terminated ABNORMALLY

 

SES/SUS dug further and found that, because the customer was ingesting a lot of big (>100G) video files, TSM was very busy and was creating lots of temp files in the /usr/adic/HAM/shared/TSM/internal/recovery_dir directory. They cleaned up this directory and TSM then started up fine.

 

SUS is still looking into why this dir was filling up.
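
While the root cause is being investigated, the growth of this directory could be watched with something like the sketch below. It is only an illustration: the function names and the threshold values are hypothetical and were not part of the case; only the directory path comes from the notes above.

```python
import os

def dir_usage(path):
    """Return (entry_count, total_bytes) for non-directory entries under path."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                # The file disappeared between listing and stat; skip it.
                continue
            count += 1
            total += st.st_size
    return count, total

def over_threshold(path, max_files=10000, max_bytes=10 * 1024**3):
    """True if the directory holds more files or bytes than the limits.

    The default limits are placeholders, not values recommended by SUS.
    """
    count, total = dir_usage(path)
    return count > max_files or total > max_bytes

if __name__ == "__main__":
    # Path taken from the case notes; a missing directory reports (0, 0).
    print(dir_usage("/usr/adic/HAM/shared/TSM/internal/recovery_dir"))
```

Running this periodically during heavy ingest would show whether the temp files accumulate steadily or only while large files are in flight.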

 

2. We then found that the customer's connection from TSM to Lattus was not working.

The TSM tac log showed the following (note specifically "HTTP 403 Forbidden"):

 

Mar 30 08:58:27.416164 MDC2 sntsm fs_fmoverc[44836]: E1200(7)<00000>:mdtWAStorage199: Stop deleting objects for mediaNdx 1887 due to failure rc=409
Mar 30 08:58:27.416317 MDC2 sntsm fs_fmoverc[44836]: E1202(9)<00000>:mdm1utl1112: End Object Storage recovery cleanup (file: WRF_18356027_25438_0)
Mar 30 08:58:27.416348 MDC2 sntsm fs_fmoverc[44836]: E1202(9)<00000>:mdm1utl1100: Starting Object Storage recovery cleanup (file: WRF_18271160_16745_0)
Mar 30 08:58:27.423901 MDC2 sntsm fs_fmoverc[44836]: E1200(7)<00000>:mdtWAStorage1189: deleteObject: failure rc=409, url=http://192.168.252.5:7070/stornexts3/sm3298AEE8098E04000000000009340A3B0000000200002ZL2X2LGF8MWRHVFRK, error: HTTP status 403: 403 Forbidden^M
for url: http://192.168.252.5:7070/stornexts3/sm3298AEE8098E04000000000009340A3B0000000200002ZL2X2LGF8MWRHVFRK
Mar 30 08:58:27.423918 MDC2 sntsm fs_fmoverc[44836]: E1200(7)<00000>:mdtWAStorage199: Stop deleting objects for mediaNdx 1887 due to failure rc=409
Mar 30 08:58:27.424063 MDC2 sntsm fs_fmoverc[44836]: E1202(9)<00000>:mdm1utl1112: End Object Storage recovery cleanup (file: WRF_18271160_16745_0)
Mar 30 08:58:27.424089 MDC2 sntsm fs_fmoverc[44836]: E1202(9)<00000>:mdm1utl1100: Starting Object Storage recovery cleanup (file: WRF_18253206_40828_0)
Mar 30 08:58:27.428275 MDC2 sntsm fs_monitor[44725]: E1201(8)<18428304>:msa1bgn1493: The select system call failed with errno: 4
Mar 30 08:58:27.428314 MDC2 sntsm fs_monitor[44725]: E1003(3)<18428304>:msa1bgn2225: Process fs_fcopyman (pid 44733) exited with a bad status of 99.

 

 

Every time the system started to retrieve or store a file, we would get this "HTTP 403 Forbidden" message.
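
To confirm how widespread these failures are, the tac log lines can be tallied by return code and HTTP status. The sketch below is an assumption-laden illustration: the regular expression is derived only from the deleteObject line in this excerpt, other operations may log in a different format, and the function name is hypothetical. It takes an iterable of log lines so that no log file location has to be assumed.

```python
import re
from collections import Counter

# Pattern inferred from the deleteObject failure line in this case;
# other TSM messages may not follow this layout.
FAILURE_RE = re.compile(
    r"failure rc=(?P<rc>\d+), url=(?P<url>\S+), "
    r"error: HTTP status (?P<status>\d{3})"
)

def summarize_http_failures(lines):
    """Count failures by (rc, http_status) and collect the affected URLs."""
    counts = Counter()
    urls = set()
    for line in lines:
        # finditer copes with several log entries run together on one line.
        for m in FAILURE_RE.finditer(line):
            counts[(int(m.group("rc")), int(m.group("status")))] += 1
            urls.add(m.group("url"))
    return counts, urls
```

A result dominated by the (409, 403) pair across many different object URLs points at a connection-wide problem rather than a problem with individual objects.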

The customer's connection from TSM to Lattus uses an Amazon AWS S3 connection.

 

We got Amplidata engaged, and they stated that AWS requires that the system time on the client and on Lattus differ by no more than 15 minutes.

We changed the time on both nodes manually (the customer did not want to reboot) and, sure enough, file retrieves started working again.
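
The time check can be sketched as follows. This is a minimal illustration, assuming the server's clock can be read from the Date header of any HTTP response it returns; whether Lattus sends that header was not verified in this case. The function names are hypothetical, and the 15-minute limit is the figure quoted by Amplidata.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Limit quoted by Amplidata in this case.
MAX_SKEW = timedelta(minutes=15)

def clock_skew(date_header, local_now=None):
    """Return local time minus server time, given an HTTP Date header value."""
    server_time = parsedate_to_datetime(date_header)
    if server_time.tzinfo is None:
        # A "-0000" zone parses as naive; HTTP dates are in UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return local_now - server_time

def skew_exceeds_limit(date_header, local_now=None, limit=MAX_SKEW):
    """True if the clocks differ by more than the limit, in either direction."""
    return abs(clock_skew(date_header, local_now)) > limit
```

For example, a header of "Mon, 30 Mar 2015 15:58:27 GMT" compared against a local clock reading 16:20:00 UTC gives a skew of 21 minutes 33 seconds, which is over the limit.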

 

SUS has a bug opened for this:

 

- Bug 48339 - S3: Object Storage Destination creation failure: Lattus returning a rc=409, error: HTTP status 403: 403 Forbidden error.

