Opdup in a Giant Turtle Shell

Overview:

 

This article takes an in-depth look at how opdup (optimized duplication) works as of FW 2.2.1.  A little history helps explain why the current design is used.

 

When opdup was introduced, the implementation of segmentation on the DXi products defined exactly 3 IP addresses: Management, Replication and Data.  Each process listened on one of these specific IP addresses, which is how we steered the data.  For example, ostd only listened on the data IP address, http only listened on the management IP address, and replicationd only listened on the replication IP address.

 

Our current implementation of mininet is much different.  ostd, for example, now listens on ALL IP addresses and when a user allows the traffic type of Data on an IP address the system automatically creates the iptables rules required for network access.

We no longer have a single defined management IP, replication IP or data IP.  Instead we have IP addresses that allow management, replication or data traffic types.  We now find customers with several IP addresses that have replication enabled.

Understanding this transition from our earlier network implementation to the current mininet helps explain why the current solution is so complicated.
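
The difference between the two models can be illustrated with a small sketch.  This is only an illustration of the idea that each IP address now carries a set of allowed traffic types; the addresses, the Traffic enum and the ips_allowing() helper are all hypothetical and are not taken from the DXi code.

```python
from enum import Flag, auto


class Traffic(Flag):
    """Traffic types that can be allowed on an IP address."""
    MGMT = auto()
    REP = auto()
    DATA = auto()


# Hypothetical configuration: IP address -> traffic types allowed on it.
# In the old model there would be exactly one IP per traffic type.
allowed = {
    "10.0.0.10": Traffic.MGMT,
    "10.0.0.20": Traffic.DATA | Traffic.REP,
    "10.0.0.30": Traffic.REP,
}


def ips_allowing(traffic, config):
    """Return every IP whose allowed set includes the given traffic type."""
    return [ip for ip, types in config.items() if traffic in types]


# Several addresses can now have replication enabled at the same time.
print(ips_allowing(Traffic.REP, allowed))
```

With more than one address returned for replication, any code that expects "the" replication IP has to pick one, which is where the complexity described below comes from.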

 

Duplication:

A duplication job is an operation that is created and run in NBU.  Within a duplication job you have a source and then one or many destinations for a copy of that source.  Optimized duplications are duplication jobs that use the DXi’s internal replication engine to send already deduplicated data from the source DXi to a target DXi.

Why not just write to a share and use replication?  Because NBU won’t know about the data on the target.  With duplications, NetBackup knows where all of the images are stored and is able to manage them centrally.


SLP (Storage Lifecycle Policy):

An SLP is a policy that contains a source SU (Storage Unit) and one or more destination SUs.  In this case, the SUs are OST storage servers created on the DXi.  The source SU would be a storage server on the source DXi and the destination SU a storage server on the target DXi.

Example:

Note: Duplications can also be run manually from the Catalog, but most customers use SLPs.
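
The relationship between an SLP and its storage units can be sketched as a small data structure.  This is not how NBU stores its policies; the class names, field names and sample values below are hypothetical and only mirror the description above (one source SU, one or more destination SUs, each backed by an OST storage server on a DXi).

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    name: str            # hypothetical SU name as seen in NBU
    storage_server: str  # OST storage server created on the DXi
    dxi: str             # DXi hosting that storage server


@dataclass
class StorageLifecyclePolicy:
    name: str
    source: StorageUnit
    destinations: list = field(default_factory=list)

    def duplication_pairs(self):
        """One (source, destination) storage server pair per destination SU."""
        return [
            (self.source.storage_server, dest.storage_server)
            for dest in self.destinations
        ]


slp = StorageLifecyclePolicy(
    name="example_slp",
    source=StorageUnit("su_src", "ss_source", "source-dxi"),
    destinations=[StorageUnit("su_tgt", "ss_target", "target-dxi")],
)
print(slp.duplication_pairs())
```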

 

Opdup Flow:

  1. Scheduled SLP is run
  2. Successful backup completes
  3. Duplication is automatically started based on several different parameters such as size of the backup.  Duplications cannot be scheduled with SLPs
  4. The OST Plugin is called:
    1. Plugin makes a connection to ostd on the target DXi
    2. Target checks opduptranslate using:
      1. if( ReplicationAPI::util_getRepTranslationIP(statusInfo, target_hostip) == true )
        1. util_getRepTranslationIP checks to see if /data/hurricane/conf/NewReplication.conf exists
        2. If it doesn’t exist then it returns nothing
        3. If the file does exist then it returns the translated IP
    3. If there is no opduptranslate entry then the target runs getRepIPAddrr()
      1. Runs ReplicationAPI::config_getSegmentIPAddresses()
        1. One of the first things called by getSegmentIPAddresses() is:
          1. std::string cgiScriptName = "/cgi-bin/getsegmentips.sh";
    4. getsegmentips.sh
      1. Checks first for UnifiedNetwork.conf.  If it doesn’t exist:
        1. Reads replication.conf and looks at ReplicationSourceIP=
        2. If ReplicationSourceIP has a value then it is returned as the replication IP
        3. If it doesn’t have a value then the system IP address is returned
      2. If UnifiedNetwork.conf exists (it should on any 2.x machine) then
        1. First info from UnifiedNetwork.conf is gathered.  The following variables are created:
          1. IPLINE="Runtime_State.Interfaces.Interface.L3Interfaces.L3Interface.IP"
          2. SEGLINE="Runtime_State.Interfaces.Interface.L3Interfaces.L3Interface.Segments.Segment"
          3. ANYIPLINE="Configured_State.Interfaces.Interface.L3Interfaces.L3Interface.IP"
          4. ANYSEGLINE="Configured_State.Interfaces.Interface.L3Interfaces.L3Interface.Segments.Segment"
        2. These variables name the XML paths in UnifiedNetwork.conf that are parsed to get the respective values.  For example, IPLINE gets the IP from the following XML structure:

<L3Interface>
   <Name>eth0:1</Name>
   <IP>10.20.234.126</IP>
   <Mask>255.255.248.0</Mask>
   <DEFGW>10.20.234.126</DEFGW>
   <Segments>
      <Segment>REP</Segment>
   </Segments>
</L3Interface>

 

        3. After the information is gathered:
          1. Read /opt/DXi/theSeer/UnifiedNetwork.conf line by line
          2. Find the IP address for the first L3 interface under the Runtime_State branch
          3. Get the segmentation information for that IP only if the Segment tag has a value of MGMT, DATA or REP
          4. Increment the segment count for each new segment
            1. A new segment is a 2nd or 3rd segment for an IP
            2. For example, an IP that had a segment of REP and DATA, DATA would be considered a new segment
          5. Find the IP address for ANYIPLINE under the Configured_State branch
          6. Get the segment info for that IP only if it has a segment of ALL
          7. Add that IP to the Any_IP_List tuple
          8. If there is more than one segment OR MGMT, DATA and REP are all assigned (meaning that someone checked all three boxes in the GUI instead of choosing ALL) then:
            1. If there is not more than one segment and the get_all_segments argument was used then return the now defined MGMT, REP, DATA IP addresses
          9. If the conditions in step 8 above are not met then all IP addresses that have been gathered are added to $OUTLINE
          10. If $OUTLINE now has IP addresses AND the get_all_segments argument was used then the list is returned and the script exits
          11. If get_all_segments wasn’t used or $OUTLINE is empty:
            1. Get the value of ReplicationSourceIP in replication.conf file
            2. If no value then the single IP for the DXi is used as the replication IP
          12. If the management IP and data IP are not yet assigned then the single IP for the DXi is used
          13. The MGMT, REP, DATA IPs are now returned
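
The lookup order above can be sketched in a few lines of Python.  This is a simplified illustration, not a port of getsegmentips.sh: it does not reproduce the segment counting or the get_all_segments branches, and the root element name, the sample XML and the function names are assumptions made for the example.  Only the Runtime_State / Configured_State paths and the MGMT, DATA, REP and ALL segment values come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for UnifiedNetwork.conf.
SAMPLE = """
<UnifiedNetwork>
  <Configured_State>
    <Interfaces><Interface><L3Interfaces>
      <L3Interface>
        <IP>10.20.234.126</IP>
        <Segments><Segment>REP</Segment></Segments>
      </L3Interface>
    </L3Interfaces></Interface></Interfaces>
  </Configured_State>
  <Runtime_State>
    <Interfaces><Interface><L3Interfaces>
      <L3Interface>
        <Name>eth0:1</Name>
        <IP>10.20.234.126</IP>
        <Segments><Segment>REP</Segment></Segments>
      </L3Interface>
    </L3Interfaces></Interface></Interfaces>
  </Runtime_State>
</UnifiedNetwork>
"""

L3_PATH = "Interfaces/Interface/L3Interfaces/L3Interface"


def segment_ips(root, branch, wanted):
    """Map each wanted segment name to the IPs that carry it under `branch`."""
    found = {}
    for l3 in root.findall(f"{branch}/{L3_PATH}"):
        ip = l3.findtext("IP")
        if not ip:
            continue
        for seg in l3.findall("Segments/Segment"):
            name = (seg.text or "").strip()
            if name in wanted:
                found.setdefault(name, []).append(ip)
    return found


def replication_ips(xml_text, replication_source_ip="", system_ip=""):
    """Simplified fallback chain: segment IPs, then ReplicationSourceIP, then system IP."""
    root = ET.fromstring(xml_text)
    runtime = segment_ips(root, "Runtime_State", {"MGMT", "DATA", "REP"})
    any_ips = segment_ips(root, "Configured_State", {"ALL"}).get("ALL", [])
    candidates = runtime.get("REP", []) + any_ips
    if candidates:
        return candidates
    if replication_source_ip:
        return [replication_source_ip]
    return [system_ip]


print(replication_ips(SAMPLE))
```

The point of the sketch is the fallback chain: segment information from UnifiedNetwork.conf is preferred, ReplicationSourceIP from replication.conf is the next choice, and the single system IP is the last resort.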

 

  1. The replication IP returned by getsegmentips.sh (sub-step 4 of step 4 above) is now the IP address that opdup will use.  As of 2.2.1, if a list of IPs is returned, opdup only uses the first IP in the list
  2. The plugin passes this IP to the source DXi as the destination IP to be used for the opdup
  3. The plugin sends an OST_COPY_EXTENT message to the source DXi that includes:
    1. NBU image file handle
    2. The Offset
    3. The size of the file
    4. Target StorageServer name
    5. Target host ip address (gathered above)
    6. Target LSU
    7. Target image name
    8. Target offset
    9. Timeout
    10. Flags
  4. Ostd on the source DXi makes a call to openCopyFileExtent() which resides in RECopyExtent.cpp:
    1. openCopyFileExtent() is a large function that does too many things to list in this document
    2. The important thing to note here is that if opduptranslate is not being used, or if it is used but the IP doesn’t match the configured destination, then getsegmentips.sh is eventually run and the entire process from above is repeated
    3. Specifically, ReplicationAPI::config_getSegmentIPAddressesList is called.  Source code for ost in 2.2.1 shows that the plugin calls ReplicationAPI::config_getSegmentIPAddresses, not the List function
    4. So depending on the UnifiedNetwork.conf on the target, openCopyFileExtent() might get a different IP than the plugin did
    5. Typically, we don’t use opduptranslate anymore because we define an ost target IP with data and replication
    6. This means that the plugin and openCopyFileExtent() are both running getsegmentips.sh for every duplication job
    7. If the destination IP from openCopyFileExtent() doesn’t match the one from the plugin then the ostd channel is used instead of the replicationd channel
  5. If using the replication channel:
    1. Ostd runs sendReplicationRequest() which sends a request to replicationd that contains the following:
      1. Source filename
      2. Source storage server
      3. Source IP address
      4. Destination IP address
      5. Destination filename
      6. Destination storage server
  6. replicationd calls replicateOstObject()
  7. replicationd calls BPWAPIUtil and basically runs a sync for the images in this backup
  8. ostd sets the progress state to COPY_DONE
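
The IP selection and channel decision in the flow above can be summarized in a short sketch.  This is an illustration under stated assumptions, not the ostd or plugin source: CopyExtentRequest, pick_target_ip() and pick_channel() are hypothetical names, the field types are guesses, and only the list of OST_COPY_EXTENT fields and the "first IP wins" and "mismatch falls back to ostd" rules come from the steps above.

```python
from dataclasses import dataclass


@dataclass
class CopyExtentRequest:
    """Fields the flow above lists for OST_COPY_EXTENT (types are assumed)."""
    image_handle: str
    offset: int
    size: int
    target_storage_server: str
    target_host_ip: str
    target_lsu: str
    target_image_name: str
    target_offset: int
    timeout: int
    flags: int


def pick_target_ip(translated_ip, segment_ips):
    """Prefer the opduptranslate result; otherwise use the segment IP list."""
    if translated_ip:
        return translated_ip
    # As of 2.2.1 only the first IP in a returned list is used.
    return segment_ips[0]


def pick_channel(plugin_ip, source_side_ip):
    """The replicationd channel is used only when both lookups agree."""
    return "replicationd" if plugin_ip == source_side_ip else "ostd"


plugin_ip = pick_target_ip(None, ["10.20.234.126", "10.20.234.127"])
request = CopyExtentRequest(
    image_handle="handle", offset=0, size=1024,
    target_storage_server="ss_target", target_host_ip=plugin_ip,
    target_lsu="lsu", target_image_name="image", target_offset=0,
    timeout=60, flags=0,
)
print(pick_channel(request.target_host_ip, "10.20.234.126"))
print(pick_channel(request.target_host_ip, "10.20.234.127"))
```

Because the plugin and openCopyFileExtent() each run their own lookup, the second call shows how a differing result on the source side silently moves the job from the replicationd channel to the ostd channel.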

 

Conclusions:

Getsegmentips.sh and opduptranslate were solutions put in place when we had only 3 possible IP addresses defined, and at the time they were required.  Opduptranslate no longer works because most customers will simply enable both the data and replication traffic types for the source IP address.  Using opduptranslate to map the DataIP to the RepIP in this scenario doesn’t work because they are the same, and the target IP mapping code won’t allow you to map an IP to itself.
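
A minimal sketch of why the mapping fails in that scenario is shown here.  The add_translation() function and the dictionary it fills are hypothetical; the only behavior taken from this article is that an IP cannot be mapped to itself.

```python
def add_translation(table, data_ip, rep_ip):
    """Add a DataIP -> RepIP mapping, rejecting a mapping of an IP to itself."""
    if data_ip == rep_ip:
        raise ValueError("cannot map an IP address to itself")
    table[data_ip] = rep_ip
    return table


table = {}
# Old model: separate data and replication addresses, so the mapping is accepted.
add_translation(table, "10.0.0.20", "10.0.0.30")

# Current model: one address carries both traffic types, so the mapping is rejected.
try:
    add_translation(table, "10.0.0.40", "10.0.0.40")
except ValueError as err:
    print(err)
```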

 

I will continue to make changes to this document for newer FW versions as I discover them.

 



This page was generated by the BrainKeeper Enterprise Wiki, © 2018