HOWTO: Networking: Verify that LACP bonding is working in Linux
This information is from an escalation sent to SES from SPS regarding how to verify LACP bonding is working
from the Linux OS perspective.
Thanks to Ryan Davies/SES and Alain/SUS.
Notice that Alain said the Partner Mac Address is null (00:00:00:00:00:00) when LACP isn’t working correctly. When this happens the aggregator IDs will also be different. It is possible, and not all that uncommon, for the partner MAC address to be populated but the aggregator IDs to still be different. This means that the switch is configured for LACP but the cables aren’t connected to the correct ports.
More information ---
There are two types of LACP modes, active and passive:
· Active - Always send LACP data units (LACPDU) on the wire
· Passive - Only respond to LACPDUs on the wire.
Our appliances, and I’m guessing that includes any Linux server, require that the switch LACP mode be set to active.
1. When an SN appliance or DXi boots with LACP configured, it starts sending LACPDUs on the wire.
2. When the switch sees that the appliance is up, it also starts sending LACPDUs on the wire.
3. The aggregator is then negotiated, which is the whole purpose of LACP.
a. It might help to think about LACP along the lines of DHCP. You can create a static aggregator, like a static IP, or you can use LACP and let it get created dynamically, like with DHCP.
It’s important to note that mode 0 (balance-rr, round-robin) is the option to use if a customer wants a static aggregator, the equivalent of a static IP. Unlike DHCP, though, the dynamic option is the recommended one here: use LACP. In fact, an argument could be made for not offering customers mode 0 at all. Over the years, too many customers have been told by Quantum personnel that mode 0 doesn’t require any configuration on the switch. Using mode 0 without a configuration on the switch is just like using LACP without configuration on the switch. We even used to state that in our documentation.
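To tell which of the two cases applies on a given host, the "Bonding Mode" line of /proc/net/bonding/bondX can be inspected. The following is a minimal sketch, not an appliance tool: bond_mode_kind is a hypothetical helper name, and the round-robin match assumes the bonding driver reports mode 0 as "load balancing (round-robin)".

```shell
#!/bin/sh
# Sketch: classify a bond as LACP (802.3ad) or static round-robin (mode 0).
# bond_mode_kind is a hypothetical helper, not an existing command.
# $1 = path to a bonding status file, e.g. /proc/net/bonding/bond0
bond_mode_kind() {
    # Pull the text after "Bonding Mode: " (leading whitespace tolerated).
    mode=$(sed -n 's/^[[:space:]]*Bonding Mode: //p' "$1")
    case "$mode" in
        *802.3ad*)     echo "lacp" ;;       # dynamic aggregator, negotiated with LACPDUs
        *round-robin*) echo "static-rr" ;;  # mode 0: static aggregator, switch must match
        *)             echo "other" ;;
    esac
}

# Usage (on a host with a bond):
#   bond_mode_kind /proc/net/bonding/bond0
```

Either result still requires a matching configuration on the switch; the helper only reports which kind of aggregator the host expects.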
----------
Conclusion
1. If the Partner Mac Address in /proc/net/bonding/bondX is populated AND the aggregator IDs are the same then LACP is working properly.
2. If step 1 isn’t true or mode 0 is being used and there is suspicion of an aggregation mismatch, down all but one of the ports in the bond and see if the network problems cease.
a. In my experience, a bond on a host or an aggregator on a switch with a single connection is no different than a single connection without the bond or aggregator. If you’re ever working an issue where you suspect an issue with aggregation, you can just down all but one of the ports and test.
3. The easiest way to take down ports in a bond is to use ifdown. Example for a bond with eth4 and eth5 as slaves:
a. ifdown eth5
4. If aggregator IDs don’t match then I always ifdown the slaves with aggregators that DON’T match. I’ve never seen this interrupt any existing connections.
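The check in step 1 can be scripted. The sketch below is one possible way to do it, assuming the field names seen in /proc/net/bonding/bondX output ("Partner Mac Address", "Slave Interface", "Aggregator ID"); check_lacp is a hypothetical helper name. It treats the first "Aggregator ID" (before any slave section) as the active aggregator and reports any slave whose ID differs.

```shell
#!/bin/sh
# Sketch: verify partner MAC and aggregator IDs in a bonding status file.
# check_lacp is a hypothetical helper, not an existing command.
# $1 = path such as /proc/net/bonding/bond0; exit status is non-zero on a problem.
check_lacp() {
    awk '
        # Strip leading whitespace so indented fields match too.
        { sub(/^[ \t]+/, "") }
        /^Partner Mac Address:/ { partner = $NF }
        /^Slave Interface:/     { slave = $NF }
        /^Aggregator ID:/ {
            if (slave == "") active = $NF       # seen before any slave: active aggregator
            else             agg[slave] = $NF   # aggregator the current slave joined
        }
        END {
            bad = 0
            if (partner == "" || partner == "00:00:00:00:00:00") {
                print "PROBLEM: partner MAC is " (partner == "" ? "missing" : partner)
                bad = 1
            }
            for (s in agg) if (agg[s] != active) {
                print "PROBLEM: " s " is in aggregator " agg[s] ", active is " active
                bad = 1
            }
            if (!bad) print "OK"
            exit bad
        }
    ' "$1"
}

# Usage (on a host with a bond):
#   check_lacp /proc/net/bonding/bond0
```

Any slave reported as mismatched is a candidate for the ifdown test described in steps 2 through 4.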
Ryan Davies | SES | Quantum Corporation | Mobile: +1 (208) 419-6548 | ryan.davies@quantum.com
From: Mamoon Ansari
Sent: Friday, May 12, 2017 8:24 AM
To: Jonathan McNerny; Alain Renaud; Jeff Syme
Cc: DL-SN-SES; DL-Service EMEA - SW; DL-KL ASPS Tech Support; DL-AMER-SPS; StorNext Sustaining Engineers; Joshua Martin
Subject: RE: Xcellis Workflow Director Escalation To SES From SPS: CASE0339896, Customer:CBS Corp -- how to verify that the LACP bond is OK in Linux vs. enet switch, Serial Number: AV1638CKH00220
Along with what Alain said, you also want to check the “Aggregator IDs” (per Ryan Davies). They should be the same for all slave interfaces.
Example: the IDs below are not the same. Houston, we have a problem:
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 17
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:ae:52:88:08:bd
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:ae:52:88:08:bf
Aggregator ID: 2
Slave queue ID: 0
Thanks,
========================================================================================================================================================
Mamoon R. Ansari | Technical Support Engineer, Software Product Support | Office: 630.640.1025 | mamoon.ansari@quantum.com | Quantum.com | Support: 800.827.3822
From: Jonathan McNerny
Sent: Friday, May 12, 2017 8:58 AM
To: Alain Renaud; Jeff Syme
Cc: DL-SN-SES; DL-Service EMEA - SW; DL-KL ASPS Tech Support; DL-AMER-SPS; StorNext Sustaining Engineers; Joshua Martin
Subject: RE: Xcellis Workflow Director Escalation To SES From SPS: CASE0339896, Customer:CBS Corp -- how to verify that the LACP bond is OK in Linux vs. enet switch, Serial Number: AV1638CKH00220
The other day I found that we have ‘iftop’ on both CentOS 6 and CentOS 7, which is cool because you can do lots of the same filtering as with tcpdump, but iftop is centered around bandwidth monitoring.
So, if you know the NICs in the bond….
[root@cx-node1 ~]# ip addr show | grep master
5: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
6: eth3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
You can monitor each port in separate shells.
iftop -B -i eth2
iftop -B -i eth3
Then just pay attention to the stats reported at the bottom of the live, top-like output.
(Screenshots in the original message, not reproduced here: iftop output for eth2, for eth3, and on the client side.)
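Where iftop is not available, a rough per-port rate can be taken from the kernel byte counters instead. This is only a sketch under assumptions: it relies on the standard sysfs files /sys/class/net/<bond>/bonding/slaves and /sys/class/net/<nic>/statistics/{rx_bytes,tx_bytes}, and slave_snapshot, slave_rates, and the SYSFS override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: per-slave byte rates from two counter snapshots (an iftop stand-in).
# SYSFS is a hypothetical override so the functions can be pointed at a test directory.

# slave_snapshot BOND: print "nic rx_bytes tx_bytes" for every slave of BOND.
slave_snapshot() {
    base=${SYSFS:-/sys/class/net}
    for nic in $(cat "$base/$1/bonding/slaves"); do
        echo "$nic $(cat "$base/$nic/statistics/rx_bytes") $(cat "$base/$nic/statistics/tx_bytes")"
    done
}

# slave_rates BEFORE AFTER SECONDS: compare two snapshot files, print bytes per second.
slave_rates() {
    awk -v secs="$3" '
        NR == FNR { rx[$1] = $2; tx[$1] = $3; next }   # first file: remember counters
        { printf "%s rx=%.0f B/s tx=%.0f B/s\n", $1, ($2 - rx[$1]) / secs, ($3 - tx[$1]) / secs }
    ' "$1" "$2"
}

# Usage (on a host with a bond):
#   slave_snapshot bond0 > /tmp/before; sleep 5
#   slave_snapshot bond0 > /tmp/after
#   slave_rates /tmp/before /tmp/after 5
```

Run it while the client is pushing traffic; a slave that stays at or near zero in both directions deserves a closer look.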
Jon
Jonathan McNerny - Software Product Support
720.249.3916 | Jonathan.McNerny@Quantum.com | Quantum.com
From: Alain Renaud
Sent: Thursday, May 11, 2017 4:34 PM
To: Jeff Syme
Cc: DL-SN-SES; DL-Service EMEA - SW; DL-KL ASPS Tech Support; DL-AMER-SPS; StorNext Sustaining Engineers; Joshua Martin
Subject: Re: Xcellis Workflow Director Escalation To SES From SPS: CASE0339896, Customer:CBS Corp -- how to verify that the LACP bond is OK in Linux vs. enet switch, Serial Number: AV1638CKH00220
Normally, when an LACP bond is correctly set up, you should see something like this on the Linux side. Note that when it is not configured properly, the Partner Mac Address is usually set to 00:00:00:00:00:00.
# more /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 500
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 33
Partner Key: 510
Partner Mac Address: 00:23:04:ee:be:02
Slave Interface: eth4
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:55:3a:38
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: eth5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:55:3a:3a
Aggregator ID: 1
Slave queue ID: 0
Another option is to look at the ifconfig output of both interfaces in the bond and confirm that they are somewhat balanced, or at least that neither one is close to zero.
# ifconfig eth4
eth4 Link encap:Ethernet HWaddr A0:36:9F:30:A5:F0
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:280918667889 errors:0 dropped:105908 overruns:0 frame:0
TX packets:198236148255 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:364072984209121 (331.1 TiB) TX bytes:242784925987710 (220.8 TiB)
# ifconfig eth5
eth5 Link encap:Ethernet HWaddr A0:36:9F:30:A5:F0
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:100424527061 errors:0 dropped:151830 overruns:0 frame:0
TX packets:194613416278 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:111270973334344 (101.2 TiB) TX bytes:235191060763130 (213.9 TiB)
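The balance comparison can be reduced to a percentage per interface. The sketch below is illustrative only: traffic_share is a hypothetical helper name, and the 5% default threshold for flagging a port as "LOW" is an arbitrary assumption, not a documented limit. It reads "name bytes" pairs on standard input.

```shell
#!/bin/sh
# Sketch: show each slave's share of the total byte count and flag near-idle ports.
# traffic_share is a hypothetical helper; the 5% default threshold is an assumption.
# Input on stdin: one "name bytes" pair per line. $1 = optional minimum percent.
traffic_share() {
    awk -v min="${1:-5}" '
        { name[NR] = $1; bytes[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                pct = (total > 0) ? 100 * bytes[i] / total : 0
                printf "%s %.1f%%%s\n", name[i], pct, (pct < min ? " LOW" : "")
            }
        }'
}

# Usage with the RX byte counts from the ifconfig output above:
#   printf 'eth4 364072984209121\neth5 111270973334344\n' | traffic_share
# Usage with live counters (standard sysfs paths):
#   for nic in $(cat /sys/class/net/bond1/bonding/slaves); do
#       echo "$nic $(cat /sys/class/net/$nic/statistics/rx_bytes)"
#   done | traffic_share
```

An uneven split is not a fault by itself, since a hash policy such as layer3+4 distributes by flow; the point is only to catch a port that carries almost nothing.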
On May 11, 2017, at 18:19, jeff.syme@quantum.com wrote:
Case ID : 0339896
Company Name : CBS Corp
StorNext Serial Number: AV1638CKH00220
Summary : how to verify that the LACP bond is OK in Linux vs. enet switch
Escalated On: : May 11, 2017 16:19:09 MDT
Severity : Minor
Customer Temp : Normal
Site Status : Normal
New Install : Yes
Has it worked before : No
Secure site : Yes
HA System : Yes
Medicus Site : No
Suspected Bugs : None
Logs Location : http://susrepo/ticketinfo/SR03xxxxx/SR033xxxx/SR0339xxx/SR0339896
Your Email Address : jeff.syme@quantum.com
Contract Coverage : CRU/FRU GOLD Std Service Model
Assign to SES Person : next available
----------------------
Installation Details
StorNext Platform : Xcellis Workflow Director
StorNext OS : CentOS-RH7.2 equivalent (3.10.0.327.4.5.EL7)
StorNext Version : 5.3.1
StorNext Build : 00000
Client Platform :
Client OS :
Client Version :
Client Build :
----------------------
Problem is on : MDC
Date/Time Occurred : ongoing
Description :
Isn’t there some way to verify that the bond is OK in Linux and that it’s happy with the config from the switch?
I know that about 6 months ago there was an issue with some M440s that I deployed: they were complaining about the LACP config because it was correct on the MDCs but not on the switches.
According to my Cisco switches, the LACP bond is functional; I just want to be absolutely sure that the Xcellis nodes are happy with the network config.
Mitch Spacone
CBS Television City | Media Maintenance
7800 Beverly Blvd, Los Angeles, CA 90036
Mitch.spacone@cbs.com
Desk: 323.575.4141
Mobile: 310.347.7771
What has been done and results:
Ron Housman and I wrestled with a # vip_control being bound to the wrong bond in case 339895 | CBS | XCELLIS / HA_VIP not functional <now resolved>
Mitch is asking for two different serial numbers: an Xcellis and a M440
Questions :
how to verify that the LACP bond is OK in Linux vs. enet switch
----------------------
Configuration Changes
Any recent changes? : Yes
Stornext : new install, left over from PS engineer
System SW : No changes
System HW : No changes
Network : No changes
RAID : No changes
----------------------
This email was generated by: http://denlgservicev1.quantum.com/cgi-bin/escalations/index.cgi
Alain Renaud | Senior Sustaining Engineer| Quantum | VOIP: 612-567-4680 | Alain.Renaud@quantum.com
This page was generated by the BrainKeeper Enterprise Wiki, © 2018