Upgrade to 3.5.1 breaks eth0 connected to front panel

  • Problem
  • Updated 3 months ago
Hello,
I just upgraded from 3.3 to 3.5.1. We had our eth0 port plugged into a front panel port, as we do not have an out-of-band management (OOBM) network. This was working just fine, and I could log in to the switch via SSH.

Since the upgrade, I cannot even ping the eth0 port. If I log into the switch via serial cable, I can ping eth0 locally, and it shows the proper IP address. I've tried ifdown and ifup on the port as well.

Are there any other troubleshooting ideas I can try?

Thanks,
Daniel

Daniel


Posted 4 months ago


Jason Guy, Employee

Need a bit more info here. Is this a Layer 2 or a Layer 3 network? If it is Layer 3, we have seen some issues where the Quagga and FRR daemons do not behave well together, and it is best to migrate to FRR as described here.

Daniel

Hi Jason,

This is purely a Layer 2 network, with just a couple of VLANs that don't talk to each other. The front port it's plugged into is on the same default VLAN as my workstation.

Though I just realized, I did not define which VLAN the eth0 port is on. Will it just use the default VLAN 1?

Thanks,
Daniel

Jason Guy, Employee

Hmmm... is eth0 of switch 'A' plugged into the front panel of switch 'A'? If that is the case, it is possible the upgrade has not completed. Perhaps it would be best to open a support case with GSS.
The default untagged VLAN for VLAN-aware bridging is 1.

Daniel

That is correct, everything is happening on the single switch 'A'.
I have all the front ports in a VLAN-aware bridge.
Is there any way to tell whether the upgrade did not complete, such as apt reporting that some packages are still upgradable?

Jason Guy, Employee

When the switch upgrades services such as switchd, it can restart those daemons, breaking connectivity for eth0. I think apt would already have all of the packages downloaded, so you can try apt-get -f install and see what it reports.

Daniel

apt reports that nothing needs upgrading or installing.
When I did the upgrade, the switch had to reboot, and that's when I lost eth0 connectivity and never regained it.

Jason Guy, Employee

Do you have a default route? Can you ping anything out eth0? Is the port up?
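Two of those checks (default route and link state) can be scripted. The sketch below is a hypothetical helper, not a Cumulus tool: it parses the Linux /proc/net/route table to list the default gateway per interface, and reads the link state from /sys/class/net/<iface>/operstate. The comment's worked example assumes a little-endian CPU. Whether pings get out still has to be tested by hand.

```python
import socket
import struct
from pathlib import Path


def default_gateways(route_text):
    """Map interface name -> default gateway IPv4 address from /proc/net/route text."""
    gateways = {}
    for line in route_text.splitlines()[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) < 3:
            continue
        iface, destination, gateway = fields[0], fields[1], fields[2]
        if destination == "00000000":  # destination 0.0.0.0 marks a default route
            # The kernel prints each address as a 32-bit hex number in native byte
            # order, so packing it back natively recovers the original four bytes
            # (on a little-endian CPU "0101010A" becomes 10.1.1.1).
            gateways[iface] = socket.inet_ntoa(struct.pack("=L", int(gateway, 16)))
    return gateways


def link_state(iface, sysfs=Path("/sys/class/net")):
    """Return the kernel's operstate for iface, e.g. "up" or "down"."""
    return (sysfs / iface / "operstate").read_text().strip()


if __name__ == "__main__":
    route_table = Path("/proc/net/route")
    if route_table.exists():
        print("default gateways:", default_gateways(route_table.read_text()))
    if Path("/sys/class/net/eth0").exists():
        print("eth0 operstate:", link_state("eth0"))
```

A missing eth0 entry would point at the routing configuration; a default route plus an "up" link, with pings still failing, points at something below Layer 3.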

Daniel

The eth0 stanza in /etc/network/interfaces:

auto eth0
iface eth0
    address 10.1.1.2/24
    gateway 10.1.1.1

It cannot ping out to anything. I ran tcpdump and can see that it is receiving broadcast traffic, but that is about it.

Daniel

Should I file a trouble ticket at this point?

Jason Guy, Employee

Yeah, that is really the only way to determine what is wrong I think.

Daniel

Alright, thanks for your help!

Daniel

Cumulus Support solved this issue for me. Here's the solution...

We found that the behavior at play here is due to the change tracked in release note 752:

https://support.cumulusnetworks.com/hc/en-us/articles/115015543848-Cumulus-Linux-3-5-Release-Notes#rn752

There were some instances where the bridge MAC address could change and cause problems with spanning tree or elsewhere, so in Cumulus Linux 3.5 we pinned the bridge MAC to match eth0's MAC. In this case, with eth0 looped into the front panel bridge, that created a conflict.
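To check whether a switch is in this state, compare the MAC address the kernel reports for eth0 with the bridge's. This is a minimal sketch, not a Cumulus-provided tool. It reads /sys/class/net/<iface>/address. The bridge name "bridge" is an assumption, so substitute the name used in your own /etc/network/interfaces.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")
HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(mac):
    """Return mac as lowercase colon-separated octets; raise ValueError if malformed."""
    octets = mac.strip().lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 or not set(o) <= HEX_DIGITS for o in octets):
        raise ValueError("not a MAC address: %r" % mac)
    return ":".join(octets)


def read_mac(iface, sysfs=SYSFS_NET):
    """Read the MAC address the kernel currently reports for iface."""
    return normalize_mac((sysfs / iface / "address").read_text())


def macs_conflict(mgmt_mac, bridge_mac):
    """True when the bridge shares the management port's MAC address."""
    return normalize_mac(mgmt_mac) == normalize_mac(bridge_mac)


if __name__ == "__main__":
    # "bridge" is an assumed name for the VLAN-aware bridge; change it to match yours.
    names = ("eth0", "bridge")
    if all((SYSFS_NET / name).exists() for name in names):
        eth0_mac, bridge_mac = (read_mac(name) for name in names)
        if macs_conflict(eth0_mac, bridge_mac):
            print("conflict: eth0 and the bridge both use", eth0_mac)
        else:
            print("no conflict:", eth0_mac, "vs", bridge_mac)
```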

We worked around it by manually setting the bridge and eth0 MAC addresses to different values with 'hwaddress' in /etc/network/interfaces, and found that connectivity was restored afterward.
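The exact addresses support chose were not posted. As one hypothetical way to pick two values that cannot match, the sketch below keeps eth0's own MAC and derives a bridge MAC from it by setting the locally administered bit. It then prints hwaddress lines to add under the existing iface stanzas. It assumes the bridge is named "bridge" and that ifupdown2 accepts a bare MAC after hwaddress. Verify both assumptions against your release.

```python
def format_mac(octets):
    """Join integer octets into a lowercase colon-separated MAC string."""
    return ":".join("%02x" % o for o in octets)


def distinct_bridge_mac(eth0_mac):
    """Derive a unicast, locally administered MAC that differs from eth0_mac.

    eth0_mac is expected in colon-separated form, e.g. "44:38:39:00:00:01".
    """
    octets = [int(o, 16) for o in eth0_mac.split(":")]
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01)
    # in the first octet.
    octets[0] = (octets[0] | 0x02) & 0xFE
    if format_mac(octets) == eth0_mac.lower():
        # eth0 already had a locally administered unicast MAC, so the step above
        # changed nothing; flip the lowest bit of the last octet instead.
        octets[5] ^= 0x01
    return format_mac(octets)


if __name__ == "__main__":
    eth0_mac = "44:38:39:00:00:01"  # hypothetical value; read yours from eth0
    print("# add under 'iface eth0':")
    print("    hwaddress " + eth0_mac)
    print("# add under 'iface bridge' (assumed bridge name):")
    print("    hwaddress " + distinct_bridge_mac(eth0_mac))
```

Deriving the bridge address from eth0's keeps the two related for bookkeeping while guaranteeing they differ. Any other pair of distinct unicast addresses would serve the same purpose.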

Adam Gray

This issue appears to be present in 3.5.2 as well.