02-23-2016 11:03 AM
I work for a company that has a pretty close relationship with another independent telco (same General Manager, even), and I've seen this problem appear on BOTH of our networks, so I figure it's time to reach out to the community.
Basically, what happens is that when there's a topology change in the network, we experience crippling storms that appear to be from a loop or reflection.
In my case, this appeared after migrating our TV feed from a single link to dual links into each VPT (6450s), managed by Spanning Tree from a Juniper MX router. The video settings were all set up properly, and everything was configured so that VPT1 normally sends TV out onto PG1. Whenever a card was recovering from a reboot - whether a software crash, a reseat, or a circuit breaker reset - the EPS network went into a storm. The video router's DoS protection mechanisms activated, reporting > 100 Kpps of spanning tree traffic received on the ports from the VPTs. That's one hundred thousand packets per second on a port that should see ONE packet per second... We tried many fixes and audited the network completely for preferred PG, double-fault PG, video settings, IGMP versions, and the right VLANs on ring profiles (one time a 6151 went berserk; we found that one fast!). Nothing stuck. The only solution that let us keep redundancy was to migrate from the Juniper MX router onto a Juniper EX switch and use a Redundant Trunk Group (RTG), which is Juniper's equivalent of a Cisco FlexLink.
In my colleague's case, this is appearing as he attempts to upgrade portions of his EPS ring to 10 Gbps. His CO stuff is already 10 Gbps (if I remember correctly). When he takes down PG2 to a remote, leaving everything running on the 1 Gbps PG1, he completely loses that remote and sees 7.5 Gbps of traffic on some ports in his CO. My colleague's network has STP entering from some Juniper EXes as well as from a Cisco 6500 that feeds TV in. I think he's been working on getting STP out of the picture entirely, although there is some complexity because the E7 that hangs off of one section is allegedly STP-aware, unlike the B6. I believe Josh Levi is somewhat familiar with this case.
We are running various OS releases in the 7.3 train on our B6 stuff. We both have some E7s hanging off of B6.
Has anybody else had REALLY strange issues with spanning tree? Or possibly with E7 hanging off of their B6?
02-24-2016 05:05 PM - edited 02-24-2016 05:07 PM
Found this in some release notes:
Fixed in 7.3.30
• Resolved Issue: MAC addresses can become "stuck" after network events such as link
failures, EPS failovers, or other disturbances. [BSIX-16114, BSIX-15683] Calix has
observed a rare occurrence where the B6 may retain old MAC addresses, or may time out MAC
addresses while statically configured. Calix is investigating this issue.
Can anybody at Calix comment on this? On further investigation, we have a couple of cards running releases prior to 7.3.30; it seems a few are even on the 7.2 and 7.1 trains :O.
08-22-2017 07:31 PM
We're not entirely sure. My colleague had a lot of hidden EPS issues that he had to resolve. In our network, the problem went away after an upgrade. (I think so - this was over a year ago!) What I can tell you is that we're still running off of an EX switch with a Juniper RTG feeding the VPTs and we haven't had any issues related to a card reboot other than the obvious.
Other sites of ours use STP so I believe that the issue was related to that bug.
09-11-2017 02:45 PM
We are still having this issue: when the ring breaks, we see a surge in multicast traffic. We haven't been able to get a packet capture, but we do know the traffic is on VLAN 3 (video). The ring gets overwhelmed and we start losing connection to some of the cards on that ring. We also notice a drop in traffic on the video feeds. We are noticing this on both of our EPS rings, so it must have something to do with our setup rather than something ring-specific. What's hard about this is that it happens randomly. We have tried downing the ring and waiting to do a packet capture, and it won't happen.
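Since the surge is random, one idea is to watch a multicast packet counter and only kick off a capture once the rate jumps. Below is a rough Python sketch of just the detection part, not something we have running. It assumes you already collect (timestamp, cumulative counter) samples somehow, for example by polling the standard IF-MIB ifHCInMulticastPkts counter on a port facing the ring. The function names, the baseline, and the 10x factor are all made up for illustration.

```python
from typing import List, Optional, Tuple

# (unix timestamp in seconds, cumulative multicast packet counter)
Sample = Tuple[float, int]


def packet_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Turn cumulative counter samples into (timestamp, packets-per-second) pairs."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or c1 < c0:
            # Skip out-of-order samples and counter resets (e.g. a card reboot).
            continue
        rates.append((t1, (c1 - c0) / dt))
    return rates


def first_surge(samples: List[Sample], baseline_pps: float,
                factor: float = 10.0) -> Optional[float]:
    """Return the timestamp of the first interval whose rate exceeds
    factor * baseline_pps, or None if no interval does."""
    for ts, pps in packet_rates(samples):
        if pps > baseline_pps * factor:
            return ts
    return None


if __name__ == "__main__":
    # Hypothetical samples: steady 5,000 pps, then a jump to 200,000 pps.
    demo = [(0.0, 0), (10.0, 50_000), (20.0, 100_000), (30.0, 2_100_000)]
    print(first_surge(demo, baseline_pps=5_000))
```

Whatever starts the capture when first_surge returns a timestamp (a mirror port, a capture on the video router, etc.) depends on your gear and is left out here.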
09-12-2017 11:35 AM
We have rebooted everything on the network. One thing we have noticed is that when this happens on our North ring, we get a consistent 2.5 Gbps of bandwidth, but when it happens on our South EPS ring, we get 8.6 Gbps. The only difference is that the South ring has a 10 Gbps subring, whereas the North ring only has 1 Gbps subrings. We also always see the shelf with the subrings attached transmitting the bandwidth but not receiving it, which makes me think this shelf is the source. So I'm thinking this has to do with IGMP duplication related to subrings.
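To make the "transmitting but not receiving" check less of an eyeball exercise, something like the following could rank shelves by how much more they send than they receive during a storm. This is only a sketch: the shelf names and rates in the example are invented, and it assumes you can pull per-shelf tx/rx bit rates from whatever monitoring you already have.

```python
from typing import Dict, List, Tuple


def rank_by_tx_excess(rates_bps: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """rates_bps maps a shelf name to (tx_bps, rx_bps).

    Returns (shelf, tx_bps - rx_bps) pairs sorted with the largest excess first,
    so a shelf that originates traffic rather than forwarding it floats to the top.
    """
    excess = [(name, tx - rx) for name, (tx, rx) in rates_bps.items()]
    return sorted(excess, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    observed = {
        "shelf-with-subrings": (8.6e9, 0.2e9),
        "shelf-b": (8.5e9, 8.6e9),
        "shelf-c": (8.4e9, 8.5e9),
    }
    for name, diff in rank_by_tx_excess(observed):
        print(f"{name}: tx exceeds rx by {diff / 1e9:.1f} Gbps")
```

A shelf that merely passes the storm along should show tx and rx roughly equal, so a large positive excess on one shelf would at least be consistent with the duplication theory, though it doesn't prove it.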