NETWizz
Over the past few years, I have had nothing but great things to say about Brocade switches.
They are incredibly reliable in general, but my perception has started to change over the last couple of weeks.
Last week we had some major storms (the likely culprit), and I started getting sawtooth pings like this, with every other reply taking several seconds:
Pinging 10.1.1.1 with 32 bytes of data:
Reply from 10.1.1.1: bytes=32 time=1ms TTL=64
Reply from 10.1.1.1: bytes=32 time=4802ms TTL=64
Reply from 10.1.1.1: bytes=32 time=1ms TTL=64
Reply from 10.1.1.1: bytes=32 time=3843ms TTL=64
Reply from 10.1.1.1: bytes=32 time=2ms TTL=64
Reply from 10.1.1.1: bytes=32 time=2711ms TTL=64
Reply from 10.1.1.1: bytes=32 time=1ms TTL=64
Reply from 10.1.1.1: bytes=32 time=3711ms TTL=64
Reply from 10.1.1.1: bytes=32 time=4ms TTL=64
Reply from 10.1.1.1: bytes=32 time=4181ms TTL=64
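For anyone who wants to catch this pattern without eyeballing it, here is a rough Python sketch that pulls the round-trip times out of Windows ping output and reports what fraction of replies are slow. The function names and the 1000 ms threshold are my own assumptions for illustration, not anything from the switch or from ping itself.

```python
import re

# Matches the "time=123ms" or "time<1ms" field in Windows ping replies.
TIME_RE = re.compile(r"time[=<](\d+)ms")

def parse_rtts(ping_output):
    """Return the round-trip times (ms) found in the ping output, in order."""
    return [int(m.group(1)) for m in TIME_RE.finditer(ping_output)]

def spike_ratio(rtts, threshold_ms=1000):
    """Fraction of replies slower than threshold_ms (the threshold is an assumption)."""
    if not rtts:
        return 0.0
    return sum(1 for r in rtts if r > threshold_ms) / len(rtts)

sample = """\
Reply from 10.1.1.1: bytes=32 time=1ms TTL=64
Reply from 10.1.1.1: bytes=32 time=4802ms TTL=64
Reply from 10.1.1.1: bytes=32 time=1ms TTL=64
Reply from 10.1.1.1: bytes=32 time=3843ms TTL=64
"""

rtts = parse_rtts(sample)
print(rtts, spike_ratio(rtts))
```

A ratio near 0.5 with healthy replies in between matches the alternating pattern in the output above, which points at a busy device rather than a simply slow link.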
I had a switch I could not even get into via SSH, and everything downstream of a certain point in the LAN was unreliable, from bad-quality VoIP to you name it.
After checking, this is what I got:
Switchname#sh cpu
99 percent busy, from 1254 sec ago
1 sec avg: 99 percent busy
5 sec avg: 99 percent busy
60 sec avg: 99 percent busy
300 sec avg: 99 percent busy
The rest were like this:
Switchname#sh cpu
1 percent busy, from 1679 sec ago
1 sec avg: 1 percent busy
5 sec avg: 6 percent busy
60 sec avg: 1 percent busy
300 sec avg: 1 percent busy
Either switch would show the exact same tasks:
AnySwitch#sh cpu tasks
... Usage average for all tasks in the last 1 second ...
==========================================================
Name %
SigHdlrTsk 0
OsTsk 0
TimerTsk 0
FlashTsk 0
MainTsk 0
MportPollTsk 0
IntrTsk 0
stkKeepAlive 0
keygen 0
itc 0
poeFwdfsm 0
scp 0
appl 100
snms 0
snmp 0
rmon 0
web 0
acl 0
flexauth 0
ntp 0
rconsole 0
console 0
auxTsk 0
ssh_0 0
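If you collect this output from a lot of switches, a small parser makes the pegged task jump out. This is only a sketch that assumes the two-column "name percent" layout shown above; `busy_tasks` and its `min_percent` cutoff are hypothetical names I made up for the example.

```python
def busy_tasks(show_output, min_percent=50):
    """Return (task, percent) pairs at or above min_percent, busiest first."""
    found = []
    for line in show_output.splitlines():
        parts = line.split()
        # Task rows look like "<name> <integer percent>"; skip headers and rulers.
        if len(parts) == 2 and parts[1].isdigit():
            pct = int(parts[1])
            if pct >= min_percent:
                found.append((parts[0], pct))
    return sorted(found, key=lambda t: t[1], reverse=True)

sample = """\
... Usage average for all tasks in the last 1 second ...
==========================================================
Name %
MainTsk 0
appl 100
snmp 0
"""

print(busy_tasks(sample))
```

On the output above this singles out `appl`, the only task reporting 100.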
I replaced all the modules, cold restarted it, and even disconnected ALL copper cables (to rule out loops), but nothing fixed it.
I swapped the unit. Clearing the configuration on the old switch never resolved its problem. The new one is fine.
*******************************
The next one is a PoE+ switch, and the PoE quit. It appears to be a power supply problem!
SSH@SwitchName#sh chass
The stack unit 1 chassis info:
Power supply 1 present, status failed
Power supply 2 not present
Fan 1 ok, speed (auto): [[1]]<->2
Fan 2 ok, speed (auto): [[1]]<->2
Fan controlled temperature: 38.0 deg-C
Fan speed switching temperature thresholds:
Speed 1: NM<----->61 deg-C
Speed 2: 58<-----> 79 deg-C (shutdown)
Sensor B Temperature Readings:
Current temperature : 38.0 deg-C
Sensor A Temperature Readings:
Current temperature : 36.0 deg-C
Warning level.......: 69.0 deg-C
Shutdown level......: 79.0 deg-C
Boot Prom MAC : cc4e.24b0.7852
Management MAC: cc4e.24b0.7852
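A check like this is easy to script if you already pull chassis output on a schedule. The sketch below assumes the "Power supply N ..." line wording shown above; `psu_states` is a hypothetical helper name, and other firmware versions may word these lines differently.

```python
import re

# Matches "Power supply 1 present, status failed" and "Power supply 2 not present".
PSU_RE = re.compile(r"Power supply (\d+) (not present|present, status (\w+))")

def psu_states(chassis_output):
    """Map power supply number to its status word, or 'not present'."""
    states = {}
    for m in PSU_RE.finditer(chassis_output):
        unit = int(m.group(1))
        states[unit] = m.group(3) if m.group(3) else "not present"
    return states

sample = """\
The stack unit 1 chassis info:
Power supply 1 present, status failed
Power supply 2 not present
"""

print(psu_states(sample))
```

Anything other than `ok` on a present supply is worth an alert, especially on a PoE switch where the ports go dark with it.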
SSH@SwitchName#sh inline power
SSH@SwitchName#sh inline power det
Power Supply Data On stack 1:
++++++++++++++++++
Power Supply Data:
++++++++++++++++++
power supply 1 is not present
power supply 2 is not present
POE Details Info. On Stack 1 :
General PoE Data:
+++++++++++++++++
Firmware
Version
----------------
Cumulative Port State Data:
+++++++++++++++++++++++++++
#Ports #Ports #Ports #Ports #Ports #Ports #Ports
Admin-On Admin-Off Oper-On Oper-Off Off-Denied Off-No-PD Off-Fault
-------------------------------------------------------------------------
Cumulative Port Power Data:
+++++++++++++++++++++++++++
#Ports #Ports #Ports Power Power
Pri: 1 Pri: 2 Pri: 3 Consumption Allocation
-----------------------------------------------
**********************
Then this morning I had a switch that was just plain dead. I noticed it proactively, located it, diagnosed it, pulled a spare, updated and configured it, validated the config, replaced the unit, and verified everything in 50 minutes, including the Data, VoIP, Wireless, and WiFi Management VLANs.