WRS Users experience
The experience of users of White Rabbit switches can give a good idea of
the reliability and of the weakest parts in a switch.
This page therefore aims to collect data from the field about problems
seen with switches.
To get an idea of the percentage of switches that have
failed, we try to keep track of the number of switches in the
same installation as the ones that failed.
Note that the data has to be interpreted carefully. Many of the reported problems may come from early production runs or from switches used in labs with open covers, many on/off cycles, frequent SFP changes, etc. Furthermore, the data is not exact, as it does not precisely track on-time and similar parameters.
Again, this page is only meant to help find out whether there are any weak parts in the design that need to be addressed.
Reported failures
Date of failure | Serial number | Prod.date | HW version | Uptime (est) | Environment | Failure | Remarks |
---|---|---|---|---|---|---|---|
26-07-2016 | 7S-WR-18P-3.4-3H_214 | -- | 3.4 | 1 year | CERN. Installed in experiment | No power after operating for several months | Short-circuit on T1 on minibackplane (so no PSU failure). Repaired under 7S-RMA-1607-WRS03. |
2016 | 7S* | -- | 3.3 | 28 months | GSI, Air-conditioned Lab | Fan failure* | The switch is installed in an open rack in an air-conditioned lab at a constant 21 °C |
2016 | 7S* | -- | 3.3 | 28 months | GSI, Stable Room Temp | Fan failure* | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S* | -- | 3.3 | 25 months | GSI, Stable Room Temp | Fan failure* | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S* | -- | 3.4 | 20 months | GSI, Air-conditioned Lab | Fan failure* | The switch is installed in an open rack in an air-conditioned lab at a constant 21 °C |
2016 | 7S* | -- | 3.4 | 20 months | GSI, Stable Room Temp | Fan failure* | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S* | -- | 3.4 | 20 months | GSI, Stable Room Temp | Fan failure* | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S-WRS-18P-3.3-2H_034 | -- | 3.3 | 18 months | GSI, Stable Room Temp | Right fan failure | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S-WRS-18P-3.3-2H_058 | -- | 3.3 | 18 months | GSI, Stable Room Temp | Right fan failure | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S-WRS-18P-3.3-2H_067 | -- | 3.3 | 18 months | GSI, Stable Room Temp | Left fan failure | The switches are installed in open racks. The room has a stable temperature of about 21 °C throughout the year |
2016 | 7S-WRS-18P-3.4-1H_108 | -- | 3.4 | 27 months | GSI, Air-conditioned server racks | FPGA fried | We don't know what happened. The switch was down overnight. Repaired by 7S |
2016 | 7S-WRS-18P-3.3-2H_067 | -- | 3.4 | 18 months | GSI, Stable Room Temp | 1 SFP Cage failure | A. Rubini will give more details about it |
2017 | 7S-WRS-18P-3.3-2H_065 | -- | 3.3 | unknown | CERN, used in lab with varying temp | Does not lock to master, slaves have problems locking | Very likely a production fault. Users reported problems with nodes locking; we investigated and found the cause: a broken capacitor that grounded the VCXO control signal. Repaired under 7S-RMA-1704-WRS. |
*Unfortunately, the fans were replaced without recording the serial number or which fan had failed.
Purchased systems
User | Number | Version | Manufacturer |
---|---|---|---|
GSI | 24 | 3.3 | Seven Solutions |
GSI | 58 | 3.4 | Seven Solutions |
GSI | 13 | 3.4 | Creotech |
CERN |
Installed systems (from which above reported failures come)
User | Date | Installed base | Remarks | Manufacturer |
---|---|---|---|---|
CERN | January 2017 | 9 | Installed in experiment. | |
GSI | March 2017 | 13 | Rack installed. Stable T | Seven Solutions |
GSI | March 2017 | 13 | GSI Lab. Room T | 10 Seven Solutions, 3 Creotech |
GSI | March 2017 | 5 | GSI Timing Lab. Room T. Fans disconnected | Seven Solutions |
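
As a rough illustration of the percentage estimate this page aims for, the sketch below computes the fraction of GSI switches with a reported fan failure per hardware version. The counts are transcribed by hand from the "Reported failures" and "Purchased systems" tables. Using the purchased quantity as the denominator is an assumption, since the installed-base table does not list hardware versions. The names `fan_failures`, `purchased` and `failure_fraction` are made up for this example.

```python
# Fan-failure rows in the "Reported failures" table (all from GSI),
# counted per hardware version. Hand-transcribed; may be incomplete.
fan_failures = {"3.3": 6, "3.4": 3}

# GSI quantities from the "Purchased systems" table.
# Assumption: purchased units are used as the population size.
purchased = {"3.3": 24, "3.4": 58 + 13}  # 3.4: Seven Solutions + Creotech


def failure_fraction(failed: int, total: int) -> float:
    """Return the fraction of units with a reported failure."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


fractions = {
    hw: failure_fraction(fan_failures[hw], purchased[hw]) for hw in purchased
}

for hw, fraction in fractions.items():
    print(f"HW v{hw}: {fan_failures[hw]}/{purchased[hw]} = {fraction:.1%}")
```

These fractions carry the same caveats as the rest of the page: uptime is only estimated, and not every purchased switch has necessarily been in operation.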
Remarks received
- Since HW v3.4, the fans shipped in all WRS manufactured by 7S are Sunon MB40201VX-000U-A99 (MagLev, with a life expectancy of more than 8 years). Most of the fan problems have been reported on WRS HW v3.3.
- Fans are problematic since most generic types have an MTBF of 3 to 5 years at room temperature and much less at elevated temperatures. (Nico Coesel)
- We've used the 'MagLev' series fans from Sunon for a few projects (not on WRS - EB) and have not had a single failure ... yet. Reported MTBFs on these are on the order of 200,000 hours (~ 23 years). So, no bushings, no bearings, no problems. As an added benefit they are considerably quieter (mechanically) than even good ball-bearing fans - for applications where vibrations are important. (Tony Rohlev)
Observations
- The official mechanics BOM lists the fan used by Seven Solutions as: Gicoda MB40201VX-000U-A99.
- "Gicoda MB40201VX-000U-A99" does not exist. It should be "Sunon MB40201VX-000U-A99" (the manufacturer is Sunon; the type number is the same). This is a DR MagLev = Dust-Resistance MagLev.
- Creotech and Seven Solutions (since HW version 3.4) both deliver switches with the same "Sunon MB40201VX-000U-A99".
- Creotech made one special series only for GSI, for evaluation, with the Mechatronics G4020H12B-RSR, which is a ball-bearing type.
- It was decided to stick to the "Sunon MB40201VX-000U-A99" for all productions.
- ECIA Authorized search for MB40201VX-000U-A99
- This type has a last-time-buy date of 31 December 2017 (see EOL notice). "000U products are not recommended for new designs; all new designs should start with 1000U series. 1000U series have lower power consumption, lower dB(A)".
- Recommended replacement type: MF40201VX-1000U-A99 (ECIA search)
Fan lifetime expectancy
- Life expectancy: 60,000 hours at 40 °C, 65% humidity, 90% CL (Confidence Level)
- L10 Test procedure
In their example, 10% of the fans have failed after 28390 hours (about 3.2 years), which would correspond to an MTTF of 81300 hours (about 9.3 years).
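
The hour figures quoted on this page can be converted to years with a short sketch like the one below. It assumes continuous operation and 8760 hours per year (leap days ignored); the helper `hours_to_years` is a name chosen for this example. For comparison, it also computes the MTTF/L10 ratio that a constant failure rate (exponential lifetime) would give. That comparison is an addition for illustration, not something the vendor document is known to state.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 h, continuous operation, leap days ignored


def hours_to_years(hours: float) -> float:
    """Convert operating hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR


# Hour figures quoted on this page.
figures = {
    "L10 in the vendor example": 28_390,
    "MTTF in the vendor example": 81_300,
    "Life expectancy at 40 deg C": 60_000,
    "MagLev MTBF quoted in the remarks": 200_000,
}

for label, hours in figures.items():
    print(f"{label}: {hours} h = {hours_to_years(hours):.1f} years")

# Ratio between MTTF and L10 in the vendor example.
ratio_example = 81_300 / 28_390

# With a constant failure rate, 10% have failed at t = -ln(0.9) * MTTF,
# so MTTF / L10 would be 1 / -ln(0.9).
ratio_exponential = -1 / math.log(0.9)

print(f"MTTF/L10 in the example: {ratio_example:.2f}")
print(f"MTTF/L10 for a constant failure rate: {ratio_exponential:.2f}")
```

The ratio in the example is much smaller than the constant-failure-rate value, which suggests the example is based on a wear-out lifetime model rather than a constant failure rate.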
5 July 2022