Last week, when virtually every Skype user was left without communications for the better part of thirty hours, there was speculation as to why service had gone down worldwide. Everything from governmental conspiracy theories to malicious code was blamed for the outage. Skype then said that it was due to a problem with “supernodes”, a component of the system that few have ever heard of.
Now Skype CIO Lars Rabbe has provided extensive details about what caused the VoIP (Voice over Internet Protocol) service to suffer its first widespread failure since 2007. The culprit was a bug in a Windows version of the Skype client, specifically version 5.0.0.152.
The bug did not affect other clients or other versions, but Skype said about 50 percent of users were running the affected version, and the bug caused roughly 40 percent of those Windows clients to crash. The impact was amplified because about 25-30 percent of publicly available supernodes were running on the clients that crashed. As the network tried to cope with the offline supernodes, failover mechanisms were triggered that “led to near complete failures that occurred a few hours after the triggering event”, according to Rabbe.
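To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It is not Skype's own analysis: the function names are hypothetical, and the load calculation assumes that traffic from failed supernodes is spread evenly across the survivors, which is a simplification not stated by Skype.

```python
def crashed_share(affected_share: float, crash_rate: float) -> float:
    """Share of all users whose client crashed.

    affected_share: fraction of users on the buggy version (about 0.50 per Skype)
    crash_rate: fraction of those clients that crashed (about 0.40 per Skype)
    """
    return affected_share * crash_rate


def load_multiplier(supernodes_lost: float) -> float:
    """Extra load on each surviving supernode.

    ASSUMPTION: the load of failed supernodes is redistributed evenly
    over the remaining ones. Real failover behavior is more complex.
    """
    if not 0.0 <= supernodes_lost < 1.0:
        raise ValueError("supernodes_lost must be in [0, 1)")
    return 1.0 / (1.0 - supernodes_lost)


if __name__ == "__main__":
    # Roughly one in five of all clients went down.
    print(f"Clients crashed: {crashed_share(0.50, 0.40):.0%}")  # Clients crashed: 20%

    # Losing 25-30 percent of supernodes raises the load on the rest
    # by about a third or more under the even-redistribution assumption.
    for lost in (0.25, 0.30):
        print(f"{lost:.0%} lost -> {load_multiplier(lost):.2f}x load per supernode")
```

Under these assumptions, about a fifth of all clients crashed, and each surviving supernode had to absorb roughly 1.33 to 1.43 times its normal load, which gives a sense of how a client bug could tip the wider network into failure.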
Skype engineers ultimately resolved the problem by introducing thousands of new dedicated supernodes to the network. Early Friday the supernodes became stable and service started to return to normal.
Skype says it will work to prevent this type of outage from recurring through preemptive engineering and by making sure that any software identified as buggy is updated automatically on users' machines. As we reported previously, Skype plans to provide paying users with vouchers to compensate them for outage time.