If you were online on Thursday evening (and the chances are that if you’re reading this, you were, just like me) then you’ll probably have noticed that Facebook – the sole reason for the existence of the Internet in the eyes of many – was unresponsive. The fact that a simple website was down shouldn’t have been a major news story across the world on Friday morning, but this isn’t just any website – it’s Facebook.
The website acts as a meeting place for friends, a reunion ground for families and a way to keep in touch with people when they’re not around. Facebook has rapidly replaced the phone, the mobile phone and even text messaging as the preferred method of communication between the young – and its absence from our lives, even for a few short hours, is unacceptable.
The outage suffered by Facebook late Thursday evening was apparently its worst for four years, prompting the website to apologise to the millions (and millions) of Facebook fans who wanted to use it to tell friends what they were having for tea, report what their kitten had just done and ask what everyone was watching on TV. The outage meant that people had to use ‘phones’ or, worse still, talk to family members who were in the room with them.
So what caused this terrible error that ruined the evenings of so many people? It was, in fact, human error. Someone had updated one of Facebook’s subroutines without checking the change properly. An automated system that was meant to find and fix errors ended up chasing its own tail, and the resulting loop brought the whole system crashing down.
Facebook’s Robert Johnson, one of the main bods who work on the backend, commented:
“The way to stop the feedback cycle was quite painful – we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.”
“We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.”
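To picture how a "fix it" routine can turn into a feedback cycle, here is a minimal sketch in Python. It is purely illustrative and not Facebook's code: the validity rule, the function names and the retry limit are all made up for the example. The idea is that a client keeps asking the database for a fresh value whenever its cached one looks wrong, so if the database's own copy is wrong too, an unguarded loop would never stop asking. A simple cap on attempts is one way to break the cycle.

```python
def is_valid(value):
    # Hypothetical rule for this example: a setting must be a positive integer.
    return isinstance(value, int) and value > 0


def refresh_setting(cache, fetch_from_database, key, max_attempts=3):
    """Return a valid setting, or None after max_attempts database queries.

    Without the max_attempts check, a bad value stored in the database
    would keep this loop (and the database) busy forever.
    """
    attempts = 0
    while not is_valid(cache.get(key)):
        if attempts >= max_attempts:
            return None  # give up rather than hammer the database
        cache[key] = fetch_from_database(key)
        attempts += 1
    return cache[key]
```

If the database hands back a good value, the function returns it after a single query. If the database's copy is bad as well, the function stops after three queries and leaves the caller to decide what to do next, rather than joining thousands of other clients in a pile-on.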
This teaches us two things:
Firstly, always check any updates to your website before putting them live. Even the smallest of errors in your code could cause your entire website to crash.
Secondly, a website can become so big, so important, that it can affect the lives of millions and cause mass hysteria when it’s unavailable. Would anyone miss your website for a few hours?
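On the first point, checking an update before it goes live doesn't have to be complicated. Below is a rough Python sketch of a pre-launch check for a settings file. The setting names (`site_name`, `database_url`, `cache_ttl_seconds`) and the rules are invented for illustration, so swap in whatever your own site actually depends on.

```python
# Hypothetical settings that this example site cannot run without.
REQUIRED_KEYS = {"site_name", "database_url", "cache_ttl_seconds"}


def find_problems(config):
    """Return a list of human-readable problems found in config."""
    problems = []
    # Report any required setting that is absent, in a stable order.
    for key in sorted(REQUIRED_KEYS - set(config)):
        problems.append(f"missing setting: {key}")
    # If the cache lifetime is present, it must be a positive whole number.
    ttl = config.get("cache_ttl_seconds")
    if ttl is not None and (not isinstance(ttl, int) or ttl <= 0):
        problems.append("cache_ttl_seconds must be a positive whole number")
    return problems


def safe_to_go_live(config):
    # Only allow the update through when no problems were found.
    return not find_problems(config)
```

Running `safe_to_go_live` as the last step before an update is published means a typo in a setting gets caught on your machine rather than by your visitors.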