As some of you noticed, Backblaze experienced an outage. I want to provide some detail on what happened and the current status.
The Cause
Backblaze stores your backed-up data in a top-tier data center facility. Last night at 7:35 p.m., a security guard entered the facility. The door slammed, causing the protective covering on an “Emergency Power Off” switch to open and setting off alarms. This by itself had no impact, but in a moment of confusion the guard, hoping to turn the alarms off, pressed the Big Red Button and shut off all power to that zone. At 7:36 p.m., the duty engineer escalated the situation and a resolution plan was designed. By 8:03 p.m., the power was fully restored.
Backblaze Response
As soon as the power went out, Backblaze’s monitoring systems alerted us to the issue and we mobilized the company. Most of us went immediately to the data center, while others doubled up on support to answer any questions right away. We then started the phased procedure of bringing the service up again: first static web content (home page, help pages), then dynamic web content (account pages, restore selection, billing), and finally all of the actual cloud storage.
We could have brought everything up very quickly, but we believe it’s critical to carefully check every system first. With over 5,000 spinning hard drives, this process takes a little while. Much of the team worked diligently through the night to bring the service back as quickly as possible.
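For readers curious what a phased, check-first bring-up looks like in practice, here is a minimal sketch. It is an illustration only, not our actual tooling: the phase order comes from the description above, but the component names and the `check` function are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# Phase order follows the post; the component names are hypothetical.
PHASES: List[Tuple[str, List[str]]] = [
    ("static web content", ["home_page", "help_pages"]),
    ("dynamic web content", ["account_pages", "restore_selection", "billing"]),
    ("cloud storage", ["storage_systems"]),
]


def bring_up(
    phases: List[Tuple[str, List[str]]],
    check: Callable[[str], bool],
) -> List[str]:
    """Return the names of the phases that may go live, in order.

    A phase goes live only if every one of its components passes `check`.
    The first phase with a failing component stops the process, so later
    phases stay offline until the problem is resolved.
    """
    live: List[str] = []
    for phase_name, components in phases:
        failed = [c for c in components if not check(c)]
        if failed:
            break  # do not start later phases on top of an unverified one
        live.append(phase_name)
    return live


# Example: a (hypothetical) check that has not yet cleared the storage systems.
print(bring_up(PHASES, lambda component: component != "storage_systems"))
```

The design choice here is the same trade-off described above: stopping at the first failed check is slower than starting everything at once, but nothing goes live before it has been verified.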
Status
The static web pages were live within minutes of the power coming back online. We ran thorough tests throughout the system and brought the dynamic pages fully up this morning. This means you can browse the entire site, sign in to your account, browse the files you have backed up, and even request (but not yet receive) a restore.
We expect to finish checking enough of the cloud storage systems later this afternoon to allow backups to resume. At that point, most requests to restore data will also be fulfilled. However, some restores will be delayed a bit longer if they contain data on systems that we have not finished testing. As soon as we’re done, all restores will complete.
At this point, everything is progressing smoothly and we expect to have every piece of the service restored to full operation sometime this evening. While it is tempting to lock the Emergency Power Off switches, that would obviously defeat their purpose. However, we are looking at ways to speed up the necessary tests so that we can recover more quickly from any type of unplanned shutdown in the future. Thank you for being patient with us as we work through this.