Resolved - Saturday March 11, 2017

Wave will be unavailable commencing at 9 a.m. Eastern on Saturday, March 11, 2017, while our engineers carry out some planned maintenance to update our servers. All of Wave will be unavailable while this work takes place.


2:49 p.m. Eastern: Wave is back up and running now. The teams here will continue monitoring the systems to ensure we don't see any complications.

Once again, I apologize for the inconvenience today's outage caused.

We initially expected things to be out of commission for less than an hour, though we allocated 2 hours to the process to be safe. Unfortunately, after we ran through the full system update, when we did our rigorous checking to ensure everything was correct, the team discovered some errors. We then began a roll-back process, which encountered its own complications.

The engineering teams at Wave have countless decades of combined experience, and they plan out service windows like today's very meticulously, including dry runs on mirrored systems to make sure we can foresee possible snags. Unfortunately, snags do arise, and we were hit with multiple problems today that we could not foresee. Most regrettably, the fix simply wasn't something that could be done quickly. Machines need to be purged and restarted, and that unfortunately is something that can't be accelerated.

We also choose our maintenance windows very carefully, with full consideration of how we can impact the smallest number of customers during the downtime. We also need to consider the availability of services and partners needed to do the work correctly and efficiently. This morning's time slot was the best available opportunity, though we know that in actuality many customers were inconvenienced, for which we do apologize.

One final point: Whenever we experience unforeseen problems with our software or services, we do a careful retrospective of actions that were taken, problems that were encountered, processes that we followed along the way, and more, and we then apply any learnings to our future workflows to make sure we never lose an opportunity to improve and correct for mistakes. Please be assured we will do so with today's outage, in order to reduce or eliminate the possibility of a recurrence.

Thanks again for your patience and understanding today. We care deeply about making sure you have access to the tools and services you need, when you need them. We look forward to serving you well for a long time.







1:27 p.m. Eastern: Our engineering team continues to work with external providers in getting the systems fully functional. At this time, our primary goal is to ensure stability and integrity, and make sure that no additional surprises arise due to haste. 
Though apologies start to sound hollow at this point, we apologize nonetheless for the delays today. Our engineering teams prepare well in advance for rollouts like today's, including dry runs and detailed planning. Unfortunately, technology doesn't always co-operate.
Once the dust settles, all service interruptions like this are deconstructed carefully, causes are identified, and plans are made and re-made in order to learn from the incident and avoid repeating anything like it. Please be assured we will be diligent in putting today's problems to good use for improved services in the future. 

12:42 p.m. Eastern: Apologies for the continued delays. While correcting for some of the issues that caused the initial delay, we encountered a new bug which the engineering team is currently working to resolve. Estimated time is in the 30-minute range.
We understand how important it is for you to have access to our services, and that our estimates and expectations around time to resolution have been inaccurate today. 
As always, we will work toward understanding what went counter to plan, and doing better in the future, in order to deliver the kind of service we expect of ourselves. 
More updates as they become available. Thank you for your patience. 

11:52 a.m. Eastern: Apologies for the continued delays. Current estimates have us back up in about 5 minutes. 

11:18 a.m. Eastern: We've decided to re-run some processes to ensure the update is 100% successful. This will add some time to our maintenance window. Current estimates have us back up in the neighbourhood of ~11:45 Eastern. We apologize for the extended downtime. We take every precaution to minimize disruptions, but will always err on the side of caution to ensure that data and long-term site performance for our customers are in the best shape. Thanks for your understanding.

10:45 a.m. Eastern: We're in the home stretch of the maintenance window. Engineering indicates we may run ~10 minutes over time. Working on keeping that short. Thanks for your patience.

9:54 a.m. Eastern: Making good progress. Services are still down for the upgrades.

8 a.m. Eastern: Work is still scheduled to take place starting at 9. Please stay tuned for details.

