Post Mortem: US Thanksgiving Outage

November 27, 2018
roll20

Steve Koontz, Lead Developer of Roll20, put together a post mortem to share with you on the service outage that some Roll20 users experienced on November 21-22. Without further ado, here's Steve to talk about what happened and how we approached it:

On November 21st, Thanksgiving Eve here in the US, a little after 7PM PT, Roll20 suffered an extreme slowdown. For most users, pages took a very long time to load, and for some they did not load at all. Our engineers responded immediately, but the issue turned out to be difficult to diagnose. We hadn't changed anything on our end going into the holiday, but within just a few minutes our database had been overwhelmed by requests, and because everything was then running slowly, it wasn't easy to identify the offending process. We did some maintenance on our database, and 90 minutes after the issues started we rolled out an emergency patch that brought things back to normal. We then continued to monitor the performance of the site.

On Thanksgiving morning the issue recurred, which told us that our fix from the previous night had minimized the problem but not resolved it. Since we were already dialed in on the issue, it took the team only a few minutes to come up with a longer-lasting solution that restored almost 100% of Roll20's functionality. We isolated the cause, which turned out to be a foible in how the database software Roll20 uses handles very, very large tables: a query that normally ran very quickly suddenly became very slow. We're still working on a permanent solution. Until then, notifications for completed queue processes (things like installing an addon into a game or rolling a game back) aren't reaching the end user. Addons will still install correctly; you just won't get a notification when they've finished. Refreshing the game detail page after a few seconds will show the addon as installed, and all of its content will be correctly added to your game. We expect to release a patch for this notification issue in the coming week.

We apologize for the inconvenience this caused your holiday games.

-Steve Koontz, Lead Developer

The Roll20 Team

Roll20 is the all-in-one solution for organizing and playing tabletop games online, allowing you to play your games anywhere and share them with anyone virtually. With the ability to choose from a number of popular titles built ready for your virtual tabletop, your adventures are limitless and you can get started playing with little to no prep. Dive into advanced features like Dynamic Lighting or explore macros and APIs to add some extra depth to your game. Roll20 lets you play your tabletop games, your way.