Formstack Downtime Update + Post-Mortem Assessment: 7/7/14

Written by Chris Byers on July 7, 2014

Posted in Formstack Updates

Between approximately midnight and 4 a.m. EST, Formstack experienced server downtime. Our servers have been tentatively restored, and we will continue to monitor them throughout the morning. We will post more information about the downtime later today. Thank you for your patience as we worked, and continue to work, to address this issue. We apologize for any inconvenience this extended downtime caused.

UPDATE 12:03 p.m. EST
At 11:28 p.m. EST last night, our primary database server encountered a critical error that left it intermittently unable to accept data from our web servers. We want to walk you through what transpired last night, as well as what we did to mitigate these errors:

  • At 11:42 p.m. EST, our database server stopped accepting new connections from our web servers and the app started displaying an ‘Unable to connect to database’ error to users.
  • Once notified of the issue, we started investigating the root cause of the problem. Preliminary findings suggest that a lack of hard disk space on the database server was the ultimate cause of the database server failure. We are currently investigating why the database server’s hard disk became full.
  • After freeing up space on our database server and attempting to restart the database service, we discovered that the database files were corrupted. We then started the process of failing over to our backup database server, which replicates the data from our primary database server.
  • At 4:02 a.m. EST, we successfully failed over to our backup database server and restored functionality to the site.
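
As an illustration only, and not a description of Formstack's actual tooling, the kind of early warning that could flag a filling disk before a database stops accepting writes might look like the minimal sketch below. The function names (`classify`, `check_disk`) and the 80% and 90% thresholds are assumptions chosen for the example; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def classify(fraction_used, warn_at=0.80, critical_at=0.90):
    """Map a used-space fraction (0.0 to 1.0) to a status label.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if fraction_used >= critical_at:
        return "critical"
    if fraction_used >= warn_at:
        return "warning"
    return "ok"


def check_disk(path="/", warn_at=0.80, critical_at=0.90):
    """Return (status, fraction_used) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction_used = usage.used / usage.total
    return classify(fraction_used, warn_at, critical_at), fraction_used


if __name__ == "__main__":
    status, fraction = check_disk(".")
    print(f"disk status: {status} ({fraction:.0%} used)")
```

Run on a schedule against the database volume, a check like this would raise a warning well before the disk is completely full, leaving time to free space or add capacity.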

During the initial four minutes of intermittent connectivity issues, it is possible that the Formstack application rendered forms and allowed form submissions. Due to the nature of the database issue, there is a chance that submissions made between 11:37 and 11:41 p.m. EST were lost.

To contextualize this possible data loss, we average fewer than 200 submissions in any given 4-minute window during this hour. Keep in mind that this was a holiday weekend, so the actual number of lost submissions is most likely even lower. Additionally, the downtime was intermittent, so many of the users who tried to complete a form were unable to submit their data. We are currently investigating the extent of the data loss and will update you with any new information. It is possible that some lost submissions still sent notification emails containing the data, even though the data was not saved to the database.

UPDATE 1:15 p.m. EST
We experienced another blip of downtime, lasting approximately four minutes, at 12:52 p.m. EST. We’re back up and running but are monitoring our servers. We are also investigating the reason behind the lack of hard disk space. Please let us know if you are having any issues by tweeting us or posting on our Facebook page.

Beyond the technical issues mentioned, failing over to a backup database should be possible in mere minutes. Unfortunately, this incident uncovered a number of gaps in the redundancy not of our systems, but of our people. We have too few people with the appropriate access and knowledge of what to do in an emergency.

In this case, the slow resolution was due to two issues:

  1. Unclear or missing instructions on the fastest path to switching from our primary database to our backup database. These should be clearly documented in our internal systems.
  2. A number of the outside services we depend on enforce strict access and permission controls, and we had not given enough of the technical team access to those resources. This meant we spent valuable time running through the administrative processes of adding the right people when we could have resolved the issue much sooner.

To our customers: I am sorry. We owe you a further response as to how we plan to mitigate all of this in the future. Please expect more information from us later in the week on our plans to deepen our ability to respond to issues as they arise. Thank you to our customers who were awake and waited patiently as we resolved the issue. We apologize for any inconvenience that we caused during your work day.

Of course, if you have any questions, our support team is at the ready to speak with you about this downtime. You can reach out to them at support@formstack.com.

-Chris Byers, Formstack CEO