The Billing Bungle
Date of Event | December 3rd, 2019 |
---|---|
Cause | Automated "delete" API call from the billing software (for no good reason) |
The Billing Bungle was the second loss of the wiki and a real wake-up call to make sure proper backups of the wiki and the PPX Discord bots were in place. Unlike The Wikitastrophy, this incident was triggered automatically and for no discernible reason, by a strange billing charge that appeared on the backend system. Sarah_Kat, the server host provider, quickly disabled API connections to prevent a repeat, but the damage had been done.
Timeline
On December 3rd, 2019, at 10:14 AM MST, SpoiledBIO reported in the PPX Minecraft Discord channel that the Minecraft server's host name could not be resolved. Brodie Snavely responded that the server console looked fine and asked whether anyone else was having issues.
At 10:36 AM MST, Brady Coles replied that it sounded like a DNS issue and suggested connecting with the numerical IP address rather than the pixelatedpickaxe.com host name. He also noted that the whole domain seemed to be down, since the entire website, wiki included, was unreachable.
At 10:41 AM MST, Brady Coles posted in the website Discord channel that the pixelatedpickaxe.com name servers were acting up. He notified Brodie Snavely, the domain owner, and Sarah_Kat, who is both the sysadmin of the server host provider and the provider of the name servers.
At 11:55 AM MST, Sarah_Kat responded with "weird..."
At 12:08 PM MST, Sarah_Kat responded with "i've got... nothing..."
At 12:20 PM MST, Sarah_Kat messaged "what... the frick... i don't like this one bit"
At 12:55 PM MST, Sarah_Kat first explained what appeared to have caused the issue. An automated procedure had created a $0.00 invoice for a free product, something it hadn't done all year, but had set its due date to February. Even though a free invoice gets marked "paid", the billing software sent a "delete" API call first.
At 1:00 PM MST, Sarah_Kat elaborated on the problem: "Normally, if something's screwed in the billing system about seeing a 'free' product, it'll do the standard 'Disable after a week' (gets clients' attention to come pay their shit if it's paid like normal) and 3-ish weeks after that (after a second invoice is set to be due), then it'd auto delete.
But it did none of that. It slept on this for... who knows how long and suddenly decided 'Let's make an invoice due in February, setting the invoice 9 months late', and 9 months late is more than the 4-ish weeks set for autodelete and..."
At this point, no one except Sarah_Kat seemed to fully understand the implications of this event.
At 1:02 PM MST, Brodie asked what had been deleted. Sarah_Kat responded that the static website files, the database, and the DNS zone files were gone.
At 1:06 PM MST, Sarah_Kat explained that they were checking for backups and trying to recover any data. It did not sound promising.
At 1:08 PM MST, Brady noted that the Discord bots on the same server had apparently still been running when the name server issue was first noticed. Sarah_Kat remarked, "I wonder if the bots just kept running from RAM as their data structure got deleted out from under them", followed shortly by "I hate this."
At 2:15 PM MST, Sarah_Kat confirmed that data recovery had led nowhere and asked what the next course of action should be. Brady noted that he had a database backup from August 9th, 2019 (about 4 months out of date), along with copies of most of the Discord bots and other webpages. The only option left was to rebuild from the remnants.
On December 4th, 2019, at 9:48 AM MST, Sarah_Kat informed Brady that web server access had been reprovisioned.
On December 12th, Brady posted: "Today is the day, I fix the wiki." He started with the wiki, whose setup went much faster than in previous attempts and was documented step by step. Using the existing database backup, much of the wiki was recovered, though all images and every change from the previous 4 months were lost. Shortly afterwards he set up the Discord bots TERMS and PatheticBot, recreating the data list for TERMS and then backing it up.
On December 14th, Brady wrote a custom bash script that archives all relevant parts of the website, including the Discord bots, along with the MySQL database, and emails the archive to him weekly as a permanent backup solution.
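Brady's actual script was written in bash and is not reproduced here. The Python below is only a minimal sketch of the same idea: dump the database, pack the site directories into one dated archive, and wrap that archive in an email. The function names, the archive naming scheme and the email addresses are all hypothetical. It also assumes that `mysqldump` is installed and reads its credentials from `~/.my.cnf`. The weekly schedule (for example a cron job) and the SMTP delivery are left out.

```python
"""Minimal weekly-backup sketch (hypothetical names and paths, not the original script)."""
import datetime
import subprocess
import tarfile
from email.message import EmailMessage
from pathlib import Path
from typing import List, Optional


def dump_mysql(database: str, out_file: Path) -> Path:
    """Write a SQL dump of `database` to `out_file` using the mysqldump CLI.

    Assumes credentials come from ~/.my.cnf rather than the command line.
    """
    with open(out_file, "wb") as fh:
        # --single-transaction gives a consistent snapshot of InnoDB tables
        subprocess.run(["mysqldump", "--single-transaction", database],
                       stdout=fh, check=True)
    return out_file


def build_archive(sources: List[Path], dest_dir: Path,
                  today: Optional[datetime.date] = None) -> Path:
    """Pack every path in `sources` into dest_dir/site-backup-YYYY-MM-DD.tar.gz."""
    today = today or datetime.date.today()
    archive = dest_dir / f"site-backup-{today.isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in sources:
            # arcname keeps only the last path component inside the archive;
            # directories are added recursively
            tar.add(str(src), arcname=src.name)
    return archive


def build_email(archive: Path, sender: str, recipient: str) -> EmailMessage:
    """Create a message with the archive attached as application/gzip."""
    msg = EmailMessage()
    msg["Subject"] = f"Weekly backup: {archive.name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Weekly website backup attached.")
    msg.add_attachment(archive.read_bytes(), maintype="application",
                       subtype="gzip", filename=archive.name)
    return msg
```

A weekly cron job could do the following in order:

1. Call `dump_mysql`.
2. Pass the site directories plus the dump file to `build_archive`.
3. Hand the result of `build_email` to `smtplib.SMTP.send_message`.

Keeping the database dump and the files in one dated archive means each email is a complete restore point. That fits the goal of losing at most a week's worth of data.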
The Aftermath
As a result of the Billing Bungle, the PPX wiki has lost many pages and all of its images, but it has also gained some things. Short URLs have been set up for the first time and proper weekly backups are in place. The Discord bot that displays the status of the PPX Minecraft server has also been rewritten and is better than ever. In theory, a similar event should no longer be catastrophic, since at most a week's worth of data would be lost and recovery should be straightforward.
The other major result of this incident was that Sarah_Kat took a deep look into the systems that run their web servers. This covered especially the billing system, but also the backup systems and more.
See Also
- The Wikitastrophy - The first complete loss of the wiki, 11 months earlier.