Initial D World - Discussion Board / Forums
   
> Database Damage Report (2007-12-12), Details regarding the recent 15-hour downtime
Perry
    Posted: Dec 17 2007, 07:17 AM



Group: SITE OWNER





The crash came suddenly, with no warning signs beforehand. It happened last Wednesday morning, at 3:29am on Dec 12 2007 (Pacific Time), right after the daily routine maintenance took place. The entire ibf_posts table, which stored every post on the forums (~850,000 posts), was wiped clean. The actual cause of the crash is still unknown. However, there is a likely culprit.

The crash was most likely caused by the daily routine maintenance scripts that I wrote four months ago. (Read this thread for more information on these scripts: https://idforums.net/index.php?showtopic=31...ndpost&p=853166 ) My suspicion is not based on intuition alone. The timing of the crash matches the execution time of these maintenance scripts too closely to be a coincidence. There are three scripts that run automatically every day.

The first script runs at precisely 3:28:00am (Pacific Time). Its function is to repair any damaged table structures in the database, and it takes approximately 25~30 seconds to execute. The second script runs at precisely 3:29:00am (Pacific Time). Its function is to optimize all database tables, a step that is crucial for saving database space and for maintaining or improving performance. This script takes approximately 10~15 seconds to execute. The third script runs at precisely 3:30:00am (Pacific Time). Its function is to make a compressed gzip backup of the database, and it can take as long as one minute to execute. That is why there is a noticeable slowdown between 3:30am and 3:31am every day. Now that you know what the routine maintenance scripts do, you may be wondering how they could have caused the crash.
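To make the timing concrete, here is a minimal Python sketch that checks whether the three run windows overlap. The start times and normal durations are the ones given above; the 75-second repair run is a made-up figure used only to illustrate a slow night, and the function names are my own.

```python
from datetime import datetime, timedelta

# Start times and worst-case normal durations come from the post.
# The calendar date only anchors the clock times.
DAY = datetime(2007, 12, 12)

def window(hour, minute, seconds):
    """Return the (start, end) window of a job starting at hour:minute."""
    start = DAY.replace(hour=hour, minute=minute)
    return start, start + timedelta(seconds=seconds)

def overlaps(a, b):
    """True if two (start, end) windows intersect."""
    return a[0] < b[1] and b[0] < a[1]

def find_overlaps(jobs):
    """Return every pair of job names whose run windows intersect."""
    names = list(jobs)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if overlaps(jobs[x], jobs[y])
    ]

normal = {
    "repair": window(3, 28, 30),
    "optimize": window(3, 29, 15),
    "backup": window(3, 30, 60),
}

# Hypothetical slow night: the repair takes 75 seconds instead of 30.
slow = dict(normal, repair=window(3, 28, 75))

print(find_overlaps(normal))  # []
print(find_overlaps(slow))    # [('repair', 'optimize')]
```

On a normal night there are at least 30 seconds of slack between the scripts, so nothing overlaps. The schedule only holds as long as each script finishes inside its own minute.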

A possible scenario is that the first script took longer than expected to complete, so the second script started while the first was still running. I'd imagine that it's not a good idea to repair and optimize the database at the same time, as it could cause corruption and possibly data loss in tables. The odd thing is that the crash wiped out only one of the 84 tables in the forums' database. There was no data loss in any of the other tables.
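One way to rule out the overlap scenario above would be to start the steps from a single runner instead of three fixed clock times, so each step begins only after the previous one has finished. The following is a rough Python sketch of that idea, not the actual maintenance scripts: the lock file path, the function name, and the placeholder steps are all assumptions for illustration.

```python
import os
import tempfile

# Hypothetical lock file location, not part of the real setup.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "forum_maintenance.lock")

def run_maintenance(steps, lock_path=LOCK_PATH):
    """Run (name, callable) steps one after another under an exclusive lock.

    Returns the names of the completed steps, or None if another run
    already holds the lock.
    """
    try:
        # O_EXCL makes the creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None

    done = []
    try:
        for name, step in steps:
            step()  # returns only when this step is finished
            done.append(name)
    finally:
        # Release the lock even if a step raised an exception.
        os.close(fd)
        os.remove(lock_path)
    return done

# Placeholder steps; real ones would call the repair, optimize and
# backup scripts and wait for each to exit.
steps = [
    ("repair", lambda: print("repairing tables")),
    ("optimize", lambda: print("optimizing tables")),
    ("backup", lambda: print("writing gzip backup")),
]

print(run_maintenance(steps))
```

If a step raises an exception, the later steps are skipped and the lock is still released. A lock file left behind by a killed process would have to be removed by hand before the next run.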

Because the crash happened right before the automated backup at 3:30am, I had to recover the database from the backup generated 24 hours earlier (the 2007-12-11 backup). That means any posts made between 3:30am Dec. 11 and 3:30am Dec. 12 were lost. The exact losses were 479 posts, 10 topics, and 12 members. With the help of GoogleBot, I was able to manually recover 35 posts and 3 topics from Google's temporary Internet cache.

The repair and optimize scripts have been taken offline for further investigation. The automatic backup script, however, remains operational. Backups will still be made every 24 hours, at 3:30am every morning.


This concludes the report on the database damage that occurred on December 12, 2007. If you have any questions regarding this incident, please PM or email me. Thank you, and sorry once again for the inconvenience.