Thursday, December 2, 2010

Corrupt Database Adventures

Ever notice how a “quick question” can turn into a major project? A coworker IM’d me earlier today with a “quick question” asking whether I knew how to get a database out of Emergency Mode.

Uh oh…

She then reassured me that it was a DEV box and not production. Whew! That meant we didn’t have to worry about losing important customer data, and at worst we could restore from backup without too much pain. I decided to investigate, since it’s not often that one gets to fiddle with a genuinely corrupt database that’s not critical to the bottom line.
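As background, a database’s state is visible in sys.databases, and a suspect database has to be put into Emergency Mode, and then single-user mode, before any repair can be attempted. A minimal sketch (assuming a database named testdb, matching the commands later in this post):

```sql
-- Check the current state (ONLINE, SUSPECT, EMERGENCY, etc.)
SELECT name, state_desc
FROM sys.databases
WHERE name = N'testdb';

-- Put the database into emergency mode so it can be inspected,
-- then into single-user mode, which the repair options require.
ALTER DATABASE testdb SET EMERGENCY;
ALTER DATABASE testdb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
```

Once the repair attempt is over (successful or not), ALTER DATABASE testdb SET MULTI_USER puts it back into normal use.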

I did a quick web search and found an article by Paul Randal that talked about the last resorts that people try first with a corrupted database. It mentioned that one thing people try is rebuilding the transaction log using DBCC REBUILD_LOG. I tried it, but it’s not a valid DBCC command in SQL 2005 (in fact, if I’d read a few more paragraphs of Paul’s article BEFORE trying it, I would’ve seen him mention that very fact – which should teach me the folly of skimming articles).

The next item mentioned was to perform a DBCC CHECKDB using the REPAIR_ALLOW_DATA_LOSS setting. What better time to try this than when you have a broken dev box to play with? Take a deep breath and see what happens.

DBCC CHECKDB (testdb, repair_allow_data_loss) WITH NO_INFOMSGS;

The results said:

Msg 605, Level 12, State 3, Line 1

Attempt to fetch logical page (1:2576) in database 5 failed. It belongs to allocation unit 72057594060406784 not to 281474980642816.

Msg 8921, Level 16, State 1, Line 1

Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

That didn’t sound promising. I tweeted #sqlhelp, quoting the text of Msg 605, to see if anyone else had any suggestions. Meanwhile, I searched for other possible tricks to try.

One posting I found on a SQL newsgroup (I forget which one and can’t find it again) stated that Msg 605 indicates a physical drive failure. I didn’t verify that, but since the dev box had been troublesome for quite some time, it seemed plausible.

Meanwhile, I got some suggestions from #sqlhelp to try DBCC DBREINDEX and DBCC CHECKTABLE on all the indexes and tables. I snagged a script from SQLServerCentral that loops through every table and performs a DBCC DBREINDEX on all indexes. Lots of black text scrolled through in the results pane, interspersed with some shocks of red. The red means something was wrong – but at least the black text following the red text indicated that things were repaired.
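I no longer have the exact SQLServerCentral script, but the same effect can be sketched with the (undocumented but widely used) sp_MSforeachtable procedure, which substitutes each table name for the ? placeholder:

```sql
-- Rebuild all indexes on every user table (SQL 2005-era syntax).
EXEC sp_MSforeachtable 'DBCC DBREINDEX(''?'')';

-- The same loop works for the integrity-check pass.
EXEC sp_MSforeachtable 'DBCC CHECKTABLE(''?'')';
```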

I then ran the DBCC CHECKTABLE using the same script as above to loop through all tables. It crunched through the tables happily for about 75% of the database, then it stopped with an error:

Msg 211, Level 23, State 51, Line 13

Possible schema corruption. Run DBCC CHECKCATALOG

Sweeeeeeeeeeeeeeeeeeeeet. An “Improvement”? No. I ran DBCC CHECKCATALOG as suggested. It completed successfully. THAT seemed promising. I tried rerunning

DBCC CHECKDB (testdb, repair_allow_data_loss) WITH NO_INFOMSGS;

Same results. One more time with gusto – this time I noticed that I’d lost the SQL connection. Reconnect, retry, and it disconnected before completing the command. The server had degraded sufficiently that there certainly was no hope of resuscitating it, so I logged off. A short while later, I received a message stating that they’d located the backup image of the dev box and were ready to rebuild it.

Although I spent several hours working on this, it was a great learning experience for me – most importantly, I learned that if DBCC CHECKDB can’t repair it, you’re likely best off restoring the database from backup.
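For completeness, the restore itself is short; a sketch (the backup file path here is a placeholder):

```sql
-- Overwrite the corrupted database with the last known-good full backup.
RESTORE DATABASE testdb
FROM DISK = N'D:\Backups\testdb_full.bak'  -- placeholder path
WITH REPLACE, RECOVERY;
```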

Before you find yourself in a situation with a corrupt database, take a look at Paul Randal’s “CHECKDB from Every Angle: Emergency Mode Repair”, which is an update to his article mentioned above. I wish I’d found that one first, but one uses the tools that the search engines return. Actually, the better choice would be to simply read through Paul Randal’s “Corruption” category on his blog – the ever-growing set of articles he’s written about database corruption. I’m going to spend a lot of time reading through those posts, and you should, too. It’ll certainly make your life easier when someone asks you a “quick question” about how to fix a database in Emergency Mode.
