Monday, May 2, 2011

Meme Monday: I Got 99 SQL Problems And the Disk Ain’t One

This month, Thomas Larock (Website| Twitter ) declared a meme Monday inspired by the Hugo song 99 Problems - aside from disk issues, name 9 problems you frequently see in your shop which are not related to disk issues.

1) Using default install settings for file growth
Despite numerous examples from live systems showing that those settings are not appropriate for our product's databases, we frequently see new customers with all their databases set to the default 10% growth setting, despite the statement in the "best practices" documentation that tells them otherwise.

2) Bloated SQL error logs
Many times when customers report having issues and we're called in to examine what's happening with their SQL server, we find that we can't open the SQL Error Logs because the customer's SQL server hasn't been restarted in a long time and the errorlog is so bloated that it's too big for the UI to open in a timely manner. The simple fix, of course, is to set up a SQL job that runs sp_cycle_errorlog periodically.

3) Not doing ANY index maintenance
Frequently, when I hear about SQL performance issues, I find that the customer has turned off the regular index maintenance jobs "because they take too long". Eventually, this results in painfully out of date statistics, severely fragmented indices and terrible performance.

4) Shrinking databases as "maintenance" to "free up disk space"
I try my best not to use profanity or to scream (loudly) when I see this enabled on customer servers. I just take a deep breath and forward the following links to the guilty party:
Paul Randal: Why You Should Not Shrink your Data Files
Brent Ozar: Stop Shrinking Your Database Files. Seriously. Now.

5) Developers "testing" code on production
Don't get me started....


6) Poor backup plans not even close to SLA requirements
High volume OLTP Production database, full recovery with log backup once a day at midnight and full backup once a day at 1AM - but their SLAs say they have to completely recover from failure within one hour. They claim that because the SQL server is clustered, that they don't have to back up the databases more often. Really? Please don't call me when things go south.

7) No disaster recovery plan
... And office in the middle of Tornado alley. Again, please don't call me to resurrect your SQL server when your data center gets destroyed by one of the 500+ tornadoes that went through town. You don't have a copy elsewhere and I can't create something from nothing.

8) Letting idiots have access to the Server room
Believe me: I can't make this one up - it actually DID happen.
A particular person on the night cleaning crew entered a server room to vacuum it. Because the vacuum's cord was too short to allow him to vacuum the far side of the server room, he unplugged something to free up an outlet so he could vacuum the far corner of the server room. The "something" he unplugged was the main power for the SQL server, which knocked out the customer's website until someone entered the server room in the morning and noticed that the server was unplugged.

9) Not having automated monitoring on servers
You'd think this was obvious, but I've been called too many times to count late at night to hear that someone's server is "down", only to find out the reason the SQL server crashed or the backups failed was because the disk was full. Automated disk monitoring systems have been around for over a decade, yet many of our customers don't have any automated monitoring and I doubt that their IT people check the servers every day since they seem so surprised to discover that their disk has filled up completely.

After just thinking about those 9 items, it's time for a stress pill.

No comments:

Post a Comment

Resoved: Error Creating Storage Event Trigger in Azure Synapse

My client receives external files from a vendor and wants to ingest them into the Data Lake using the integration pipelines in Synapse (Syna...