Wednesday, February 29, 2012

Adventures in SSIS: Data Migration

After escaping from technical support to the “greener fields” of Product Engineering, I thought that I would finally be in my element and have completely stress-free workdays.
So untrue.
So painfully untrue.IMG-20120130-00016
The new position offered enough stress to make event the most stable person ready to fling primary colored cartoon-like stuffed birds at an equally cartoon-like stuffed pig (who, for some unexplained reason, is lime green rather than pink, and seems to be just a head). Not that I do that frequently, but these poor birds have seen better days, as their ruffled feathers can attest. I suppose I could try to take my stress out on other things, but HR gets a wee bit upset if I throw chairs or coworkers (drat).Fortunately, they don’t seem to mind the occasional Angry Bird™ flying through the cube farm, however, so I fling away.
What is the current stressor?
SSIS.
I must admit that up until now I haven’t touched SSIS at all. I have plenty of experience working with DTS packages, but none with SSIS.
Until now.
There is no easy DTS to SSIS knowledge conversion. They’re completely different animals. They may do similar things, but nowhere near in the same manner. Books Online offers some information, but if you don’t already know SSIS the documentation might as well be in Klingon since it assumes you fully understand everything and are just using the books online for a quick definition check.invisible_woman1
Great. And it’s due by the end of February. Which, by the way was yesterday.
Thank GOD it’s a leap year, otherwise I would have had to ensure I learned invisibility instantly.
Some of the sections of the package I was able to copy from other migration packages I found in our Source Control (it’s Tortoise Subversion, BTW), but the rest had to be created by me, and quickly.
Some things I could figure out: get data from Source, compare to Destination using LookUp, then Insert New unmatched data into Destination. Works great unless you have some wiseguy who customizes the data and causes a primary key violation between Source and Destination. Simply inserting a new row without keeping the key value didn’t work because the Destination table does not have identity insert (aka “autoNumber”) on the key. I searched everywhere for ideas—checking  “SSIS Upsert” and equivalents in Bing and Goolgle—to see if anyone had written code that could help me. Most articles assumed that the destination had identity insert turned on.
Thankfully, I had the #SQLHelp hastag on Twitter and received responses from some awesome people* who were able to provide me the hints / guidance that I needed to plow through this disaster waiting to happen. The saving tweet (thanks @EricWisdahl) contained a link to an extremely helpful article by Phil Brammer called, “Generating Surrogate Keys”. It was exactly what I needed. 
Well, mostly….
SSIS is extremely fussy about certain things—C# coding is case sensitive; datatypes must match exactly; math on certain numbers mysteriously changes the datatype. Things like that.
The issue I had was that the Source table use int for its KeyID, and the Destination table used tinyInt. Most searches for an equivalent in SSIs won’t tell you the full story… but I finally found it.
First, several sources tell you that
tinyInt is equivalent to DT_UI1 (single byte un-signed integer),
but if you check the list of datatypes for variables, you’ll note that it’s missing (or if not missing, then named something that is NOT intuitively equal. What does DT_UI1 mean anyway?).
After many false starts (including learning that the C# scripting is painfully case sensitive), I discovered that
the Byte data type works for tinyInt Variables.
The other thing I found was that in SSIS scripts, when you add something to a value, it automatically forces a conversion to int datatype—and it’s very difficult to undo that. What I did was use Eric’s script unchanged, except to the final statement
Row.ClientKey = NextKey;
I added a conversion to byte:
Row.ClientKey = (byte)NextKey;

and it worked.

Now stress is greatly reduced and my Angry Birds and Green Pig are happily noshing on my cereal.



* HUGS! to @onpnt and @EricWisdahl for being “first responders”

Monday, January 16, 2012

January #Meme15 Assignment

<*Cough* *Cough*> It’s awfully dusty here, isn’t it? Yeah, I’ve been neglecting my blog. Terribly sorry about that…. Well, nothing like trying to get things going again at the start of a new year. Hopefully this year I’ll be more consistent about posting (and not quit half-way through the year)

#Meme15 Assignment #2

The #Meme15 is a meme started by a group of people in the #SQLFamily who wanted tomeme15new discuss how they use Social Networks to enhance their careers and professional development.
The assignment for this month was posted on Jason Strate’s blog – talk about Twitter, answering “Why should average Jane or Joe professional consider using Twitter?” and “What benefit have you seen in your career because of Twitter?“
Let’s get started.
 

Why should average Jane or Joe professional consider using Twitter?

From the Oatmeal - click photo to go to sourceThat’s exactly what I wondered when I first heard about Twitter – why bother slogging through countless random postings about useless things written by strangers who have too much spare time on their hands? I really don’t need to know that you’re taking your goat for a walk or that you ate sushi last night. Besides, I likely already saw your post on Facebook, Linked In and Google+ on exactly the same thing. Sounds like a major time waster, right?
If that’s all there was to it, then it would probably have gone the way of the 8-track tape within a few months. But thankfully, following people on Twitter can offer far greater benefits, as I discovered at the 2009 SQL PASS Summit conference. About halfway though the first day, I found out that the majority of the SQL people I really admired were all using Twitter as their main means of keeping in touch with other SQL professionals. And they weren’t tweeting useless stuff – they were posting announcements of new blog posts, links to articles about SQL, free online training, and other SQL-related items.
For SQL server professionals, Twitter definitely has benefits – just follow all of the awesome SQL gurus and the #SQLHelp and #SQLPass hashtags. For other professionals, it may or may not be helpful – it all depends upon whether other professions have a significant number of people tweeting about their profession.

What benefit have you seen in your career because of Twitter?

I’ve used the #SQLHelp hashtag several times to ask SQL-related questions and have received answers so quickly from SQL experts that it felt like they were right there with me helping me along.
From Twitter, I’ve also been able to find out about free online webinars and more SQL articles and blogs than what I have time to read in a day. Without Twitter, it would likely have taken me far longer to find the same information – or I would’ve completely missed seeing the information at all.
Finally, the most important benefit of chatting on Twitter with all of these SQL professionals is that when I attend a SQL conference these people actually know me by name – which has made networking so much easier.

Friday, July 29, 2011

Are Your Database Statistics Fresh?


SQL maintains a vast amount of data – or statistics – about the content of each object in a database. The statistics can become stale if they have not been updated very often, or if a large number of changes have occurred within the database. As the statistics become less useful, the time for running queries can increase dramatically.
In Production systems, statistics should be updated during the usual maintenance window to ensure that the metadata is fresh.
To see how fresh the statistics are for one object, run:
DBCC SHOW_STATISTICS ( 'CCAuditSessionType' ,CCAuditSessionType_PK)

If you need to see the statistics for all databases, run this instead:
select STATS_DATE(o.object_id,s.stats_id) as StatsDate,o.name as TableName, s.name as StatsName, auto_created, user_created, no_recompute
from sys.stats s
join sys.objects o on s.object_id=o.object_id
where o.type='U'

To update statistics that are out of date, execute the command
exec sp_updatestats
on each database on the server that needs to have its statistics updated.
Use the following query to generate a script to update the statistics on all databases
declare @db varchar(30)
, @dbID int
, @sql varchar(max)

create table #t
(DbName varchar(30), databaseID int)

Insert #t (DbName, databaseID)
select [name], database_id
from sys.databases
where database_id > 4

Select @dbID = MIN(databaseID)
from #t

While @dbID is not NULL
BEGIN
   select @db=DbName
     from #t
    where databaseID=@dbID

   set @sql = 'Use [' + @db + ']' + CHAR(13) + 'go ' + CHAR(13)
   set @sql = @sql + 'exec sp_updatestats' + CHAR(13) + 'go '

   PRINT @sql

   Select @dbID=min(databaseID)
     from #t
    where databaseID>@dbID
END

drop table #t
Copy and paste the printed output from your result set into the query portion of a SQL Server Agent job and this will ensure that the statistics are updated for all databases on a regular schedule. NOTE: the query above excludes the system databases.

Thursday, July 14, 2011

Adventures with Denali CTP3–Part 1

I usually only realize how slow downloads can be when I’m eager to begin working with the item being downloaded. The hour it took to download the AdventureWorks sample databases felt far longer than it actually was.
One thing that surprised me was that the downloads for the databases were only the MDF (data) file – the log file was not included. After fiddling unsuccessfully with attaching it using the UI in Management Studio (no, I didn’t think of deleting the log file name from the file list in the UI – I’d assumed it was required and didn’t realize that if you did not list a logfile that it automatically treated it as an ATTACH_REBUILD_LOG command), I finally decided that it would be sensible to actually read the instructions. Technet provided me a very simple query to attach the database
CREATE DATABASE AdventureWorks2008R2 ON (FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorks2008R2_Data.mdf') FOR ATTACH_REBUILD_LOG ;
Worked like a charm.
I modified the query and attached the AdventureWorksDWDenali database in a similar manner then ran a few quick SELECT queries on various tables to see what they contained. I was pleasantly surprised to see that the OLAP database’s dimDate table contained English, French and Spanish day and month names.
I then launched the BI Development Studio and opened the AdventureWorksDWMultidimensionalDenali project (provided in samples pages on Codeplex). I verified the connection information in the datasource and successfully deployed the cube.
If everyone knew how easy this was, I’d probably be out of a job.

Query to Pull Database File Information

This query will list the name, size and location of all files for all databases. This is handy for checking and documenting database server configuration to confirm whether the server follows our recommended best practices.
set nocount on
declare @sql varchar(max), @sql2 varchar(max), @name varchar(100)
if object_id('tempdb..#t') is not null drop table #t
create table #t (DbName varchar(30), LogicalName varchar(30), FileName varchar(100), sizeMB int, UsedSpaceMB int, growthMB int, is_percent_growth bit)
set @sql = ' substring(name,1,30) as LogicalName,substring(physical_name,1,75) as FileName,'
set @sql = @sql + 'size * 8192./1024/1024 as SizeMB,sum(a.total_pages * 8192./1024/1024 ) as UsedSpaceMB, '
set @sql = @sql + 'growth * 8192./1024/1024 as growthMB, is_percent_growth from '
declare c cursor for
select name from sys.databases where database_id > 4
open c
fetch next from c into @name
while @@fetch_status=0begin
set @sql2 = @sql + @name + '.sys.database_files df left join '
set @sql2 = @sql2 + @name + '.sys.allocation_units a on df.data_space_id=a.data_space_id left join '
set @sql2 = @sql2 + @name + '.sys.partitions p on p.partition_id = a.container_id '
set @sql2 = @sql2 + 'group by df.name, df.physical_name, growth, is_percent_growth, df.size'
begin try exec ('insert #t select ''' + @name + ''' as DbName, ' + @sql2 ) end try begin catch end catch
fetch next from c into @name
end
close c
deallocate c
select * from #t


I’m sure that there are other ways to pull this data, however, in some environments your permissions may restrict you from using any method other than this to pull the data.

Tuesday, July 12, 2011

SQL Recovery Mode Adjustment

Often when new Development boxes are handed over to my group, the databases are set to Recovery Mode = FULL because the setting match production recovery modes. Unfortunately, since the Dev boxes rarely have any backups running, eventually the transaction logs fill up the drive. When that happens, the databases can no longer accept new transactions and we are left with a (temporarily) non-functional box.


Run this script on a Dev box to set the recovery to SIMPLE for all databases to avoid the above scenario. It works on SQL 2000, 2005 and 2008 SQL servers.


NOTE: it is recommended that PRODUCTION servers use FULL recovery mode rather than SIMPLE.


use master

go


DECLARE @Database VARCHAR(255)
DECLARE @Table VARCHAR(255)
DECLARE @cmd NVARCHAR(500)
DECLARE DatabaseCursor CURSOR FOR


SELECT name FROM master.dbo.sysdatabases
where Name not in ('tempdb') -- cannot set recovery for Tempdb.
ORDER BY 1


OPEN DatabaseCursor


FETCH NEXT FROM DatabaseCursor INTO @Database
WHILE @@FETCH_STATUS = 0
BEGIN


     set @cmd = 'ALTER DATABASE ' + @database +'
     SET RECOVERY SIMPLE'


     EXECUTE sp_executesql @statement=@cmd


     FETCH NEXT FROM DatabaseCursor INTO @Database


END


CLOSE DatabaseCursor
DEALLOCATE DatabaseCursor


For your homework, you can substitute a WHILE loop for the cursor.  
If your environment is running mostly SQL 2008 (or higher), please check out SQLChicken's article on setting up Policy Based management to handle ensuring that the dev boxes are all set to Simple Recovery Mode.

Monday, May 2, 2011

Meme Monday: I Got 99 SQL Problems And the Disk Ain’t One

This month, Thomas Larock (Website| Twitter ) declared a meme Monday inspired by the Hugo song 99 Problems - aside from disk issues, name 9 problems you frequently see in your shop which are not related to disk issues.

1) Using default install settings for file growth
Despite numerous examples from live systems showing that those settings are not appropriate for our product's databases, we frequently see new customers with all their databases set to the default 10% growth setting, despite the statement in the "best practices" documentation that tells them otherwise.

2) Bloated SQL error logs
Many times when customers report having issues and we're called in to examine what's happening with their SQL server, we find that we can't open the SQL Error Logs because the customer's SQL server hasn't been restarted in a long time and the errorlog is so bloated that it's too big for the UI to open in a timely manner. The simple fix, of course, is to set up a SQL job that runs sp_cycle_errorlog periodically.

3) Not doing ANY index maintenance
Frequently, when I hear about SQL performance issues, I find that the customer has turned off the regular index maintenance jobs "because they take too long". Eventually, this results in painfully out of date statistics, severely fragmented indices and terrible performance.

4) Shrinking databases as "maintenance" to "free up disk space"
I try my best not to use profanity or to scream (loudly) when I see this enabled on customer servers. I just take a deep breath and forward the following links to the guilty party:
Paul Randal: Why You Should Not Shrink your Data Files
Brent Ozar: Stop Shrinking Your Database Files. Seriously. Now.

5) Developers "testing" code on production
Don't get me started....


6) Poor backup plans not even close to SLA requirements
High volume OLTP Production database, full recovery with log backup once a day at midnight and full backup once a day at 1AM - but their SLAs say they have to completely recover from failure within one hour. They claim that because the SQL server is clustered, that they don't have to back up the databases more often. Really? Please don't call me when things go south.

7) No disaster recovery plan
... And office in the middle of Tornado alley. Again, please don't call me to resurrect your SQL server when your data center gets destroyed by one of the 500+ tornadoes that went through town. You don't have a copy elsewhere and I can't create something from nothing.

8) Letting idiots have access to the Server room
Believe me: I can't make this one up - it actually DID happen.
A particular person on the night cleaning crew entered a server room to vacuum it. Because the vacuum's cord was too short to allow him to vacuum the far side of the server room, he unplugged something to free up an outlet so he could vacuum the far corner of the server room. The "something" he unplugged was the main power for the SQL server, which knocked out the customer's website until someone entered the server room in the morning and noticed that the server was unplugged.

9) Not having automated monitoring on servers
You'd think this was obvious, but I've been called too many times to count late at night to hear that someone's server is "down", only to find out the reason the SQL server crashed or the backups failed was because the disk was full. Automated disk monitoring systems have been around for over a decade, yet many of our customers don't have any automated monitoring and I doubt that their IT people check the servers every day since they seem so surprised to discover that their disk has filled up completely.

After just thinking about those 9 items, it's time for a stress pill.

Resoved: Error Creating Storage Event Trigger in Azure Synapse

My client receives external files from a vendor and wants to ingest them into the Data Lake using the integration pipelines in Synapse (Syna...