Thursday, October 6, 2022

Resoved: Error Creating Storage Event Trigger in Azure Synapse

My client receives external files from a vendor and wants to ingest them into the Data Lake using the integration pipelines in Synapse (Synapse's version of Azure Data Factory). Since the exact time the vendor sends the files can vary greatly day-to-day, he requested that I create a Storage Event Trigger.

I quickly set up the trigger:


Set Type to "Storage events", then select the correct storage account and container from the list associated to the current Azure Subscription. Use the blob path begins with setting to filter on the correct folder where the files land and use the blob path ends with setting to filter the file types (and perhaps file names if it's relatively consistent) to ensure that only the right blobs invoke the trigger. Finally, set Event to "Blob created".

Click Continue to go to the next page.

There will be a warning, "Make sure you have specific filters. Configuring filters that are too broad can match a large number of files created/deleted and may significantly impact your cost." which reminds you to check that the filters on the previous page actually return only the desired set of files. Be sure that you have at least one qualifying file in the folder and that the Data Preview can find it. If not, go back to previous page and adjust the the blob path begins with setting and the blob path ends with setting to correct the filtering. 

Click Continue to go to the next page.

This final page asks for the pipeline parameters to use when the trigger is invoked. 

Click Save to create the trigger. 



Once the trigger has been saved, publish the data factory.

So far so good. 

Then this popped up: 


The trigger needs to create and subscribe to an Event Grid event in order to be activated. Even the error was mysterious: 

"The client {GUID}' with object id {GUID}' does not have authorization to perform action 'Microsoft.EventGrid/eventSubscriptions/write' over scope '/subscriptions/{GUID}/resourceGroups/{resourceGroup}/providers/Microsoft.Storage/storageAccounts/{StorageAccount}/providers/Microsoft.EventGrid/eventSubscriptions/{GUID}' or the scope is invalid. If access was recently granted, please refresh your credentials."

I tried numerous searches on how to get the authorization to perform action Microsoft.EventGrid/eventSubscriptions/write and kept hitting dead ends. 

Finally, I started poking around in the Subscription settings to see if something needed to be set in there. Under "Resource Providers", I found that Microsoft.Synapse, Microsoft.Storage, Microsoft.DataLakeStore and Microsoft.EventGrid were all registered. So that felt like a dead end. 

After a bit more muddling around searching, I entered "Failed to Subscribe" in the search and found my savior: Cathrine Wilhelmsen.  She had experienced exactly the same issue and had the same difficulty I had locating information on how to resolve issue. She even mentioned the same articles that I read in my attempts to figure out what to do! The only thing I had not done was visit the Microsoft Q&A thread about running event triggers in Synapse - probably because I stumbled upon her blog post first! Thank you, Cathrine!!

So what was the magic trick?


The Microsoft.DataFactory resource provider was not registered. 

I hadn't expected that because we didn't have Azure Data Factory installed in this subscription, but now we know that it is required for event triggers.

Once the Admin registered Microsoft.DataFactory, I was able to successfully publish the storage event trigger. 😀 


Thursday, May 12, 2022

3 Cloud Seeking Awesome Azure People

My company is hiring!!

3Cloud is a technology services firm helping clients transform their business through the power of the cloud. We leverage Microsoft Azure to help clients speed up innovation, operate more efficiently and be more responsive to their client’s needs. As a Microsoft partner, we specialize in Azure migration, cloud-scale custom application development, Internet of Things, analytics, and DevOps achieved through Azure-enabled infrastructure automation. 3Cloud is headquartered in Chicago with additional offices in Dallas and Tampa. 
Get your Azure on and join an awesome team with people like me. 


These are remote roles based in any US city.


Use the link below to see what roles are open for hiring. https://grnh.se/a6fd7f562us





Friday, March 25, 2022

Converting Views to Tables in SQL


Every once in a while, I have a data conversion project in which they want to move data to another server and change a number of legacy views into tables. This is most common when the views use data from linked servers and we have to migrate everything over to Azure SQL Database which doesn't allow cross database queries. 

One could manually write the CREATE TABLE scripts for each, but that's a lot of work. We can leverage the system tables to help us create the scripts from the views. 

Here's my rendition of the script: 

SELECT CONCAT('CREATE TABLE [',s.name,'].[',v.name,']
(',STRING_AGG(CONCAT (
   '[',c.name,'] ',t.name
   ,CASE WHEN t.name = 'varchar' and c.max_length = -1 
         THEN '(max)'
         WHEN t.name IN ('varbinary','varchar')
         THEN CONCAT ('(',convert(VARCHAR(50), c.max_length),')')
 WHEN t.name IN ('decimal','numeric')
 THEN CONCAT ('(',convert(VARCHAR(50), c.precision),',',convert(VARCHAR(50), c.scale),')')
WHEN t.name = 'datetime2'
THEN CONCAT ('(',convert(VARCHAR(50), c.scale),')')
ELSE ''
END
,' '
,CASE WHEN c.is_nullable = 1
      THEN 'NULL'
      ELSE 'NOT NULL'
END), ',') WITHIN GROUP (ORDER BY c.column_id),')
GO ') AS tablename
FROM sys.all_columns c
JOIN sys.types t ON c.system_type_id = t.system_type_id
JOIN sys.VIEWS v ON c.object_id = v.object_id
JOIN sys.schemas s ON v.schema_id = s.schema_id
WHERE v.name IN ('myView1','myView2')  ---put your view names here
GROUP BY s.name,v.name
ORDER BY s.name,v.name

The advantage of my script is that it aggregates all of the text for each table in a single row in the results. The STRING_AGG gracefully handles skipping the comma after the last column, so nothing has to be deleted from the script when copying it to a query window to run, and the CREATE TABLE statement is formatted pretty close to they way most people format their table scripts. I also included the square brackets around the Schema Name, Table Name, and Column Name in  case the legacy views used reserved words in any of those places.  

I'm sure this will come in handy in the future since this is the second project where I've needed this script and I forgot to save it last time. 


Friday, November 12, 2021

Databricks and Synapse Study Guide

 



We ran out of time in my Q&A session at the PASS Data Community Summit and I wasn't able to answer the question, "Are there any books or materials that you recommend for Databricks and Synapse?". I promised to blog the list, so here it is. 

Synapse

  • Once you create an Azure Synapse Workspace, you will see a link that send you directly to the Microsoft documentation on using Synapse. The Microsoft documentation is good. It includes several tutorials and Quick Starts to help you hit the ground running. 
  • Once you enter the main screen of Synapse, there's a like to "Knowledge Center", which includes code samples for exploring data with Spark, querying data with SQL and creating external tables using SQL. The Knowledge Center also has a gallery of database templates, sample datasets, sample notebooks, sample SQL scripts and sample pipelines. Definitely check it out and try a few of them. 

Databricks

  • The official Databricks Documentation has the latest and best information on everything Databricks
  • If you are using Azure Databricks, be sure to refer to Microsoft's Azure Databricks documentation because that covers Databricks coding specific to Azure instances including connecting to other Azure services and using Azure's version of the Databricks interface.
  • The official Apache Spark Documentation contains documentation for all versions of Spark, and has good code examples to help you determine how to use each function available in each language used in Spark. 
  • Databricks Academy -yes, they have their own training, which includes Azure and AWS specific training as well as general Databricks training which is applicable to any Databricks implementation. Some is free, and some is paid. 
  • Databricks offers periodic webinars. Definitely sign up for those and the Databricks "Data and AI Summit".  
  • Pop on over to YouTube and look up Bryan Cafferky. He has several good videos on Databricks.

Both Synapse AND Databricks

Yes, there are a few bits of documentation which apply to both Synapse and Databricks.
  • I highly recommend exploring Microsoft Learn. It has free training on everything in Azure and many of the training modules include labs which walk you through performing various activities in Azure using their Azure lab account so you don't have to pay for the experience of learning how to use certain features. 
  • Pragmatic Works still offers weekly free webinars on various topics including Azure, SQL server, Power Apps and more. Sign up on their list to be notified of new content, and explore their archive of past webinars.
  • If you're a fan of watching technical videos, "Advancing Analytics" has a series of YouTube videos covering Azure Synapse and another series covering Databricks. He posts new videos periodically, so keep checking back on his page.
  • 3 Cloud Solutions also offers periodic free webinars on various Azure Topics. Sign up for notifications from them so you don't miss any or explore their list of past webinars.


If I find any additional resources, I'll add them later, but that's all I have for now. 

Thursday, November 11, 2021

CI/CD with Azure Synapse Notebooks - Error Resolved

 


Some features of Azure Synapse are mysterious. Recently, I was working on deploying Azure Synapse artifacts from development to production using the "Synapse Workspace Deployment" extension in Azure DevOps and received an odd error: 

2021-11-10T21:20:14.8670075Z For artifact: AzureSQLQueryTool: Checkstatus: 202; status message: Accepted
2021-11-10T21:20:44.9656242Z For artifact: AzureSQLQueryTool: Checkstatus: 200; status message: OK
2021-11-10T21:20:44.9661205Z For artifact: AzureSQLQueryTool: Artifact Deployment status: Failed
2021-11-10T21:20:44.9673543Z Error during execution: Error: Failed to fetch the deployment status {"code":"400","message":"Failed Component = DataFactoryResourceProvider, ErrorCode = 400, Error = BadRequest "}
2021-11-10T21:20:44.9723399Z ##[error]Encountered with exception:Error: Failed to fetch the deployment status {"code":"400","message":"Failed Component = DataFactoryResourceProvider, ErrorCode = 400, Error = BadRequest "}
2021-11-10T21:20:44.9945300Z ##[section]Finishing: Synpase deployment task for workspace: myWorkspace_prod

The new items I had added to Synapse were several spark notebooks for ingesting data. I had tested them individually and they all appeared to be working, yet Azure DevOps' CI/CD gave me error when it attempted to deploy the release to production. I had followed the instructions provided by Microsoft to set up the CI/CD pipeline, yet it was failing.

I attempted to add override parameters for the notebooks - each notebook was linked to the spark pool in dev which was named "sp_dev". The Production spark pool was called "sp_prod", so with parameters for the pool's name it should work, right? 

No. Same error. 

After numerous other unsuccessful attempts at deployment, I deleted the production spark pool and recreated it with the same name as the dev spark pool. The notebooks deployed without a hitch. 

If you see the above error messages in your CI/CD logs and have spark notebooks in your Synapse deployment, the fix is always give the same names to the spark pools in every environment


Friday, November 5, 2021

Data Community Fun: A visit with Paul Turley

 


A few weeks ago, Paul Turley, his wife Shirley, their two spunky dogs and anti-social cat pulled up at the "Letourneau Resort" for a 3-day visit. We spent some time working together on the patio while his dogs raced around my property. The cat hid in the Data Bus and only went out when forced to do so. We shared several meals and spent many hours working at our computers (he visited Thursday, Friday, & Saturday. We both had to work Thursday & Friday). 



"Fancy" patio set-up. Just the computer, but we do have internet out there and the weather was perfect for sitting outside enjoying the fresh air while also getting a lot of work done. 

View from patio when you look up from computer. :) 

On the last day, Saturday, my boss at 3 Cloud, Kathi Vick, and her husband joined us. Paul has a blog called "Data On The Road", so he interviewed both of us before getting back on the road heading eastward. 


Guy loaned Paul his "Director's clapper" - it was the award from the Sakuracon 2005 AMV competition and was signed by a bunch of the convention guests - for the filming. The interviews took place on my patio couch. The clapper was definitely a fun way to start the videos.


I got to put the "Arizona" sticker on their "Places We've Been" map attached to the side of the Data Bus.

In early 2020, Paul and I had planned on being co-presenters at the Phoenix SQL Saturday, so I made cloud shirts for both of us so we'd match. Unfortunately, COVID came along and the Phoenix SQL Saturday never happened. 

For the interview with me, Paul and I donned our matching shirts to demonstrate how cool it would have been if we had presented with matching cloud shirts. They look pretty spiffy, eh? And the cloud earrings are of course a necessary accessory for any cloud data professional!


The above is a screenshot of the video posted to Paul's blog. I recommend you go see the video on his page, and also check out the other interviews he did with a number of other wonderful people in the data community. He also has a schedule posted telling you where he'll be and when. If he's coming to your area, reach out to him and say, "hi". 
 
I can't wait to see the Data Bus again when it next passes through the Phoenix area. 

Tuesday, October 26, 2021

I'm Speaking at the PASS Data Community Summit Nov 8-12, 2021!

 


If you haven't heard, the PASS Data Community Summit is coming up Nov 8-12, 2021. It's a free conference hosted by Redgate Software. The Summit is the largest gathering of Microsoft Data Platform professionals. Speakers will cover data management, analytics, architecture, and professional development. 

The full session list is available here.


My session is "Hitting The Bricks: An Introduction to Databricks". It is intended for people who have never used Databricks but would like to learn about it to see if it's something that they should add to their systems. Since it's a pre-recorded session, you can watch it at any time during the PASS Summit. Since I am scheduled for a live 30-minute Q&A session related to the session on November 11th at 5 PM Eastern Time, I suggest you watch the session prior to that time so you can come to the Q&A session with questions. The Powerpoint and demo notebook will be available on the PASS Data Community Summit page around that time as well. 

Register here! and see you at the Summit.

Resoved: Error Creating Storage Event Trigger in Azure Synapse

My client receives external files from a vendor and wants to ingest them into the Data Lake using the integration pipelines in Synapse (Syna...