Azure Data Factory and REST APIs – Dealing with OAuth2 authentication

In this first post I am going to discuss how to apply OAuth2 authentication to ingest data from REST APIs.

OAuth2 has become a de facto standard in cloud and SaaS services; it is widely used by Twitter, Microsoft Azure, and Amazon. My example is based on the EXACT Online API. The end goal of the blog series is to set up an entire pipeline which ingests data from a REST API and loads it into a data lake.

The structure of a pipeline

The simplest form of a pipeline contains two activities:

  • Web Activity: Performs a POST call to get an authentication token
  • Copy Activity: Fetches data from the API and loads it into a destination linked service

Step 0

In this post we will focus on implementing the first activity – the Web Activity, which does a small but important part of the work – authenticating and getting an access token.

Continue reading…

Azure Data Lake Gen2 and SQL Server Integration Services

How it used to be

“Azure Data Lake Storage Gen2” (or ADLS Gen2) is the newest cloud data lake offering from Microsoft. It is supposed to bring the best of two worlds together: the excellent performance and redundancy of blob storage and the secure filesystem capabilities of a data lake.

However, for some time ADLS Gen2 lacked support from on-premises tools like SQL Server and SSIS. It is supported by Databricks and Azure Data Factory, but shops that rely heavily on SSIS or Azure Data Lake Analytics (U-SQL) still had to stay with the previous generation of the service.

James Serra has an excellent article about the limitations: Ways to access data in ADLS Gen2

New release of Azure Feature Pack

The situation changed with the August 2019 release of the Azure Feature Pack for Integration Services (SSIS). There is also a blog post with the announcement: Azure Feature Pack 1.14.0 released with Azure Data Lake Storage Gen2 Support. However, this release didn’t receive much attention and we didn’t hear loud fanfares.

Continue reading…

Automated Refresh of Databases on Development Servers

Databases that have to be copied from a production server

It happens that databases in testing and especially acceptance environments need to be refreshed by restoring backups from a production environment. A backup restore seems like a trivial operation that can be done via the SQL Server Management Studio wizard or via prepared T-SQL scripts. However, there are some complications that come into play:

  • Multiple databases have to be restored at once
  • The data refresh requires both the latest FULL and the latest DIFF backups to be restored one after another
  • The outdated databases still have established sleeping sessions; they all have to be terminated before firing a restore
  • And, most important, developers want to perform such refreshes in an automated way, without manual scripting and without walking through all the knobs of the wizard

A stored procedure to have the routine automated

Because the complications listed above happen on a frequent basis, I’ve created a stored procedure, [dbo].[uspRefreshDB].
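
To illustrate the idea, here is a minimal sketch of the restore sequence such a routine automates for a single database. The database name and backup paths are hypothetical; the actual [dbo].[uspRefreshDB] takes care of the complications listed above, such as handling multiple databases:

-- Hypothetical example: refresh one database from the latest FULL and DIFF backups.
-- Kick out sleeping sessions by forcing the database into single-user mode.
ALTER DATABASE [SalesDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Restore the latest FULL backup, leaving the database ready to accept the DIFF.
RESTORE DATABASE [SalesDB]
FROM DISK = N'\\backupserver\backups\SalesDB_FULL.bak'
WITH REPLACE, NORECOVERY;

-- Restore the latest DIFF backup and bring the database online.
RESTORE DATABASE [SalesDB]
FROM DISK = N'\\backupserver\backups\SalesDB_DIFF.bak'
WITH RECOVERY;

-- Allow connections again.
ALTER DATABASE [SalesDB] SET MULTI_USER;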

Continue reading…

How To Simplify Database Snapshots Creation in SQL Server

Database Snapshots is a powerful feature that can be used to quickly revert a database to the state it was in when a given snapshot was created, as well as for data and schema comparison between a source database and a snapshot.
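
For reference, reverting to a snapshot takes a single RESTORE statement (the database and snapshot names here are only examples):

-- Revert the source database to the state captured in the snapshot.
RESTORE DATABASE [AdventureWorks2016]
FROM DATABASE_SNAPSHOT = 'AdventureWorks2016_snapshot';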

While this feature shipped in SQL Server 2005, it was available only in the Enterprise edition. With SQL Server 2016 SP1, Microsoft made a generous step and unlocked plenty of such enterprise-grade features. Since then, snapshots have been ready to use in all editions, including Express.

Warning: Database snapshots are dependent on the source database. Therefore, they are not a substitute for your primary backup and restore strategy. Performing all your scheduled backups remains essential.

Problem and solution

While this feature has a clear T-SQL syntax, SSMS has no wizard or GUI for it, so creating multiple snapshots requires some tedious coding like:

CREATE DATABASE [AdventureWorks2016_snapshot] 
ON (
    NAME = [AdventureWorks2016_Data]
,   FILENAME ='H:\SQL_Data\Data\AdventureWorks2016_Data.mdf.snapshot'
) 
AS SNAPSHOT OF [AdventureWorks2016];

Because of the absence of a UI and other helpers, I created a stored procedure, [dbo].[uspCreateSnapshot].

Continue reading…

Helpful SQL Server trace flags

SQL Server can be configured in various ways. It can be done via well-documented commands like ALTER SERVER CONFIGURATION or via the system stored procedure SP_CONFIGURE. However, there is another way, and sometimes it brings changes in behavior that cannot be achieved using other knobs – the trace flag. Trace flags are special switches that can affect the overall behavior of the database engine globally or for a certain session.
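
For example, a trace flag can be enabled just for the current session or globally for the whole instance, and DBCC TRACESTATUS shows what is currently switched on (trace flag 460 is used here only because it appears in the example below):

-- Enable trace flag 460 for the current session only.
DBCC TRACEON (460);

-- Enable it globally for the whole instance (the -1 argument).
DBCC TRACEON (460, -1);

-- List the trace flags that are currently enabled.
DBCC TRACESTATUS (-1);

-- Switch it off globally again.
DBCC TRACEOFF (460, -1);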

Just an example

A famous error message: String or binary data would be truncated. It happens often, and its text does not help much in finding the column or value that causes the error. An example:

CREATE TABLE Tempdb.dbo.Test (Col1 CHAR(2));
INSERT Tempdb.dbo.Test (Col1) VALUES ('abc');

--Msg 8152, Level 16, State 30, Line 6
--String or binary data would be truncated.

However, if trace flag 460 is enabled and the SQL Server version is higher than SQL Server 2017 CU11, the error message becomes much more helpful:

DBCC TRACEON(460,-1)
INSERT Tempdb.dbo.Test (Col1) VALUES ('abc');

--Msg 2628, Level 16, State 1, Line 2
--String or binary data would be truncated in table 'tempdb.dbo.Test', column 'Col1'. Truncated value: 'ab'.

Continue reading…