Azure Data Lake Gen2 and SQL Server Integration Services

How it used to be

“Azure Data Lake Generation 2” (or ADLS gen2) is the newest cloud data lake offering from a Microsoft. It supposes to bring the best of two worlds together: excelent performance and redundancy of a blob storage and secure filesystem capabilities of a data lake.

However, for some time the ADLS gen2 had a lack of support by on-premise tools, like SQL Server and SSIS. It has a support of a Databricks and Azure Data Factory. But shops that are heavily relied on SSIS or Azure Data Lake Analytics (U-SQL) still have to stay with a previous generation of the service.

James Serra has an excelent article about limitations: Ways to access data in ADLS Gen2

New release of Azure Feature Pack

The situation changed with an August 2019 release of Azure Feature Pack for Integration Services (SSIS). There is also a blog post with anouncement: Azure Feature Pack 1.14.0 released with Azure Data Lake Storage Gen2 Support, however this release didn’t receive a wide spread and we didn’t hear loud fanfares.

Continue reading…

Automated Refresh of Databases on Development Servers

Databases that have to be copied from a production server

It happens that databases in testing and especially acceptance environments needed to be refreshed by restoring backups from a production environment. The backup restore is seems like a trivial operation that can be done via SQL Server Management Studio wizard or via prepared T-SQL scripts.However, there are some complications that come into play during this:

  • Multiple databases to be restored at once
  • The data refresh requires both: the latest FULL and the latest DIFF to be restored one by one.
  • The outdated databases still have established sleeping sessions, they all have to be terminated before firing a restore
  • And the most important, developers want to perform such refreshes in an automated way, without manual scripting and without walking through all knobs of the wizard

A stored procedure to have the routine automated

Because those listed above complications happens on a frequent basis I’ve created a stored procedure [dbo].[uspRefreshDB].

Continue reading…

How To Simplify Database Snapshots Creation in SQL Server

Database Snapshots is a powerful feature that can be used for quick reverts of the database to the state as it was in when a given database snapshot was created, as well as data and schema comparison between a source database and a snapshot.

While this feature was shipped in SQL Server 2005 it was available only in the enterprise edition. In release SQL Server 2016 SP1 Microsoft made a generous step and unlocked plenty of such enterprise-grade features. Since that snapshots become ready to use in all editions, including Express.

Warning: Database snapshots are dependent on the source database. Therefore, database snapshots are not to substitute your primary backup and restore strategy. Performing all your scheduled backups remains essential.

Problem and solution

While this feature has a clear T-SQL syntax SSMS has no wizard or GUI for it, so creation of multiple snapshots requires some tedious coding like:

CREATE DATABASE [AdventureWorks2016_snapshot] 
ON (
    NAME = [AdventureWorks2016_Data]
,   FILENAME ='H:\SQL_Data\Data\AdventureWorks2016_Data.mdf.snapshot'
AS SNAPSHOT OF [AdventureWorks2016];

Because of absence of UI and other helpers I created a stored procedure [dbo].[uspCreateSnapshot].

Continue reading…

Helpful SQL Server trace flags

SQL Server can be configured in various ways. It can be done via well-documented commands like ALTER SERVER or via system stored procedure SP_CONFIGURE. However, there is another way and sometimes it brings changes to behavior that cannot be achieved using other knobs – the name of it is a trace flag. Trace flags are special switches that can make an impact on the overall behavior of the data engine on a global scope or for some certain session.

Just an example

A famous error message: String or binary data would be truncated. It happens often and its content and meaning do not help that much in searching for the column or value that causes the error. An example:

CREATE TABLE Tempdb.dbo.Test (Col1 CHAR(2))
INSERT Tempdb.dbo.Test ( Col1 ) VALUES ('abc' )

--Msg 8152, Level 16, State 30, Line 6
--String or binary data would be truncated.

However, if trace flag 460 is enabled and SQL Server version is higher than SQL Server 2017 CU11 the error message going to be much different and helpful:

INSERT Tempdb.dbo.Test ( Col1 ) VALUES ('abc' )

--Msg 2628, Level 16, State 1, Line 2
--String or binary data would be truncated in table 'tempdb.dbo.Test', column 'Col1'. Truncated value: 'ab'.
Continue reading…


TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes, and so on remain. That is what the documentation says and that is what we daily use in our scripts, jobs and SSIS packages. Starting with SQL Server 2016 such operation can be done also on a partition level giving extra flexibility and convenience.

Directly to an example:

Partition truncation in SQL Server 2016 as simple as following line of code:

 --SQL Server Execution Times: CPU time = 0 ms,  elapsed time = 324 ms.
TRUNCATE TABLE dbo.PartitionedTable WITH (PARTITIONS (1, 6 TO 10));

The command has removed partitions: 1, 6, 7, 8, 9, 10. Each one held one million rows, therefore, 6 000 000 rows removed in a 324 ms. This is amazing result which outperform PARTITION SWITCHING timings.

Behind of the hood, TRUNCATE is a DDL command and this fact is a reason of superior performance of rows/pages release. However, it also leads to requirement of the minimum permission level on an object – ALTER TABLE.

Another necessary part of the command is the list of partition ids to remove corresponding segment. Partition Id can be obtained multiple ways. As example, via $PARTITION.partition function call. Following query has been executed before and after truncation:

	$PARTITION.[PartitioningByRowID] (RowId) AS PartitionID
,	COUNT(*) AS [RowCount]
,	MIN(RowId) AS MinRowID
,	MAX(RowId) AS MaxRowID
FROM dbo.PartitionedTable
GROUP BY $PARTITION.[PartitioningByRowID] (RowId)

Alternative way to get partition ids and metadata is by querying querying dynamic management object – sys.dm_db_partition_stats.

Continue reading…