SQL Server 2016: Temporal Tables. Changing the Schema
In the previous blog post, I made an introduction to the basic concepts behind temporal tables. This time I would like to cover, in a little bit more detailed way, one difficult topic – schema change: how it can be applied to temporal tables and what impact on the system it can bring.
Foreword
This post contains performance measurements of certain DDL commands. However, the absolute value of such metrics makes not so much sense due to the no-reproducibility of execution environment on the reader side. Therefore, I will provide command execution time not just for schema change operations, but also for data load, index creation, data manipulation commands, so all timespans can be matched with each other to make overall understanding about the impact.
When it comes to characteristics of the execution environment, it is Hyper-V guest with 4 cores, 4 GB RAM on HP ZBook /W8.1 /Core i7:
Performance measurements of the commands done by using:
SET STATISTICS TIME ON
Base table data
As the data sample, Product table from AdventureWorks2014 used, but with some additions:
- Added extra filler – a column with a type CHAR(1000) to stretch rows and make a bigger load on the SQL Server.
- The dataset enlarged to 1 000 000 rows via CROSS JOIN. This is Adam Machanic’s script.
-- SQL Server Execution Times: -- CPU time = 3 455 ms, elapsed time = 32 109 ms. SELECT TOP 1000000 ISNULL(p.ProductID + (a.number * 1000),0) AS ProductID, p.Name + CONVERT(VARCHAR, (a.number * 1000)) AS Name, p.ProductNumber + '-' + CONVERT(VARCHAR, (a.number * 1000)) AS ProductNumber, p.MakeFlag, p.FinishedGoodsFlag, p.Color, p.SafetyStockLevel, p.ReorderPoint, p.StandardCost, p.ListPrice, p.Size, p.SizeUnitMeasureCode, p.WeightUnitMeasureCode, p.Weight, p.DaysToManufacture, p.ProductLine, p.Class, p.Style, p.ProductSubcategoryID, p.ProductModelID, p.SellStartDate, p.SellEndDate, p.DiscontinuedDate, cast('' as char(1000)) as DataFiller -- Dummy column to expand size of the table INTO SQL2016Demo.dbo.Product FROM AdventureWorks2014.Production.Product AS p CROSS JOIN master..spt_values AS a WHERE a.type = 'p'
Note that CPU time is just 1/10 of overall elapsed time. This is a sign that such operation is IO bound and such behavior is expected since virtual machine placed on a single, separate, but still non-SSD hard drive.
Temporal tables have one schema requirement: the base table should have a PRIMARY KEY defined, according to documentation:
A temporal table must have a primary key defined to correlate records between the current table and the history table, and the history table cannot have a primary key defined.
So, another very IO-bound query to run:
-- SQL Server Execution Times: -- CPU time = 4 016 ms, elapsed time = 173 433 ms. ALTER TABLE dbo.Product ADD CONSTRAINT PK_Product PRIMARY KEY CLUSTERED ( ProductID ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Finally, very last step in this section is enabling “System Versioning”:
--SQL Server Execution Times: -- CPU time = 1 593 ms, elapsed time = 1 607 ms. ALTER TABLE dbo.Product ADD PERIOD FOR SYSTEM_TIME (Valid_From, Valid_Till), Valid_From datetime2 GENERATED ALWAYS AS ROW START NOT NULL DEFAULT GETUTCDATE(), Valid_Till datetime2 GENERATED ALWAYS AS ROW END NOT NULL DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.9999999'); ALTER TABLE dbo.Product SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Product_History, DATA_CONSISTENCY_CHECK = ON))
History table data
Since base table loaded with 1 000 000 rows, it is a time to trigger filling of a history table.
The following script contains a sequence of UPDATE and DELETE commands. Six commands will affect 100 000 random rows each.
-- SQL Server Execution Times: -- CPU time = 3 750 ms, elapsed time = 10 467 ms. UPDATE dbo.Product SET Weight = RAND() WHERE ProductID in ( SELECT TOP (100000) ProductID FROM dbo.Product order by NEWID() ) -- SQL Server Execution Times: -- CPU time = 4470 ms, elapsed time = 12 842 ms. UPDATE dbo.Product SET Weight = RAND() WHERE ProductID in ( SELECT TOP (100000) ProductID FROM dbo.Product order by NEWID() ) -- SQL Server Execution Times: -- CPU time = 3936 ms, elapsed time = 5440 ms. DELETE FROM dbo.Product WHERE ProductID in ( SELECT TOP (100000) ProductID FROM dbo.Product ORDER BY NEWID() ) -- SQL Server Execution Times: -- CPU time = 4 640 ms, elapsed time = 15 283 ms. UPDATE dbo.Product SET Weight = RAND() WHERE ProductID in ( SELECT TOP (100000) ProductID FROM dbo.Product order by NEWID() ) -- SQL Server Execution Times: -- CPU time = 4 203 ms, elapsed time = 9 773 ms. UPDATE dbo.Product SET Weight = RAND() WHERE ProductID in ( SELECT TOP (100000) ProductID FROM dbo.Product order by NEWID() ) -- SQL Server Execution Times: -- CPU time = 3627 ms, elapsed time = 2806 ms. DELETE FROM dbo.Product WHERE ProductID in ( SELECT TOP (100000) ProductID FROM dbo.Product ORDER BY NEWID() )
Note that if “System Versioning” enabled, there is no way or need to perform operations on top of the history table. Regular DML commands will do transparently required internal job on maintaining history table and keeping ValidFrom, ValidTill columns in an actual state:
exec sp_spaceused 'dbo.Product' /* name rows reserved data index_size --------- -------- ------------ ------------ ------------ Product 800000 1335688 KB 1333312 KB 2264 KB */ exec sp_spaceused 'dbo.Product_History' /* name rows reserved data index_size ---------------- ------- --------- ---------- ----------- Product_History 600000 51208 KB 50288 KB 416 KB */
Applying changes to the schema
This is the most crucial part of the post, it shows a generic technique of how changes propagated. Three common patterns picked for testing:
- Add a new column
- Enlarge datatype of an existing column
- Drop a column
Add a new column
Therefore, the first item covers the addition of a new nullable column. The following script contains just four lines of the code and meaning of it pretty clear by the syntax perspective: firstly, versioning turned off, then changes to be applied to both tables: base and history and finally, system versioning enabled back.
Statistics output shows that first three commands are pretty fast because changes performed on metadata level. However, the last one requires some data engine efforts to be completed. Those efforts are consistency checks:
-- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 0 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF) -- SQL Server Execution Times: -- CPU time = 15 ms, elapsed time = 2 ms. ALTER TABLE Product ADD ColNew BIGINT NULL; -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 1 ms. ALTER TABLE Product_History ADD ColNew BIGINT NULL; -- SQL Server Execution Times: -- CPU time = 2578 ms, elapsed time = 3343 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History , DATA_CONSISTENCY_CHECK = ON));
According to official documentation that check is not required if schema alteration wrapped by TRANSACTION:
Data check is unnecessary when the schema change is made within a transaction as no data changes can occur
Following script proves that new column added by altering metadata only and schema change execution time is fantastic: 1 ms!:
BEGIN TRAN -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 0 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF) -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 1 ms. ALTER TABLE Product ADD ColNew BIGINT NULL; -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 0 ms. ALTER TABLE Product_History ADD ColNew BIGINT NULL; -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 0 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History , DATA_CONSISTENCY_CHECK = OFF)); -- DC turned off!! COMMIT TRAN
Enlarge datatype of an existing column
The second item in exploration – data type enlargement – it requires changes on the data page level, so timings should be different:
BEGIN TRAN -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 1 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF) -- SQL Server Execution Times: -- CPU time = 1969 ms, elapsed time = 3843 ms. ALTER TABLE Product ALTER COLUMN ProductModelID BIGINT -- SQL Server Execution Times: -- CPU time = 4625 ms, elapsed time = 5520 ms. ALTER TABLE Product_History ALTER COLUMN ProductModelID BIGINT -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 0 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History , DATA_CONSISTENCY_CHECK = OFF)); COMMIT TRAN
The column type changed from INT to BIGINT. Both are fixed-width types, therefore data engine has to do some work on a page level of both objects, one by one. As you can see, alteration duration still stays within rational limits.
Drop a column
Finally, the third item to check – column deletion. By going forward, this alteration occurs on metadata level and shows very robust timings:
BEGIN TRAN -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 1 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF) -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 1 ms. ALTER TABLE Product drop column ProductModelID -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 1 ms. ALTER TABLE Product_History drop column ProductModelID -- SQL Server Execution Times: -- CPU time = 0 ms, elapsed time = 0 ms. ALTER TABLE Product SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History , DATA_CONSISTENCY_CHECK = OFF)); COMMIT TRAN