Azure Data Factory & DevOps – Advanced YAML Pipelines

In this post I would like to return to modern Azure DevOps YAML Pipelines. It is an extension of the previous one – Azure Data Factory & DevOps – YAML Pipelines. Since that post was published, YAML Pipelines have moved from preview to general availability, so it is a good time to talk about what comes next and cover more advanced topics, such as pipeline templates, parameters, stages, and deployment jobs.

Prerequisites

A source-control-enabled instance of ADF. The development environment should already have a source control integration configured. This aspect was illustrated previously in the post Azure Data Factory & DevOps – Integration with a Source Control.

A basic understanding of YAML Pipelines, especially in combination with ADF. The content of Azure Data Factory & DevOps – YAML Pipelines is a good starting point; I will expand on the initial idea and refactor the code of that post.

The next steps

While the previous post shows basic YAML scenarios, it is still a “one-file” implementation, without any code re-use, approvals, or split into stages. These issues will now be addressed one by one.

Parameterized templates

The primary purpose of Azure DevOps YAML Pipeline templates is code re-use. Think of them as an analogue of stored procedures in SQL Server: a parameterized template acts similarly to a stored procedure with parameters. There is another function though – a template can be marked as mandatory to inherit from, so it acts as a security or control boundary over what the inheriting pipeline is allowed to do. In this post I will focus only on code re-use. Below is an example of how one pipeline “re-uses” the same template twice with different parameters – to deploy ADF to staging and production environments:

stages:

# Deploy to staging
  - template: stages\deploy.yaml  
    parameters:
      Environment: 'stg'      

# Deploy to production
  - template: stages\deploy.yaml  
    parameters:
      Environment: 'prd'      
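
For completeness, the mandatory-template scenario mentioned above relies on the extends keyword rather than on a list of stage templates. A minimal, illustrative sketch (the template name and its buildSteps parameter are made up for this example) could look like this:

# A hypothetical pipeline that is forced to inherit from an approved template
extends:
  template: required-checks.yaml   # an assumed template marked as required by the organization
  parameters:
    buildSteps:
      - script: echo "Only steps permitted by required-checks.yaml will run"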

Stages

Stages act as logical containers for pipeline jobs. For instance, a build stage runs jobs related to the build process, a deployment stage consists of deployment jobs, and so on. Stages are also needed when the pipeline must be paused during execution for external checks – think of approvals, artifact evaluations, REST API calls, etc.

Image 1. Stages in action
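
To make the sequencing concrete, below is a minimal, illustrative skeleton (the stage and job names are made up for this example) of two stages where deployment starts only after a successful build:

stages:

  - stage: Build
    jobs:
      - job: BuildJob
        steps:
          - script: echo "Building..."

  - stage: Deploy
    dependsOn: Build        # explicit dependency; stages also run in order by default
    jobs:
      - job: DeployJob
        steps:
          - script: echo "Deploying..."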

Deployment Jobs

Deployment jobs are a special type of job. They provide a few extra benefits over regular jobs (a minimal skeleton follows the list):

  • They support various deployment strategies.
  • They maintain a deployment history for further auditing.
  • They are mapped to “environments”, which unlocks the possibility of setting approvals.
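
A minimal, illustrative skeleton of a deployment job (the environment name is made up for this example) shows how these pieces fit together:

jobs:
  - deployment: DeploySomething        # a deployment job instead of a regular job
    environment: 'my-environment'      # approvals and checks are attached to this environment
    strategy:
      runOnce:                         # the simplest deployment strategy
        deploy:
          steps:
            - script: echo "Deploying..."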

The code transformation

I will start by refactoring the initial code used in the previous blog post. Back then it was a single-file YAML pipeline, main.yaml, which held all the logic. It will now be split into three smaller files:

Image 2. A split of the pipeline definition

The file main.yaml takes the role of an orchestrator. Two new files in the stages folder – build.yaml and deploy.yaml – are templates that implement the logic of their respective stages.

The starting point – main.yaml

The file main.yaml contains the trigger configuration, internal variables, and calls to the stage-specific templates.

# Batching trigger set to run only on adf_publish branch
# build folder is to be excluded

trigger:
  batch: true
  branches:
    include:
      - adf_publish 
    exclude:
      - master
  paths:
    exclude:
      - build/* 
    include:
      - "*"


variables:
   ProductName: "azdo-adv-yaml" 
   Subscription: "Visual Studio Enterprise Subscription(your_subscription_id)"

# The build agent is based on Windows OS.  
pool:
   vmImage: "windows-latest"


stages:

# Build  ------------------------------------------------------------

  - template: stages\build.yaml  
    parameters:
      displayName: 'Build the code'


# Deploy  ------------------------------------------------------------

# Deploy to staging
  - template: stages\deploy.yaml  
    parameters:
      Environment: 'stg'      
      ProductName: '${{ variables.ProductName }}'
      Subscription: '${{ variables.Subscription }}'


# Deploy to production
  - template: stages\deploy.yaml  
    parameters:
      Environment: 'prd'      
      ProductName: '${{ variables.ProductName }}'
      Subscription: '${{ variables.Subscription }}'  

The build stage – build.yaml

This is the first template and it has a simple task – prepare and publish the ARM templates for further deployment. Please note that it references ARM files in the git folder resources\arm\blank-adf. That folder and its files are not present in the adf_publish branch by default and have to be added manually; more details about why this is needed can be found in the previous post.

parameters:
- name: displayName  
  type: string
  default: 'Build'

stages:

- stage: Build
  displayName: 'Build Stage'
  
  jobs:  
  - job: Build
    displayName: ${{ parameters.displayName }}

    steps:

    # Step 1: Checkout code into a local folder src
    - checkout: self
      path: src


    # Step 2a: Find the ARM JSON files for the deployment of a blank ADF in src and copy them into the artifact staging folder
    - task: CopyFiles@2  
      inputs:
        SourceFolder: '$(Pipeline.Workspace)\src\resources\arm\blank-adf'
        Contents: '**/*.json'
        TargetFolder: '$(build.artifactstagingdirectory)\arm'
        CleanTargetFolder: true
        OverWrite: true
      displayName: 'Extract ARM - Blank ADF Service'
      enabled: true


    # Step 2b: Find the other ADF files (which deploy pipelines, datasets, and so on) in the adf_publish folder and copy them into the artifact folder
    - task: CopyFiles@2  
      inputs:
        SourceFolder: '$(Pipeline.Workspace)\src'
        Contents: '**/*ForFactory.json'
        TargetFolder: '$(build.artifactstagingdirectory)\adf_publish'
        CleanTargetFolder: true
        OverWrite: true
        flattenFolders: true
      displayName: 'Extract ARM - ADF Pipelines'
      
      
    # Step 3: Publish the artifacts
    - publish: $(build.artifactstagingdirectory)
      artifact: drop
      displayName: 'Publish artifacts'

There are two CopyFiles tasks. The first one copies the ARM files needed to create an empty Data Factory instance. The other “copy” task extracts the ARM files that describe the state of the Data Factory objects.

The deployment stage – deploy.yaml

The other template performs the deployment of the extracted ARM files. It is referenced twice in main.yaml – to run the deployment in the staging and production environments.

parameters:

- name: Environment 
  default: 'stg'

- name: Subscription
  default: 'yourSubscription' 

- name: ProductName
  default: 'prd1'


stages:

- stage: DeployRing_${{ parameters.Environment }}
  displayName: 'Deploy to ${{ parameters.Environment }} Ring'
  
  
  jobs:      

  - deployment: DeployADF
    displayName: Deploy ADF

    environment: 'env-adf-${{ parameters.ProductName }}-${{ parameters.Environment }}'

    variables:
      
      Subscription: '${{ parameters.Subscription }}'
      Environment: '${{ parameters.Environment }}'
      ProductName: '${{ parameters.ProductName }}'

    strategy: 
      runOnce:
        deploy:             

          steps:
          
          # Step 1: Download the code published by build stage
          - download: current
            artifact: drop

          # Step 2: Deploy the instance of Azure Data Factory. This task is needed on the first deployment to the environment.
          # If the ADF instance already exists, it does nothing.
          - task: AzureResourceManagerTemplateDeployment@3
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: '$(Subscription)'
              action: 'Create Or Update Resource Group'
              resourceGroupName: 'rg-$(ProductName)-$(Environment)'
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)\drop\arm\template.json'
              csmParametersFile: '$(Pipeline.Workspace)\drop\arm\parameters.json'
              overrideParameters: '-name "df-$(ProductName)-$(Environment)"'
              deploymentMode: 'Incremental'
            displayName: Deploy ADF Service
            enabled: true


          # Step 3: Deploy Azure Data Factory objects like pipelines and data flows using the ARM templates that ADF generates during each publish event
          - task: AzureResourceManagerTemplateDeployment@3
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: '$(Subscription)'
              action: 'Create Or Update Resource Group'
              resourceGroupName: 'rg-$(ProductName)-$(Environment)'
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)\drop\adf_publish\ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)\drop\adf_publish\ARMTemplateParametersForFactory.json'
              overrideParameters: '-factoryName "df-$(ProductName)-$(Environment)"'
              deploymentMode: 'Incremental'
            displayName: Deploy ADF Pipelines
            enabled: true

The deployment job downloads the ARM files, creates an instance of ADF if needed, and then performs the final deployment of objects like datasets, pipelines, linked services, and so on.
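
The templates stop here. If a post-deployment verification is wanted, a hypothetical extra step (not part of the repository, and assuming the Azure CLI datafactory extension can be installed on the agent) could be appended to the steps of deploy.yaml:

          # Hypothetical smoke test: confirm the factory exists after the deployment
          - task: AzureCLI@2
            inputs:
              azureSubscription: '$(Subscription)'
              scriptType: 'pscore'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az extension add --name datafactory
                az datafactory show --resource-group "rg-$(ProductName)-$(Environment)" --factory-name "df-$(ProductName)-$(Environment)"
            displayName: 'Verify ADF instance (smoke test)'
            enabled: true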

The source code

The source code can be found in my GitHub repository: adf-yaml-advanced

Final words

This post covered a more advanced usage of Azure DevOps YAML pipelines together with Azure Data Factory. It showed some benefits of splitting the YAML code into smaller chunks, as well as how stages and the more powerful deployment jobs can be used.

Many thanks for reading.