All posts by Alex Volok
Databricks Observability: Collecting Cluster Ganglia Metrics
Introduction This blog post is about a few host metrics like CPU and RAM usage and their importance in understanding cluster utilisation. The simple task of tracking them might turn into a non-trivial task if the number of clusters growing over time. The real challenge is to track such metrics for short-living job clusters. This […]
Databricks Certified Associate Developer for Apache Spark 3.0
A few days ago I passed this assessment and it was special for me due to a few factors. This is my first vendor-specific non-Microsoft certification. And this is the very first time when I must complete it fully online. In this post, I would like to cover those factors, so I hope it will […]
Data Engineering Development Environment using WSL and Miniconda
Recently I managed to get the Windows Subsystem for Linux configured on my Windows 10 workstation. So, the logical next step was to use it by getting the PySpark installed, configured, and running there. However, instead of running dozens of pip install and manual post-configuration of all those packages, I decided to rely on the […]
Preparing Windows 10 for Data Engineering or WSL(2) ❤ Windows Terminal
Recently I have to rethink the way how I use Ubuntu as the local sandbox environment. The common approach was to have a small-sized VM running in the VirtualBox or sometimes even a dedicated VM in Azure/AWS. However, Windows Subsystem for Linux (or WSL) sounds more and more often as the lightweight and easy-to-use alternative […]
Azure DevOps for SQL Server DBAs. Part 3 – Continuous Delivery of SQL Agent Jobs
In this final post of the DBAs DevOps series, I would like to talk about the remaining piece of the puzzle – delivery of SQL Agent jobs using the pipeline. I will discuss various approaches that can be used and then perform a test drive of the metadata-driven way of delivering the jobs. Prerequisites An […]