Towards Databricks Certified Data Engineer Professional
I decided to earn the Databricks Certified Data Engineer Professional certification in 2023. It was a New Year's resolution that fit well with my career path. Surprisingly, few study materials, blog posts, or forum discussions about it were available, so I went into the exam largely blind, yet I passed on the first try. This story compiles my memories and advice on how to prepare and perform well during the exam.
How was it different from the Databricks Certified Data Engineer Associate exam?
This is a question I’ve heard often recently, and it makes sense, because the logical first step before attempting the Professional exam is to pass the Associate one.
The Databricks Certified Data Engineer Associate certification is more accessible for several reasons:
- The questions are easier and less in-depth.
- It has fewer questions (although the time limit is correspondingly shorter).
- The free practice test accurately reflects the difficulty of the real exam, so you can realistically assess your readiness in advance.
That said, a candidate with six months of daily Databricks use may be ready for the Associate exam. Earning the Associate certification first is a wise step before attempting the Professional one.
Getting ready for the exam
Get sufficient Databricks experience
I would advise considering the Professional certification only if you have worked with Databricks (or Spark) for at least 18–24 months, have hands-on experience with Structured Streaming, the Jobs API, and Compute Pools, and can confidently choose between a Delta MERGE and a plain DataFrame.write.
Schedule the right exam time
Each person has their own peak alertness time: a recurring daily period of full focus and cognitive performance, when it is easy to work through tasks one after another. For me, this is the morning, after a good night's sleep and enough caffeine.
I highly recommend scheduling the exam for this period and avoiding energy dips or evening hours, when your body and brain are getting ready for sleep. This matters especially for an exam this intense.
Prepare the workspace
If you take the exam online, make sure nothing will interrupt you.
You will not be able to use noise-canceling headphones, so choose a quiet room and close the windows to avoid potential noise from outside.
Just before the exam starts, the environment may be calm: you hear only birds in the backyard, and nothing seems likely to go wrong. Half an hour later, however, road workers may start repairing something nearby. You cannot leave your seat during the exam because the webcam monitors you. If you stand up and walk across the room, for instance, to close a door or window, be prepared for the exam to be canceled.
Consider keeping family members, pets, and other housemates out of the exam room for those few hours. You need complete focus during that time.
During the exam
Time management
This is probably the most important advice: do not get stuck on complex questions. Some of them take considerable time just to read. If you hit one and do not have a quick answer, mark it for later review and move on.
Evaluate all available answers
Because of the time pressure, I tended to pick the first “good enough” answer to save time. Luckily, I quickly realized this was a poor strategy. The options are often very similar and look correct at first glance, but the exam is designed to test whether you understand the subtle details that distinguish them. Therefore, read and evaluate every option before moving on to the next question.
Keep the most challenging questions for later
This exam has 60 questions, which leaves only about 2 minutes per question to read it, understand it, evaluate all the options, and choose the correct one. After answering all the “doable” questions, you will likely have some time in reserve; use it for the challenging ones.
A hint to the correct answer may appear in a later question.
Exam topics that deserve extra attention
I do not think I am allowed to disclose the exact exam questions and answers, but I believe I can share a general impression of the exam. The questions were mainly designed to test practical knowledge and understanding of the platform.
In my case, the following categories dominated. Also, unlike in the Associate exam, I did not get any Delta Live Tables questions this time.
Structured Streaming:
- Trigger types: availableNow, once, and processingTime
- Watermarking and time windows
- The role of streaming checkpoints
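Watermarking is easier to reason about with a tiny model. The following pure-Python sketch is not Spark code; the function name `simulate_watermark` and the integer-second event times are assumptions for illustration. It loosely mirrors what `df.withWatermark("event_time", "5 seconds").groupBy(window("event_time", "10 seconds")).count()` does in append output mode: the watermark (maximum event time seen minus the delay) advances only after each micro-batch, a window is emitted once the watermark passes its end, and a row is dropped only if its window has already been finalized. Real Spark has more nuances (it only guarantees not to drop data within the delay), so treat this as a mental model.

```python
from collections import defaultdict


def simulate_watermark(batches, window_size, delay):
    """Toy model (not Spark) of an append-mode tumbling-window count with a watermark.

    batches: list of micro-batches, each a list of integer event times (seconds).
    Returns (emitted, dropped): finalized (start, end, count) windows in emit
    order, and the event times rejected as too late.
    """
    watermark = float("-inf")  # no watermark exists before the first batch
    max_event_time = float("-inf")
    state = defaultdict(int)  # window start -> running count (the "state store")
    emitted, dropped = [], []
    for batch in batches:
        for t in batch:
            max_event_time = max(max_event_time, t)
            start = t - t % window_size
            if start + window_size <= watermark:
                dropped.append(t)  # its window was already finalized
                continue
            state[start] += 1
        # The watermark moves only after the whole micro-batch is processed.
        watermark = max_event_time - delay
        for start in sorted(state):
            if start + window_size <= watermark:  # window can no longer change
                emitted.append((start, start + window_size, state.pop(start)))
    return emitted, dropped


emitted, dropped = simulate_watermark([[1, 4, 12], [25, 3], [40, 8]], window_size=10, delay=5)
print(emitted)  # [(0, 10, 3), (10, 20, 1), (20, 30, 1)]
print(dropped)  # [8]
```

Event 3 arrives out of order, but its window [0, 10) is still open (the watermark is only 7), so it is counted. Event 8 arrives after the watermark has reached 20, so it is dropped. In a real query, the running counts and the watermark are part of what the checkpoint location persists, which is why a restarted stream can resume where it stopped.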
Processing Strategies:
- Scenarios for using Delta MERGE
- Cases where DataFrame.write is the better choice
- The order of operations and its impact on deduplication
- The logical outcome of a chain of DataFrame transformations
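The trade-offs in this list can be sketched without a cluster. The pure-Python model below is an illustration only, not Spark or Delta code: `drop_duplicates`, `append`, and `merge_upsert` are hypothetical helpers standing in for `DataFrame.dropDuplicates`, `df.write.mode("append")`, and a Delta MERGE that updates matched keys and inserts new ones. It shows why an append can duplicate keys, why the source of a MERGE usually needs deduplicating first, and how the position of a deduplication step changes the result.

```python
def drop_duplicates(rows, key):
    """Keep the first row per key. Spark's dropDuplicates keeps an arbitrary one."""
    seen, result = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            result.append(row)
    return result


def append(target, source):
    """Model of df.write.mode("append"): every source row is added as-is."""
    return target + source


def merge_upsert(target, source, key):
    """Model of a Delta MERGE with whenMatchedUpdateAll/whenNotMatchedInsertAll."""
    keys = [row[key] for row in source]
    if len(keys) != len(set(keys)):
        # Delta fails when several source rows match the same target row;
        # this model is stricter and rejects any duplicate source key.
        raise ValueError("duplicate keys in source: deduplicate before merging")
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = row  # update if the key exists, insert otherwise
    return list(merged.values())


def keep_active(rows):
    """A filter step used to show that the order of operations matters."""
    return [row for row in rows if row["status"] == "active"]


target = [{"id": 1, "v": "a"}]
batch = [{"id": 1, "v": "b"}, {"id": 2, "v": "c"}, {"id": 2, "v": "d"}]
print(len(append(target, batch)))                                # 4
print(merge_upsert(target, drop_duplicates(batch, "id"), "id"))  # [{'id': 1, 'v': 'b'}, {'id': 2, 'v': 'c'}]

# Deduplicating before filtering can discard the row you actually wanted.
events = [{"id": 1, "status": "deleted"}, {"id": 1, "status": "active"}]
print(keep_active(drop_duplicates(events, "id")))  # []
print(drop_duplicates(keep_active(events), "id"))  # [{'id': 1, 'status': 'active'}]
```

In Spark, the dedup-then-filter result is not even deterministic, because which duplicate survives is arbitrary; filtering first (or deduplicating with an explicit ordering) removes that ambiguity.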
Databricks Management and Automation:
- The use of various REST APIs, for instance: what happens if the Create Job API is called three times with the same payload?
- The use of Compute Pools
- User access management and privileges, especially in light of Unity Catalog
- Integration with source control using Git Repos
- Databricks SQL Alerts and Dashboards
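The Create Job question comes down to two facts: each call to the Jobs API create endpoint (`POST /api/2.1/jobs/create`) registers a new job with its own `job_id`, even when the settings are identical and job names repeat, and creating a job does not start a run. The sketch below models that with a hypothetical in-memory class, `FakeJobsService`; it makes no HTTP calls and is not the Databricks SDK. The `get_or_create` helper is likewise a hypothetical client-side guard keyed on the job name; in practice you would look existing jobs up through the Jobs API list endpoint, which is not modeled here.

```python
import itertools


class FakeJobsService:
    """Hypothetical in-memory stand-in for the Jobs API; no HTTP calls are made."""

    def __init__(self):
        self._ids = itertools.count(1)  # real job IDs are assigned by the workspace
        self.jobs = {}  # job_id -> settings
        self.runs = []  # job_ids that have been triggered

    def create(self, settings):
        # Like jobs/create: always registers a new job, even for identical
        # settings, and does not start a run.
        job_id = next(self._ids)
        self.jobs[job_id] = dict(settings)
        return {"job_id": job_id}

    def run_now(self, job_id):
        # Running a job is a separate, explicit call.
        self.runs.append(job_id)


def get_or_create(service, settings):
    """Hypothetical guard: reuse a job that already has the same name."""
    for job_id, existing in service.jobs.items():
        if existing.get("name") == settings.get("name"):
            return {"job_id": job_id}
    return service.create(settings)


payload = {"name": "nightly-etl", "max_concurrent_runs": 1}

naive = FakeJobsService()
ids = [naive.create(payload)["job_id"] for _ in range(3)]
print(ids, len(naive.jobs), len(naive.runs))  # [1, 2, 3] 3 0

guarded = FakeJobsService()
ids = [get_or_create(guarded, payload)["job_id"] for _ in range(3)]
print(ids, len(guarded.jobs))  # [1, 1, 1] 1
```

Three identical create calls therefore leave three jobs with the same name and no runs; any idempotency has to come from the caller.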
Final Words
The Databricks Certified Data Engineer Professional exam is challenging but fair. It touches nearly every aspect of the Databricks platform and checks whether you know how to use its components optimally in different situations. I wrote this article to gather my memories in one place. I hope readers find it helpful and actionable, and eventually feel the same satisfaction and relief when they see their final score page. Good luck!