Databricks Certified Associate Developer for Apache Spark 3.0

A few days ago I passed this assessment and it was special for me due to a few factors. This is my first vendor-specific non-Microsoft certification. And this is the very first time when I must complete it fully online. In this post, I would like to cover those factors, so I hope it will be helpful for someone who wants to pass it also and has Microsoft certificates only.

Planning and registration

The very first step is to get an exam planned. In the same way as Microsoft, Databricks uses a third-party service for running and measuring assessments. In their case, it is a KRYTERION. The algorithm of the registration procedure is also pretty much similar to what I had earlier with bigger vendors:

  1. By using a Databricks Academy I’ve chosen which assessment I want to complete
  2. The registration button forwarded me to KRYTERION’s Webassessor portal where I had to register and then choose an assessment again.
  3. The next logical step was hardware testing: a web camera, a microphone, and finally – network bandwidth.
    For MacBook users: All my tries of a network bandwidth test in Safari failed, while this was successful in Google Chrome. Anyway, I decided to use my Windows 10 laptop.
  4. Then, the registration by itself and payment checkout.
  5. As the final steps, I had to prepare my laptop for an exam, so a specialized software package called a “Sentinel” was be installed
  6. In the end, the biometric profile is to be created. Luckily creation of it was a fairly simple  process and somehow similar to an initial setup of Apple Face ID: The webcam captured my face when I slightly rotated it till a high index value was displayed so I was able to finalize this process.

It is also safer to enroll in an assessment and pass it using the same hardware. This warning I found on the official website:

Sentinel analyzes your typing style. If you attempt to test on a different computer or keyboard than the one you used to establish your biometric profile, your identity may be called into question. Your exam may be delayed or cancelled until your identity is confirmed.

The process of assessment

It was not smooth initially. A few minutes after the beginning of the assessment it was interrupted by a Kryterion Support staff, the support forwarded me to a specialized chat page where I was kindly asked to restart my notebook. The reason: the web camera did not work correctly. I was informed that the exam timer was paused, and I will return to my progress after reboot.

I restarted the laptop and re-joined the exam, however was still locked on the initial support page.  I could still see their messages with an original request for a laptop restart, so after a few minutes of waiting, I typed a confirmation that I am ready to continue. Then within the next 15 minutes, I repeated the same statement a few times again. However, without any reaction from the other side of the screen.

Luckily, I spotted an exclamation button on the title bar and pressed it which triggered someone from a support team to get in contact with me so the assessment process was finally unblocked.

Besides that incident, everything else went just the same way as I contemplated many times earlier in testing centers like Pearson VUE or Prometric. I had to read questions and answer them. When all questions were done, I had to submit an assessment’s results and finalize the session.

Luckily, I received relieving passing result immediately and a copy of my score was also sent to my email box. The official Databricks results and the exam badge arrived in a few days.

Therefore, besides the technical issues, an online experience of Kryterion is very similar to what I had earlier by taking exams in centers like VUE and Prometric.

Was it an easy exam?

My answer will be very subjective. Nevertheless, I think it is doable to beat an exam for a candidate with 6+ months of active daily PySpark / Databricks use.

The candidate will see some internal architecture questions, so a prior reading of the book Spark: The Definitive Guide is highly recommended even by authors of the exam.

For instance, this image should contain any secrets on what’s going on there:

A screenshot of a computer

Description automatically generated with medium confidence

I cannot disclose real questions I’ve seen, so can only express abstract comparable alternatives to give an idea. The candidate should be ready for questions like “what is the name of setting that controls the default number of partitions during the shuffling?” or “how many executors can run simultaneously on one worker node”, etc.

The majority of questions will be related to Spark’s DataFrame API. An example:

Identify an error in the following line of the code and choose the right answer:

dataframe1.join(dataframe2, “inner”, “ProductID”)

For those who use Spark daily, the answer is obvious – it is an order of parameters.

However, I think that exam can be tough for someone who has only theoretical knowledge and minimal real practice, for someone who perhaps still uses snippets from StackOverflow for basic Spark transformations, like joins, projects, unions.

Mainly because many questions are tricky. They have very similar answers and only small deviations among them helped me to choose the right one. Due to this reason, I think it will be hard to memorize syntax using theoretical knowledge only. The hands-on experience which is turned into a muscle memory on Spark API syntax is very recommended.

Final words

I hope that this post will be helpful for someone who is preparing for such certification and going to try it. The assessment is well written and doable for a specialist with real-world hands-on PySpark / Databricks practice. Nobody is protected from the hardware issues, fortunately, during my session, Kryterion support staff was ready to help me with it.

Many thanks for reading.