DP-203 Sample paper 1
Online Certification Questions
Question 1
Which of the following describes a scenario in which a data team will want to utilize cluster pools?
- A. An automated report needs to be refreshed as quickly as possible.
- B. An automated report needs to be made reproducible.
- C. An automated report needs to be tested to identify errors.
- D. An automated report needs to be version-controlled across multiple collaborators.
- E. An automated report needs to be runnable by all stakeholders.
Correct answer: A. An automated report needs to be refreshed as quickly as possible.
Databricks cluster pools keep a set of idle, ready-to-use cloud instances. Clusters that draw from a pool can start and autoscale without waiting for new instances to be provisioned. When an automated report must be refreshed as quickly as possible, a pool shortens cluster startup time, so the job that generates the report can begin almost immediately. This matters most when the report has to reflect new data with low latency. The other options concern reproducibility, testing, version control, and access, and none of them depends on how fast a cluster starts.
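To make the mechanism concrete, the sketch below builds two request payloads as plain Python dictionaries: one for a pool, shaped like a body for the Databricks Instance Pools API (`POST /api/2.0/instance-pools/create`), and one for a report cluster that takes its driver and worker nodes from that pool. It makes no network calls. The pool name, pool ID, node type, runtime version, and sizes are illustrative assumptions, not recommended values.

```python
import json

# Hypothetical pool definition. min_idle_instances is the number of
# instances kept warm, which is what removes provisioning time.
pool_spec = {
    "instance_pool_name": "report-refresh-pool",   # hypothetical name
    "node_type_id": "Standard_DS3_v2",             # assumed Azure VM type
    "min_idle_instances": 2,
    "max_capacity": 10,
    "idle_instance_autotermination_minutes": 30,
}

def report_cluster_spec(pool_id: str) -> dict:
    """Build a cluster spec whose driver and workers come from a pool.

    node_type_id is omitted because the pool determines the instance type.
    """
    return {
        "cluster_name": "report-refresh",          # hypothetical name
        "spark_version": "13.3.x-scala2.12",       # assumed runtime version
        "instance_pool_id": pool_id,               # workers from the pool
        "driver_instance_pool_id": pool_id,        # driver from the pool
        "num_workers": 2,
        "autotermination_minutes": 20,
    }

# "pool-0000-hypothetical" stands in for the ID returned when the pool is created.
cluster = report_cluster_spec("pool-0000-hypothetical")
print(json.dumps(cluster, indent=2))
```

Keeping a few idle instances costs money while they wait. The trade-off is worthwhile when startup latency, rather than compute cost, is the constraint on how quickly a report refreshes.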
Question 2
A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.
Which of the following describes how a data lakehouse could alleviate this issue?
- A. Both teams would autoscale their work as data size evolves
- B. Both teams would use the same source of truth for their work
- C. Both teams would reorganize to report to the same department
- D. Both teams would be able to collaborate on projects in real-time
- E. Both teams would respond more quickly to ad-hoc requests
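Correct answer: B. Both teams would use the same source of truth for their work.
A data lakehouse such as the Databricks Lakehouse stores data once and serves both data engineering and data analysis workloads from it. Both teams therefore query the same tables instead of maintaining separate copies in a data lake and a data warehouse. Duplicated data creates silos: each copy is refreshed and transformed on its own schedule, the copies drift apart, and reports built on them disagree.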
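The toy example below shows why duplicated copies produce conflicting reports. It does not use any Databricks API. A plain Python list stands in for a shared table, and the table contents and the `total_revenue` report definition are hypothetical. Both teams apply the same report logic, but the siloed analysis team reads a copy taken before a late-arriving record was loaded.

```python
import copy

# Hypothetical sales table; in a lakehouse this would be one shared table.
sales = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
]

def total_revenue(table):
    """Report definition used by both teams: sum of the amount column."""
    return sum(row["amount"] for row in table)

# Siloed setup: the analysis team works from a copy taken earlier.
analysis_copy = copy.deepcopy(sales)

# A late-arriving record is loaded only into the original table.
sales.append({"region": "east", "amount": 50})

engineering_report = total_revenue(sales)              # reads the source
siloed_analysis_report = total_revenue(analysis_copy)  # reads a stale copy
shared_analysis_report = total_revenue(sales)          # reads the same source

print(engineering_report, siloed_analysis_report, shared_analysis_report)
```

The siloed report disagrees with the engineering report even though the logic is identical. When both teams read the same table, the results match.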