by

DWUs(Data Warehouse Units) in Synapse Dedicated Pool

Basically, there are two types of pools in Azure synapse analytics: Serverless SQL Pool and Dedicated SQL Pool. In serverless model as you might be aware that the costing is based on pay-per-usage model and calculated per TB or processing consumed on the queries that are run. Whereas the costing of Dedicated SQL pools is based on Data Warehousing Units or simply called as DWU based on which the resources will be provisioned. This is a setting which can be configured based on your requirement or the load you are running. The more the DWU you set the more the cost that will be incurred. Let’s dive a bit deeper to know what exactly DWU is.

What is Data Warehousing Units (DWU)?

A DWU simply is a collection of compute or analytic resources allocated for dedicated pool. It is a combination of three resources CPU, Memory and IO bundled together. The term DWU represents measure of compute resources and its performance that you are using for your dedicated pool. The DWU setting can be increased for higher performance and decreased when less performance is required. Apart from scaling up and down, the system can also be paused when there is no activity. Once after the DWU is increased for a larger workload for instance, we can see the changes in performance based on few workload metrics and how it behaves post increase of DWUs as below.

  • Ingestion speed from ADLS and storage blobs as this involves network and CPU.
    • Involves Polybase operations with increase in number of readers and writers.
  • Query scans and aggregations/transformations for large number of rows as this involves CPU and IO.
    • Change in performance for scans, aggregations and CTAS commands.
  • CTAS commands for copying and creating a new table since it involves reading from storage, distributing into nodes, and writing back again as heavy CPU, IO and Network is utilized.
    • Accommodating max number of concurrent queries.

Synapse basically comes with service levels. Gen2 is the latest tier with updated hardware & drives with networked premium storage which provides five times more memory per query than Gen1 and with unlimited storage capacity. Gen2 also provides a ten times better individual query execution time than Gen1 tier. One can differentiate the metrics of both by DWU which is Gen1 & cDWU belongs to Gen2.

https://azure.microsoft.com/en-in/updates/name-changes-azure-sql-data-warehouse-gen-1-and-gen-2/

How to set the right amount of DWH for your workloads?

The DWU configuration is workload specific, and it completely depends on your workload performance and how do you want the system to handle it. Best approach to set a baseline is to start small and do enough testing by increasing gradually until you hit the right parameter. From basic up to DW500c are all single compute nodes with varying compute power which means you are running your load on a single node server. Scale up and down to see the performance levels and adjust the DW setting accordingly and use DW1000c or above for testing any production loads, POC’s on less than 2 nodes won’t be fit enough to prove the loads for production. Best practice is to scale out DWUs before heavy data load, transformation operations or before peak business hours where you expect large number of concurrent queries.

Performance levelCompute nodesDistributions per Compute nodeMemory per data warehouse (GB)
DW100c16060
DW200c160120
DW300c160180
DW400c160240
DW500c160300
DW1000c230600
DW1500c320900
DW2000c4151200
DW2500c5121500
DW3000c6101800
DW5000c1063000
DW6000c1253600
DW7500c1544500
DW10000c2036000
DW15000c3029000
DW30000c60118000

What is the DWU setting of my current database?

SELECT d.name AS DatabaseName, Edition AS DatabaseEdition, service_objective AS ServiceObjective FROM sys.database_service_objectives AS dso INNER JOIN sys.databases AS d ON dso.database_id = d.database_id

How can I change the DWUs of my existing databases?

ALTER DATABASE <>
MODIFY (SERVICE_OBJECTIVE = ‘DW1000c’);

To check status of DWU changes that you have made:

FROM sys.dm_operation_status
WHERE resource_type_desc = <>
AND major_resource_id =<>;

Write a Comment

Comment