Workload Management in Azure Synapse Analytics

Managing varied workloads with proper resource allocation for multiple concurrent user environment is the biggest challenge a team might face when retrieving data from an azure synapse analytics dedicated sql pool db. Workload management in azure synapse analytics gives you access to control your workload that are utilizing your system resources. Setting up the best workload management scheme is critical to effectively manage the available resources. Basically, workload management refers to all the operations that can be set to make sure there are no resource contention while accessing the data stored inside the data warehouse.

Workload Groups:

Creating a workload group is the first step towards setting up the pillars of workload management. The workload group lets us determine and create a customized entity or group which can allow specific number of resources get utilized based on the different properties that has been set. The three pillars are part of workload group which can help the administrator to have a fine-grained access defined for various set of users in an organization.

Pillars of Workload management:

Before the workload management group feature was in place, Synapse SQL used resource classes to manage query performance. Resource classes allowed users to assign memory based on role membership, but it behaves as a one-time activity, and you cannot have any control mechanism or can govern the workload running under the synapse database. For example, if there is a user who has been granted with smallrc role membership can consume all the 100% memory available. With the new dedicated pool’s workload management feature, there are three workload management concepts which will help you gain control on how your queries are utilizing the available system resources in a dedicated SQL pool.

WORKLOAD CLASSIFICATION:

Classification in workload management allows us to configure policy that can be applied to the incoming requests by assigning resource classes and importance. Previously this is done using the role membership assignment with ‘sp_addrolemember’ which now will be taken care by workload classification using ‘create workload classifier’ function. In addition to this it also has options to set labels, start and end time, session as per the requirements. A better use case in a typical data warehousing environment would be when you want to prioritize your data load by assigning them with higher resource class and resources compared to queries from users which can considered with lower resource class than the data load. Not only this, but you can also assign finer controls by subclassifying both the query and data loads to ensure the larger data load or query has higher resource class or importance settings than the smaller ones. For example, the biometric data load in a company might be huge and need to be loaded before the payroll calculation data which is small.

Note: DBCC commands like BEGIN, COMMIT, and ROLLBACK TRANSACTION statements cannot be classified.

CREATE WORKLOAD CLASSIFIER

Workload classification is a function which can map the users with workload group that has been already defined.

CREATE WORKLOAD CLASSIFIER classifier_name

WITH

    (   WORKLOAD_GROUP = ‘name’

    ,   MEMBERNAME = ‘security_account’

[ [ , ] WLM_LABEL = ‘label’ ]

[ [ , ] WLM_CONTEXT = ‘context’ ]

[ [ , ] START_TIME = ‘HH:MM’ ]

[ [ , ] END_TIME = ‘HH:MM’ ]

[ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL | ABOVE_NORMAL | HIGH }])

[;]

Now let’s look at each of these parameters in detail.

WORKLOAD_GROUP: The name which maps the request to a workload group at the time of creating the classifier.

MEMBERNAME: The security user account which can be database user, role or AAD login/AAD group.

WLM_LABEL: Label is like that of a tag that we use when creating any azure services. It is an optional parameter and one can use the OPTION(LABEL) in the request to match the classifier configuration.

CREATE WORKLOAD CLASSIFIER etlloads WITH

( WORKLOAD_GROUP = ‘dataload_wg’

,MEMBERNAME = ‘etl_login’

,WLM_LABEL = ‘fact_load’ )

SELECT COUNT(*)

FROM Customer

OPTION (LABEL = ‘fact_load’)

WLM_CONTEXT: This is an optional parameter. Instead of labelling each query in a session, you can simply apply session context which can last for the entire session.

CREATE WORKLOAD CLASSIFIER etlloads WITH

( WORKLOAD_GROUP = ‘dataload_wg’

,MEMBERNAME = ‘etl_login’

,WLM_CONTEXT = ‘fact_load_cont’ )

–setting the session context

EXEC sys.sp_set_session_context @key = ‘wlm_context’, @value = ‘fact_load_cont’

START_TIME and END_TIME: Both start and end times are in HH:MM format under UTC time zone. Both must be specified together and is a great feature to route the activity to be carried out in a specific time frame only.

CREATE WORKLOAD CLASSIFIER etlloads WITH

( WORKLOAD_GROUP = ‘ dataload_wg’

,MEMBERNAME = ‘etl_login’

,START_TIME = ’22:00′

,END_TIME = ’02:00′)

IMPORTANCE: It specifies the importance of a request that has been made and it influences the order on which the requests are scheduled by assigning priority access to resources. There are 5 different types of importance that can be set and NORMAL is default one.

LOW
BELOW_NORMAL
NORMAL (default)
ABOVE_NORMAL
HIGH

PARAMETER WEIGHTING: Weighting will be assigned to the requests based on their classification parameters and higher the weightage importance will be determined.

Classifier Parameter	Weight
USER	64
ROLE	32
WLM_LABEL	16
WLM_CONTEXT	8
START_TIME/END_TIME	4

For example, when our workload classifier fits with the following parameters, the higher the weightage of the classifier priority will be given to that workload classifier. The second one with WLM_CONTEXT & USER wins in this below example, hence proper mix of weightage should be declared if prioritization for the classifier is expected.

START & END_TIME, WLM_LABEL & ROLE: 4+16+32= 52 Points

WLM_CONTEXT & USER: 8+64= 72 Points

Dropping Classifier:

DROP WORKLOAD CLASSIFIER etlloads;

Typical Example:

An example of creating a classifier with staticrc20 workload group

CREATE WORKLOAD CLASSIFIER etlloads

WITH (WORKLOAD_GROUP = ‘staticrc20’

,MEMBERNAME = ‘etl_login’

,IMPORTANCE = above_normal);

WORKLOAD IMPORTANCE:

Specifies influence on order of access for the resources to execute the queries. We must understand that workload importance is not a shortcut that would execute a query with high importance immediately cancelling or ignoring every other request, it just cuts down the wait time that would be required normally for a resource to acquire its share for resources. A typical example is when a resource with higher importance will have first access to resources available in the system that the ones with lower importance.

Creating a workload classifier, for a user with higher importance run:

CREATE WORKLOAD CLASSIFIER etlloads

    WITH (WORKLOAD_GROUP = ‘xlargerc’

         ,MEMBERNAME     = ‘ etl_login‘

         ,IMPORTANCE     = above_normal);

Creating a workload classifier for a user to run adhoc queries with lower importance:

CREATE WORKLOAD CLASSIFIER etlloads

    WITH (WORKLOAD_GROUP = ‘xlargerc’

         ,MEMBERNAME     = ‘ etl_login‘

         ,IMPORTANCE     = below_normal);

WORKLOAD ISOLATION:

it reserves the resources exclusively for the workload group. It ensures that resources are continuously available for execution for some of the important processes for the defined workload groups. If there is no workload isolation all the query requests will be utilising the shared resource pool and contending for the resources. Once in the shared pool the availability of resources for processes are not guaranteed and allotted only based on importance that has been set. Workload isolation is achieved by configuring the MIN_PERCENTAGE_RESOURCE parameter to greater than zero in the CREATE WORKLOAD GROUP syntax. Workload groups are created when creating the workload group using the CREATE WORKLOAD GROUP syntax.

Configuring workload isolation should be done with caution as the resources are allocated to the workload group even if there are no active requests in the workload group. Over-configuring isolation can lead to diminished overall system utilisation.

If there is no workload isolation configured by default the requests are operated through shared pool of resources. Requests within the shared resource pool are based on importance level and the resources with same importance level works based on FIFO method.

Another level of isolation is workload containment wherein you can configure and limit the amount of resource an workload group can consume. It is achieved by configuring the CAP_PERCENTAGE_RESOURCE parameter to less than 100 when creating workload group.

Create a workload group for dataload_wg with 25% isolation.

CREATE WORKLOAD GROUP dataload_wg

WITH ( MIN_PERCENTAGE_RESOURCE = 25

,CAP_PERCENTAGE_RESOURCE = 100

,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);

Create a workload classifier to map login etl_login to the dataload_wg workload group

CREATE WORKLOAD CLASSIFIER [wgcELTLogin]

WITH (WORKLOAD_GROUP = dataload_wg

,MEMBERNAME = etl_login);

For monitoring queries:

–Workload groups

SELECT * FROM sys.workload_management_workload_groups

–Workload classifiers

SELECT * FROM sys.workload_management_workload_classifiers

–Run-time values

SELECT * FROM sys.dm_workload_management_workload_groups_stats

Conclusion:

This is an advanced topic in synapse, hence I have deep dived the workload classification than the subsequent topics like workload isolation and workload importance. I believe that once you start understanding the basics and practical implementation and application of these features you will find easier in managing the performance of synapse dedicated pool when running huge loads.

Published

June 12, 2023

by vinodh | June 12, 2023