
Architecture of Azure Synapse Analytics

Azure Synapse SQL is a technology that resides inside the Synapse workspace. It offers two pool types, which we discussed in detail in one of our articles a few weeks ago:

  1. Dedicated SQL Pool
  2. Serverless SQL Pool

The built-in ‘Serverless SQL Pool’ is created automatically when you create the workspace, whereas a ‘Dedicated SQL Pool’ is one that you create yourself. The dedicated SQL pool, formerly known as Azure SQL Data Warehouse, creates a database in the backend, which is visible under the Data tab in the left-hand pane. So keep in mind that whenever you create a dedicated SQL pool, you are also creating a database in the backend.


The major difference between the two pools is the cost model. The serverless SQL pool works on a pay-per-use basis: you pay only for the data processed by the queries you run against files in ADLS Gen2 or Blob Storage, with no charges for reserved compute. In a dedicated SQL pool, by contrast, dedicated resources such as the database and the compute engine are provisioned for you. Compute power is measured in DWUs (Data Warehouse Units), which you select when creating the dedicated SQL pool. The DWUs provide the compute that runs queries and processes data, while the database created by the dedicated SQL pool provides the storage.
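To make the difference concrete, here is a minimal Python sketch of the two billing models. It assumes serverless is billed per TB of data processed and a dedicated pool per hour that its chosen DWU level is online (pausing stops compute billing), and it ignores storage, which both models pay for separately. All rates and function names are hypothetical placeholders, not Azure prices.

```python
# Minimal sketch of the two billing models. All rates are HYPOTHETICAL placeholders.


def serverless_cost(tb_processed: float, price_per_tb: float) -> float:
    """Serverless SQL pool: pay only for the data your queries process."""
    return tb_processed * price_per_tb


def dedicated_cost(hours_online: float, price_per_hour: float) -> float:
    """Dedicated SQL pool: pay for every hour the chosen DWU level is online,
    whether or not queries are running (pausing the pool stops compute billing)."""
    return hours_online * price_per_hour


def break_even_tb(hours_online: float, price_per_hour: float, price_per_tb: float) -> float:
    """TB processed per period at which both models cost the same."""
    return dedicated_cost(hours_online, price_per_hour) / price_per_tb


# Usage with hypothetical rates: a pool online all month (730 h) at 1.5 per hour,
# versus serverless at 5.0 per TB processed.
print(break_even_tb(730, 1.5, 5.0))  # prints 219.0
```

Below the break-even volume the serverless pool is cheaper for such a workload; above it, keeping a dedicated pool online starts to pay off.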

When you load data into a dedicated SQL pool database, Synapse stores it in a distributed fashion: each table is split across multiple distributions (60 in a dedicated SQL pool) to optimize performance.

There are three distribution (sharding) patterns available in Synapse SQL:

  • Hash
  • Round Robin
  • Replicate
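The three patterns can be illustrated with a small Python simulation. It is a conceptual sketch only: it uses 60 distributions as a dedicated SQL pool does, but zlib.crc32 is a stand-in for Synapse's internal hash function, round robin is modelled as dealing rows out in turn, and the customer_id column and the compute-node count are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def hash_distribute(rows, column):
    """HASH: the value of the distribution column picks the distribution, so rows
    with equal values always land together. zlib.crc32 stands in for Synapse's hash."""
    dists = defaultdict(list)
    for row in rows:
        key = str(row[column]).encode("utf-8")
        dists[zlib.crc32(key) % NUM_DISTRIBUTIONS].append(row)
    return dict(dists)


def round_robin_distribute(rows):
    """ROUND_ROBIN: rows are dealt out in turn, giving an even spread with no key."""
    dists = defaultdict(list)
    for i, row in enumerate(rows):
        dists[i % NUM_DISTRIBUTIONS].append(row)
    return dict(dists)


def replicate(rows, num_compute_nodes):
    """REPLICATE: every compute node holds a full copy of the table."""
    return {node: list(rows) for node in range(num_compute_nodes)}


# Usage: 120 rows with 7 distinct (hypothetical) customer_id values.
rows = [{"customer_id": i % 7, "amount": i} for i in range(120)]
hashed = hash_distribute(rows, "customer_id")
rr = round_robin_distribute(rows)
copies = replicate(rows, num_compute_nodes=2)
print(len(hashed), len(rr), len(copies))
```

Hash placement fills at most 7 distributions here, one per distinct key, which shows why hashing on a column with few distinct values causes skew; round robin fills all 60 evenly with 2 rows each.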

You specify which distribution pattern you need when creating a new table, using the DISTRIBUTION option in the WITH clause of the CREATE TABLE statement.
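The option takes the form DISTRIBUTION = HASH(column), DISTRIBUTION = ROUND_ROBIN or DISTRIBUTION = REPLICATE. To keep the examples in one language, here is a small Python helper that builds such a statement. The table dbo.FactSales, its columns and the helper names are hypothetical, and CLUSTERED COLUMNSTORE INDEX is included only as a common companion option.

```python
from typing import List, Optional


def distribution_clause(pattern: str, column: Optional[str] = None) -> str:
    """Build the DISTRIBUTION option of a dedicated SQL pool CREATE TABLE."""
    pattern = pattern.upper()
    if pattern == "HASH":
        if not column:
            raise ValueError("HASH distribution needs a distribution column")
        return f"DISTRIBUTION = HASH({column})"
    if pattern in ("ROUND_ROBIN", "REPLICATE"):
        return f"DISTRIBUTION = {pattern}"
    raise ValueError(f"unknown distribution pattern: {pattern}")


def create_table_sql(table: str, columns: List[str], pattern: str,
                     column: Optional[str] = None) -> str:
    """Assemble a CREATE TABLE statement with the chosen distribution."""
    col_sql = ",\n    ".join(columns)
    return (
        f"CREATE TABLE {table}\n(\n    {col_sql}\n)\n"
        f"WITH\n(\n    {distribution_clause(pattern, column)},\n"
        f"    CLUSTERED COLUMNSTORE INDEX\n);"
    )


ddl = create_table_sql(
    "dbo.FactSales",
    ["SaleId INT NOT NULL", "CustomerId INT NOT NULL", "Amount DECIMAL(18, 2)"],
    pattern="HASH",
    column="CustomerId",
)
print(ddl)
```

Passing pattern="ROUND_ROBIN" or pattern="REPLICATE" with no column produces the other two variants.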


Synapse SQL uses a node-based architecture. The ‘Control Node’ is the central component of the Synapse SQL architecture: it is the entry point for all requests made to Synapse SQL and is common to both the dedicated and serverless SQL pool models.


The dedicated SQL pool follows a true MPP (Massively Parallel Processing) architecture. It takes each submitted query, transforms it into parallel queries, and passes them on to the compute nodes. Once all of the distributions/compute nodes have finished executing, their data has to be brought together to produce a single result. For this, the dedicated SQL pool uses the DMS (Data Movement Service), which moves data between the compute nodes so that a single result can be returned.
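The flow above can be sketched in Python as a toy distributed GROUP BY: each distribution computes a local partial aggregate in parallel, a DMS-like shuffle moves partial results so that all values for one key meet on the same distribution, a final aggregation runs there, and the control node gathers a single result. This is a conceptual illustration with assumed integer columns (customer_id, amount), not how Synapse implements DMS internally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_DISTRIBUTIONS = 60


def partial_sum(rows):
    """Runs on one distribution: a local SUM(amount) GROUP BY customer_id."""
    totals = Counter()
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return totals


def shuffle(partials, num_targets):
    """DMS-like step: route each partial group to the distribution that owns its key,
    so all partial sums for one customer end up in the same place."""
    targets = [[] for _ in range(num_targets)]
    for partial in partials:
        for customer_id, subtotal in partial.items():
            targets[customer_id % num_targets].append((customer_id, subtotal))
    return targets


def run_mpp_query(distributions):
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, distributions))  # 1: parallel local aggregation
        moved = shuffle(partials, len(distributions))         # 2: data movement between nodes
        finals = list(pool.map(partial_sum, moved))           # 3: final aggregation per key owner
    result = Counter()
    for final in finals:                                      # 4: control node gathers one result
        result.update(final)
    return dict(result)


# Usage: 300 (customer_id, amount) rows placed round-robin over 60 distributions.
rows = [(i % 5, i) for i in range(300)]
distributions = [rows[d::NUM_DISTRIBUTIONS] for d in range(NUM_DISTRIBUTIONS)]
result = run_mpp_query(distributions)
print(result)
```

Because the shuffle sends every key to exactly one owner, the final per-owner aggregates never overlap, and the gather step only has to concatenate them.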

In the serverless SQL pool, each query is divided into multiple tasks that are assigned to many compute nodes, which read and process the data directly from Azure Storage. Where the dedicated SQL pool relies on DMS, this work is handled by an internal component called the DQP (Distributed Query Processing) engine, which breaks larger queries down into several smaller tasks.
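Here is a similar toy Python model of how a serverless query over files could be split into tasks and fanned out to compute nodes. The in-memory FILES dictionary stands in for CSV files in ADLS Gen2 or Blob Storage, and the split-by-file strategy, files_per_task and node count are assumptions; the real DQP engine's planning is internal to Synapse.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical files standing in for CSVs in ADLS Gen2 / Blob Storage.
FILES = {
    "sales/2024/01.csv": "region,amount\nwest,10\neast,5\n",
    "sales/2024/02.csv": "region,amount\nwest,7\n",
    "sales/2024/03.csv": "region,amount\neast,3\neast,4\n",
    "sales/2024/04.csv": "region,amount\nwest,1\n",
    "sales/2024/05.csv": "region,amount\neast,20\n",
}


def split_into_tasks(paths, files_per_task):
    """DQP-like step: break one query over many files into smaller tasks."""
    return [paths[i:i + files_per_task] for i in range(0, len(paths), files_per_task)]


def run_task(paths):
    """Runs on one compute node: read its files and return a partial SUM(amount)."""
    total = 0
    for path in paths:
        reader = csv.DictReader(io.StringIO(FILES[path]))
        total += sum(int(row["amount"]) for row in reader)
    return total


def run_serverless_query(paths, files_per_task=2, nodes=4):
    """Split the query, run the tasks in parallel, then combine the partial sums."""
    tasks = split_into_tasks(sorted(paths), files_per_task)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(run_task, tasks))
    return sum(partials), len(tasks)


total, n_tasks = run_serverless_query(list(FILES))
print(total, n_tasks)
```

Nothing is stored in the pool itself: each task reads its files from storage at query time, which is what makes the pay-per-data-processed billing model possible.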

Summary

I hope this gives you a basic understanding of how Azure Synapse Analytics works behind the scenes, and of how Synapse processes data in such an optimized and efficient manner, and yet so quickly. More topics to come, stay tuned!
