Today we will check how to create an external data source to access data stored in other resources. If you could remember, in one of our previous articles we have discussed that there will be a Logical Data Warehouse (LDW) which will work similar to a database that you could see in azure synapse analytics. Although physically it may appear as a database it is only a virtual layer which can hold data from various external sources and enables users to query from synapse. This is also called as Data Virtualization. In SQL Server database this feature is called as Polybase through which the database users can query the external sources or external databases.
Traditionally the Data’s from a single or multiple data sources like ADLS, API, IOT, Database etc., are collected by creating a ETL pipeline and then is stored into a data warehouse which is used to query the data. Whereas in Data Virtualization you don’t have to design pipelines to pull data and then get it stored into local storage, you can query the external resources directly through normal SQL query. And the best thing is user never feels that he is accessing an external DataSource, it is similar to accessing data that are stored locally.
Let’s see how External Data Source can be created.
I am utilizing the Synapse workspace which I created for my last article here. As you know I have an ADLS gen2 storage that is created and mapped to this and this account is which we are going to use to create an external data source. You can get the name of the account when you hover the mouse over it and we going to use it to directly query the data present in it through SQL query.
We can create external data source using SQL query, now go to the develop tab on the left and create a new SQL script.
Please note that before creating the External DataSource you have to create the Database Scoped Credential which in-turn requires Master Key to be created as well. To find the SAS token that has to entered in the SECRET key, please refer the below screenshot and generate your SAS and connecting string. Keep in mind that these secret tokens come with expiry time, if you plan to use it later enter the From/To time as per your requirement.
The following query consists of the three mandatory steps that are needed for creating the External Data source. I have run the query against the ‘serverlessSQLPoolDB’ database that we have created in one of our previous article.
-- Create a database master key if one does not already exist, using your own password. -- -- This key is used to encrypt the credential secret in next step. Use serverlessSQLPoolDB go CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password' ; -- Create a database scoped credential with Azure storage account key as the secret. -- Remove the ‘?’ string from the SAS token when entering in the SECRET CREATE DATABASE SCOPED CREDENTIAL TestCred WITH IDENTITY = 'SHARED ACCESS SIGNATURE' , SECRET = 'xxxxxsv=2020-08-04&ss=bfqt&srt=rwdlacupx&se=2021-12-24T14:46:50Z&st=2021-12-24T06:46:50Z&spxxxxxr=https&sig=dO2eRMGxFb0BMOstcxIpDW8k3BTru1Psly70ch%2Bg8Do%3D'; GO -- Create an external data source with CREDENTIAL option. CREATE EXTERNAL DATA SOURCE TestDataSource WITH (LOCATION = 'https://synadlsgen.dfs.core.window.net/', CREDENTIAL = TestCred) ;
Once the query run successfully, refresh the External Data Sources to see the newly created datasource.
This is how you can create a data source to access data’s from external storage without moving them locally.
Microsoft official docs