Overcoming Limitations of Get Metadata Activity in Azure Data Factory / Synapse
Problem Statement
When working with files stored in Azure Blob Storage or Azure Data Lake Storage, the Get Metadata Activity in Azure Data Factory / Synapse limits which file properties you can access.
The Get Metadata Activity exposes only a fixed subset of properties, such as item name, item type, size, and last modified time.
However, there is a way to retrieve other properties such as Creation Time and Content-Type in Synapse / Data Factory pipelines.
Prerequisites
- Azure Data Factory / Synapse
- Azure Blob Storage / Azure Data Lake Storage
Solution
1. To retrieve additional blob properties, call the Get Blob operation of the Azure Blob Storage REST API, which returns them as response headers.
2. To authenticate with Managed Identity, grant the managed identity of the Synapse workspace / Data Factory the Storage Blob Data Reader role on the storage account.
a) Open Access Control (IAM) on the storage account, click Add, and select Add role assignment.
b) Search for the Storage Blob Data Reader role, select it, and assign it to the managed identity of the Synapse workspace / Data Factory.
3. Create a pipeline within Synapse / Data Factory and add a Web Activity that calls the REST API with the following settings:
URL:
For Azure Blob Storage:
https://<<StorageAccountName>>.blob.core.windows.net/<<ContainerName>>/<<FileName>>
For Azure Data Lake Storage:
https://<<DataLakeStorageName>>.dfs.core.windows.net/<<ContainerName>>/<<FileName/DirectoryName>>
Method: GET
Authentication: System Assigned Managed Identity
Resource: https://storage.azure.com/
Headers:
x-ms-version: 2017-11-09
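The settings above correspond to a Web Activity definition in the pipeline JSON. As a rough sketch, the Python below builds the URL for either endpoint and assembles such a definition as a dictionary. The helper names, the activity name `GetBlobProperties`, and the sample account, container, and file values are hypothetical. The `MSI` authentication block is an assumption about the JSON shape, so compare it with the JSON your own pipeline generates.

```python
import json
from urllib.parse import quote

STORAGE_RESOURCE = "https://storage.azure.com/"
API_VERSION = "2017-11-09"  # value used for the x-ms-version header in this article


def build_storage_url(account: str, container: str, path: str, data_lake: bool = False) -> str:
    """Return the REST URL on the blob endpoint, or on the dfs endpoint for Data Lake."""
    endpoint = "dfs" if data_lake else "blob"
    # Percent-encode the path but keep '/' so nested directories stay intact.
    encoded_path = quote(path, safe="/")
    return f"https://{account}.{endpoint}.core.windows.net/{container}/{encoded_path}"


def build_web_activity(name: str, url: str) -> dict:
    """Assemble a Web Activity definition (assumed JSON shape, verify before use)."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "GET",
            "headers": {"x-ms-version": API_VERSION},
            # System Assigned Managed Identity, scoped to Azure Storage.
            "authentication": {"type": "MSI", "resource": STORAGE_RESOURCE},
        },
    }


if __name__ == "__main__":
    # Hypothetical storage account, container, and file names.
    url = build_storage_url("mystorageacct", "raw", "sales/2023/orders.csv")
    print(json.dumps(build_web_activity("GetBlobProperties", url), indent=2))
```

Passing `data_lake=True` switches the host to the `dfs` endpoint, matching the second URL form above.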
Output
Get Metadata Activity output
Web Activity Output (Azure Blob Storage)
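The Web Activity returns the blob's properties as HTTP response headers, not as a JSON body. Assuming the activity output exposes them under a key named `ADFWebActivityResponseHeaders` (verify this against your own run output), a downstream step could pick out the creation time and content type roughly as follows. The sample dictionary is made up for illustration and is not real output.

```python
from email.utils import parsedate_to_datetime

HEADERS_KEY = "ADFWebActivityResponseHeaders"  # assumed key name in the activity output


def extract_blob_properties(activity_output: dict) -> dict:
    """Pull the content type and creation time out of the response headers."""
    headers = activity_output.get(HEADERS_KEY, {})
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    created_raw = lowered.get("x-ms-creation-time")
    return {
        "content_type": lowered.get("content-type"),
        # The header carries an RFC 1123 date string; parse it when present.
        "created": parsedate_to_datetime(created_raw) if created_raw else None,
    }


# Hypothetical output, for illustration only.
sample_output = {
    HEADERS_KEY: {
        "Content-Type": "text/csv",
        "x-ms-creation-time": "Mon, 02 Jan 2023 10:15:00 GMT",
    }
}

props = extract_blob_properties(sample_output)
print(props["content_type"], props["created"].isoformat())
```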