
Overcoming Limitations of Get Metadata Activity in Azure Data Factory / Synapse

Problem Statement

When working with files stored in Azure Blob Storage or Azure Data Lake Storage, the Get Metadata Activity in Azure Data Factory / Synapse exposes only a limited set of file properties.

The Get Metadata Activity can retrieve only a fixed subset of properties, such as item name, item type, size, last modified time, and content MD5.

However, there is a way to retrieve other properties such as Creation Time and Content-Type in Synapse / Data Factory pipelines.

Prerequisites

  1. Azure Data Factory / Synapse
  2. Azure Blob Storage / Azure Data Lake Storage

Solution

1. To retrieve additional blob properties, call the Get Blob operation of the Azure Blob Storage REST API, which returns the blob's properties as response headers.

2. To authenticate via Managed Identity, grant the Synapse / Data Factory managed identity the Storage Blob Data Reader role on the storage account.


a) Go to Access Control (IAM) on the storage account, click Add, and select Add role assignment.


b) Search for the Storage Blob Data Reader role, select it, and assign it to the managed identity of the Synapse workspace / Data Factory.

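The portal steps above can also be expressed as a call to the Azure Resource Manager role assignment REST API. The following is a minimal sketch, not part of the original walkthrough: it only builds the request URL and body, without sending anything. The function name `build_role_assignment` and all argument values are hypothetical, and the `api-version` shown is an assumption that should be checked against the current documentation. The GUID is the built-in role definition ID for Storage Blob Data Reader.

```python
import uuid

# Built-in role definition ID for "Storage Blob Data Reader".
STORAGE_BLOB_DATA_READER = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"


def build_role_assignment(subscription_id, resource_group, storage_account, principal_id):
    """Build the PUT URL and JSON body for a role assignment (hypothetical helper)."""
    # Scope the assignment to a single storage account.
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )
    # Each role assignment is identified by a new GUID.
    assignment_name = str(uuid.uuid4())
    url = (
        f"https://management.azure.com{scope}"
        f"/providers/Microsoft.Authorization/roleAssignments/{assignment_name}"
        "?api-version=2022-04-01"  # assumed API version
    )
    body = {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
                f"/roleDefinitions/{STORAGE_BLOB_DATA_READER}"
            ),
            # Object ID of the Synapse / Data Factory managed identity.
            "principalId": principal_id,
            "principalType": "ServicePrincipal",
        }
    }
    return url, body
```

Sending the request would additionally require a bearer token for Azure Resource Manager, which is left out here.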

3. Create a pipeline in Synapse / Data Factory and add a Web Activity that calls the REST API with the settings below.


URL

For Azure Blob Storage:

https://<<StorageAccountName>>.blob.core.windows.net/<<ContainerName>>/<<FileName>>

For Azure Data Lake Storage:

https://<<DataLakeStorageName>>.dfs.core.windows.net/<<ContainerName>>/<<FileName/DirectoryName>>

Method: GET

Authentication: System Assigned Managed Identity

Resource: https://storage.azure.com/

Headers:

x-ms-version: 2017-11-09
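To make the settings above concrete, here is a minimal Python sketch that builds the request URL and a Web Activity definition as a dictionary. The helper names `build_storage_url` and `build_web_activity` and the sample account, container, and file names are hypothetical. The dictionary layout (`typeProperties`, `authentication` with type `MSI`) is an assumption about how the pipeline JSON view represents these settings, so compare it with the JSON your own pipeline generates.

```python
from urllib.parse import quote

STORAGE_RESOURCE = "https://storage.azure.com/"
API_VERSION = "2017-11-09"


def build_storage_url(account, container, path, data_lake=False):
    """Return the Blob (blob) or Data Lake (dfs) endpoint URL for a file."""
    endpoint = "dfs" if data_lake else "blob"
    # Keep "/" so nested directory paths survive; percent-encode the rest.
    encoded_path = quote(path, safe="/")
    return f"https://{account}.{endpoint}.core.windows.net/{container}/{encoded_path}"


def build_web_activity(name, url):
    """Return a Web Activity definition mirroring the settings listed above."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "GET",
            "headers": {"x-ms-version": API_VERSION},
            # System Assigned Managed Identity against the storage resource.
            "authentication": {"type": "MSI", "resource": STORAGE_RESOURCE},
        },
    }


# Hypothetical sample values for illustration only.
sample_url = build_storage_url("mystorageacct", "raw", "sales/orders.csv")
sample_activity = build_web_activity("GetBlobProperties", sample_url)
```

Passing `data_lake=True` switches the host to the `dfs` endpoint used for Azure Data Lake Storage.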

Output

[Figure: Get Metadata Activity output]

[Figure: Web Activity output (Azure Blob Storage)]
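Because Get Blob returns blob properties as HTTP response headers, the extra properties have to be read from the headers in the Web Activity output. The sketch below shows one way to pull out Creation Time and Content-Type in Python. It assumes the headers appear under a key named `ADFWebActivityResponseHeaders` in the activity output; the function name `extract_blob_properties` and the sample values are hypothetical.

```python
from email.utils import parsedate_to_datetime


def extract_blob_properties(activity_output):
    """Pick Creation Time and Content-Type out of a Web Activity output dict."""
    headers = activity_output.get("ADFWebActivityResponseHeaders", {})
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    created = lowered.get("x-ms-creation-time")
    return {
        "content_type": lowered.get("content-type"),
        # The header uses the RFC 1123 date format, e.g. "Tue, 05 Jul 2022 10:15:30 GMT".
        "creation_time": parsedate_to_datetime(created) if created else None,
    }


# Hypothetical output for illustration; real output contains more headers.
sample_output = {
    "ADFWebActivityResponseHeaders": {
        "x-ms-creation-time": "Tue, 05 Jul 2022 10:15:30 GMT",
        "Content-Type": "text/csv",
    }
}
properties = extract_blob_properties(sample_output)
```

Inside a pipeline, the same values would be read with an expression over the Web Activity output rather than with Python.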
