The Get Metadata activity in Azure Data Factory (ADF) retrieves metadata
about data stored in a wide range of data stores. Before running data
integration tasks, you can use it to find out, for example, whether a file
exists, how large it is, or which files a folder contains. This makes it
useful for conditional operations, such as checking that a file exists before
attempting to copy it, and for iterating over the files in a directory.
Definition:
Get Metadata is an Azure Data Factory activity that returns the metadata of
the data referenced by a dataset.
- You can use the output of the Get Metadata activity in subsequent
activities or conditions.
- You need LIST/EXECUTE permission on a folder to run the Get Metadata
activity against that folder.
- Wildcard filters on files or folders are not supported in the Get Metadata
activity.
Requirement:
- Create a Get Metadata activity in ADF.
Implementation:
- Go to your data factory, open a pipeline, and add a new Get Metadata
activity.
- As the screenshot above shows, there are a few settings you need to
configure:
- General: Assign a name and description to the activity.
- Dataset: Choose or create a dataset that
points to the data store you wish to get metadata from.
- Field List: Specify the metadata
fields you want to retrieve. Options include:
- Child Items: Returns the files and subfolders directly inside the folder
that the dataset points to. Applies only to folders.
- Exists: Checks whether the file, folder, or table exists. When you want to
validate that a file, folder, or table exists, specify exists in the Get
Metadata activity field list. You can then check the exists result
(true/false) in the activity output. If exists is not specified in the field
list, the Get Metadata activity fails when the object is not found.
- Size: Returns the size of the file, in bytes. Applies only to files.
- Last modified: Returns the last modified date and time of the file or
folder.
- ContentMD5: Provides the MD5 hash of the file
content for Azure Blob Storage.
- Structure: Returns the column structure for
tabular datasets.
- Filter by last modified: Only files whose last modified time falls in the
range [Start time, End time) are returned for further processing. This means
at or after the start time and before the end time. Both properties are
optional; if you leave them empty, no last-modified filter is applied.
Note: If you set this property, only the child items of the selected path
are returned; items inside subfolders are not included.
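Behind the designer, each activity is stored as JSON in the pipeline
definition. The Python sketch below builds a Get Metadata activity as a
dictionary to show roughly how the settings above map to JSON properties. It
is a minimal sketch under stated assumptions. The activity and dataset names
(GetFolderMetadata, SourceFolderDataset) are hypothetical. The store settings
assume a dataset on Azure Blob Storage (AzureBlobStorageReadSettings); other
connectors use a different settings type. Export your own pipeline's JSON to
confirm the exact shape.

```python
import json


def get_metadata_activity(name, dataset_name, field_list,
                          modified_start=None, modified_end=None):
    """Return a Get Metadata activity definition as a dict.

    Assumption: the dataset points to an Azure Blob Storage folder, so the
    store settings type is AzureBlobStorageReadSettings.
    """
    store_settings = {"type": "AzureBlobStorageReadSettings"}
    # "Filter by last modified": start <= lastModified < end.
    # Leaving both values out applies no last-modified filter.
    if modified_start is not None:
        store_settings["modifiedDatetimeStart"] = modified_start
    if modified_end is not None:
        store_settings["modifiedDatetimeEnd"] = modified_end

    return {
        "name": name,
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": dataset_name,
                "type": "DatasetReference",
            },
            # Field names as they appear in the JSON field list.
            "fieldList": list(field_list),
            "storeSettings": store_settings,
        },
    }


if __name__ == "__main__":
    # Hypothetical names: request existence, timestamp and folder contents.
    activity = get_metadata_activity(
        "GetFolderMetadata",
        "SourceFolderDataset",
        ["exists", "lastModified", "childItems"],
        modified_start="2024-01-01T00:00:00Z",
        modified_end="2024-02-01T00:00:00Z",
    )
    print(json.dumps(activity, indent=2))
```

In the JSON the field list uses camelCase names such as childItems, exists,
size, lastModified, contentMD5 and structure. Request only fields that apply
to what the dataset points to: size applies to a file, while childItems
applies to a folder.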
Use the Output in Subsequent Activities:
You can reference the output of the Get Metadata activity in subsequent
activities to make the pipeline behave dynamically. For example, you can use
an If Condition activity to perform actions based on whether a file exists.
- Example 1: Check if a File Exists Before Copying
Use the Get Metadata activity to check whether a file exists in a source
location. If it does (exists = true), run a Copy activity to copy the file to
a destination.
- Example 2: Get a List of Files for Processing
If you need to process multiple files in a folder, first use the Get Metadata
activity to get the list of files (Child Items). Then use a ForEach activity
to iterate over these files and process each one individually.
- Example 3: Conditional Execution Based on File Size
Retrieve the size of a file with the Get Metadata activity. Then use an If
Condition activity to run different branches of your pipeline based on the
file size (for example, only process files larger than a certain threshold).
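Example 1 could be wired up roughly as sketched below. An If Condition
activity depends on the Get Metadata activity and evaluates
@activity('<name>').output.exists, with a Copy activity in the true branch.
For this to work, exists must be in the Get Metadata field list. The activity
and dataset names are hypothetical. The Copy activity is cut down to a Binary
source and sink for brevity; the designer usually adds store settings as
well.

```python
import json


def copy_activity(name, source_dataset, sink_dataset):
    """Minimal Copy activity that copies a file as-is (Binary format assumed)."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset,
                    "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset,
                     "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BinarySource"},
            "sink": {"type": "BinarySink"},
        },
    }


def if_file_exists(metadata_activity, true_activities):
    """If Condition that branches on the 'exists' field of Get Metadata."""
    return {
        "name": "IfFileExists",
        "type": "IfCondition",
        # Run only after the Get Metadata activity succeeds.
        "dependsOn": [{"activity": metadata_activity,
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "expression": {
                # Requires "exists" in the Get Metadata field list.
                "value": f"@activity('{metadata_activity}').output.exists",
                "type": "Expression",
            },
            "ifTrueActivities": true_activities,
            # Nothing runs when the file is missing.
            "ifFalseActivities": [],
        },
    }


if __name__ == "__main__":
    copy = copy_activity("CopySourceFile", "SourceFileDataset",
                         "SinkFileDataset")
    print(json.dumps(if_file_exists("GetFileMetadata", [copy]), indent=2))
```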
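For Example 2, here is a sketch of one way to do it. It assumes you want
files only, so it inserts a Filter activity that keeps childItems entries
whose type is File, because childItems also lists subfolders. A ForEach
activity then iterates over the filtered list. The names are hypothetical,
including the fileName dataset parameter, which the source dataset would have
to declare. Treat this as an illustration rather than a drop-in definition.

```python
import json


def filter_files(metadata_activity):
    """Filter activity keeping only childItems entries of type File."""
    return {
        "name": "FilterFiles",
        "type": "Filter",
        "dependsOn": [{"activity": metadata_activity,
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            # Each childItems entry looks like {"name": "...", "type": "File"}.
            "items": {
                "value": f"@activity('{metadata_activity}').output.childItems",
                "type": "Expression",
            },
            "condition": {
                "value": "@equals(item().type, 'File')",
                "type": "Expression",
            },
        },
    }


def for_each_file(filter_activity, inner_activities):
    """ForEach that runs inner_activities once per filtered file."""
    return {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [{"activity": filter_activity,
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            # The Filter activity exposes the kept items as output.value.
            "items": {
                "value": f"@activity('{filter_activity}').output.value",
                "type": "Expression",
            },
            "isSequential": False,
            "activities": inner_activities,
        },
    }


if __name__ == "__main__":
    # Hypothetical Copy that receives the current file name through a
    # dataset parameter called fileName.
    copy_one = {
        "name": "CopyOneFile",
        "type": "Copy",
        "inputs": [{
            "referenceName": "SourceFileDataset",
            "type": "DatasetReference",
            "parameters": {
                "fileName": {"value": "@item().name", "type": "Expression"},
            },
        }],
        "outputs": [{"referenceName": "SinkFileDataset",
                     "type": "DatasetReference"}],
        "typeProperties": {"source": {"type": "BinarySource"},
                           "sink": {"type": "BinarySink"}},
    }
    activities = [filter_files("GetFileList"),
                  for_each_file("FilterFiles", [copy_one])]
    print(json.dumps(activities, indent=2))
```

Running the iterations in parallel (isSequential set to False) suits
independent file copies. If the inner activities share state, such as a
pipeline variable, a sequential loop is the safer choice.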
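Example 3 can be expressed with the ADF greater() function on the size
output, which is reported in bytes. This requires size in the field list and
a dataset that points to a single file. The sketch below builds that If
Condition. It also includes a small local helper that mirrors the same
comparison, so you can check sample outputs by hand. The 1 MiB threshold and
all names are hypothetical.

```python
import json

# Hypothetical threshold: 1 MiB. Get Metadata reports size in bytes.
SIZE_THRESHOLD_BYTES = 1024 * 1024


def size_condition_expression(metadata_activity, threshold_bytes):
    """ADF expression that is true when the file exceeds the threshold."""
    return (f"@greater(activity('{metadata_activity}').output.size, "
            f"{threshold_bytes})")


def if_large_file(metadata_activity, large_branch, small_branch,
                  threshold_bytes=SIZE_THRESHOLD_BYTES):
    """If Condition that runs large_branch only for files above the threshold."""
    expression = size_condition_expression(metadata_activity, threshold_bytes)
    return {
        "name": "IfLargeFile",
        "type": "IfCondition",
        "dependsOn": [{"activity": metadata_activity,
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "expression": {"value": expression, "type": "Expression"},
            "ifTrueActivities": large_branch,
            "ifFalseActivities": small_branch,
        },
    }


def would_process(metadata_output, threshold_bytes=SIZE_THRESHOLD_BYTES):
    """Local mirror of the expression for checking sample outputs."""
    return metadata_output["size"] > threshold_bytes


if __name__ == "__main__":
    print(json.dumps(if_large_file("GetFileSize", [], []), indent=2))
    print(would_process({"size": 2_000_000}))
```

The comparison is strict, so a file of exactly the threshold size takes the
false branch. Use greaterOrEquals() if it should be processed as well.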