The ForEach activity in Azure Data Factory (ADF) creates a loop in your
pipeline: it iterates over a collection and runs the activities you specify
once for each item in that collection. It is similar to the foreach loop found
in many programming languages. Iterations can run sequentially or in parallel,
and the activities inside the loop can reference the current item dynamically.
ADF – What is ForEach activity?
Definition:
You use this activity in ADF when you want to loop through a collection of
items and perform some action for each element. It works the same way a loop
does in programming.
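To make the programming analogy concrete, here is a minimal Python sketch of the same idea. It is not ADF code: the `files` list, the `process` function, and the `batch_count` parameter are hypothetical stand-ins for the input collection, the activities inside the loop, and the parallelism limit.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the array a previous activity would return.
files = ["a.csv", "b.csv", "c.csv"]


def process(item: str) -> str:
    """Placeholder for the activities that run inside the loop."""
    return f"processed {item}"


def run_sequential(items):
    # Comparable to sequential execution: one item at a time, in order.
    return [process(item) for item in items]


def run_parallel(items, batch_count=2):
    # Comparable to parallel execution: at most batch_count items run at once.
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        return list(pool.map(process, items))


if __name__ == "__main__":
    print(run_sequential(files))
    print(run_parallel(files))
```

Note that `pool.map` returns results in input order, which is a property of this Python sketch. In ADF, do not rely on parallel iterations finishing in any particular order.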
- Iteration over Collections: The
primary use of the ForEach activity is to iterate over a collection of items.
This collection can be the result of a previous activity in the pipeline, such
as a Lookup activity that retrieves a list of files to process, database
records, or any other array of items.
- Dynamic Activity Execution: Within
each iteration of the loop, you can execute one or more activities. This is
useful for scenarios where you need to perform repetitive tasks on each item in
a collection, such as processing files, executing SQL commands for each record,
or calling APIs with different parameters.
- Sequential or Parallel Execution: The
ForEach activity allows you to control whether the iterations are executed
sequentially or in parallel. Parallel execution can significantly reduce the
overall processing time but may require careful consideration of resource
utilization and potential throttling limits of the data sources or services
being accessed.
- Batch Count: When executing activities in parallel, you can specify the
batch count, which sets the maximum number of iterations that run at the same
time (the default is 20 and the maximum is 50). This lets you balance
performance against resource consumption.
- Integration with Other Activities: The
output of other activities can be passed into the ForEach loop as an array, and
the actions within the loop can use these items. For example, you can use the
output of a Lookup activity to fetch a list of items to be processed by the
ForEach activity.
- Handling Dynamic Content: Items
within the collection being iterated over can be referenced dynamically within
the activities executed in the loop. This allows for customized processing of
each item based on its content.
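The properties described above map onto a ForEach activity definition roughly as follows. This is a sketch that builds the JSON in Python so it can be printed and inspected. The `build_foreach` helper, the activity names (`ForEachFile`, `LookupFiles`, `CallApiForItem`), the example URL, and the `name` column on each item are all assumptions made for illustration.

```python
import json


def build_foreach(name, items_expression, inner_activities,
                  is_sequential=False, batch_count=20):
    """Return a dict shaped like a ForEach activity definition (illustrative helper)."""
    if not 1 <= batch_count <= 50:
        raise ValueError("batchCount must be between 1 and 50")
    type_properties = {
        "isSequential": is_sequential,
        # The collection to iterate over, given as a pipeline expression.
        "items": {"value": items_expression, "type": "Expression"},
        "activities": inner_activities,
    }
    if not is_sequential:
        # batchCount only matters when iterations run in parallel.
        type_properties["batchCount"] = batch_count
    return {"name": name, "type": "ForEach", "typeProperties": type_properties}


# Hypothetical inner activity: calls an example URL built from the current item.
call_api = {
    "name": "CallApiForItem",
    "type": "WebActivity",
    "typeProperties": {
        "method": "GET",
        "url": {
            # item() refers to the element of the collection for this iteration.
            "value": "@concat('https://example.com/files/', item().name)",
            "type": "Expression",
        },
    },
}

foreach_activity = build_foreach(
    name="ForEachFile",
    items_expression="@activity('LookupFiles').output.value",
    inner_activities=[call_api],
    batch_count=10,
)

print(json.dumps(foreach_activity, indent=2))
```

Inside the loop, `@item()` (or `item()` within a larger expression) is how an inner activity refers to the current element, and `@activity('LookupFiles').output.value` is the array returned by a Lookup activity that is configured to return all rows.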
ADF – ForEach activity implementation
Requirement:
- Use a Lookup activity on a SharePoint library to pull the list of files
from it.
- Apply a ForEach activity to copy each file in the collection from the
SharePoint library to a Blob container.
Implementation:
- Add a Lookup activity to the pipeline. On the Settings tab, set Source
Dataset to the SharePoint library.
Note: The SharePoint library is used here for demonstration purposes only. In
a real scenario, the source can be anything.
- The output of the Lookup is the list of files in this library. Each file
needs to be copied from SharePoint to the Blob container, so we use a ForEach
activity next.
- Add a ForEach activity to the pipeline. As you can see in the screenshot, I
have added two activities inside the ForEach.
- Click the Edit button on the ForEach activity.
- Inside the ForEach, you can define the activities that will run on each
iteration of the loop.
- As you can see in the screenshot, I have added two activities: 1) Web
(GetBearerToken) and 2) Copy Data.
- Both of these activities are executed each time the loop runs. We will not
go into the details of the Web or Copy Data activity because they are outside
the scope of this article.
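For readers who prefer to see the wiring as a definition rather than in the designer, the steps above can be sketched as a pipeline skeleton. This is an outline under stated assumptions, not the pipeline from the screenshots: every name other than GetBearerToken is hypothetical, the Copy activity is assumed to run after the Web activity, and the source, sink, and Web settings are deliberately left empty, so the result would not deploy as-is.

```python
import json

# Step 1: Lookup that returns the list of files (dataset name is hypothetical).
lookup = {
    "name": "LookupFiles",
    "type": "Lookup",
    "typeProperties": {
        # Source settings are omitted here; they depend on the connector used.
        "dataset": {"referenceName": "SharePointLibraryDataset",
                    "type": "DatasetReference"},
        "firstRowOnly": False,  # return every row, not just the first one
    },
}

# Inner activity 1: Web activity that fetches a token (settings omitted).
get_token = {"name": "GetBearerToken", "type": "WebActivity", "typeProperties": {}}

# Inner activity 2: Copy activity, assumed to run after the token call succeeds.
copy_file = {
    "name": "CopyFileToBlob",
    "type": "Copy",
    "dependsOn": [{"activity": "GetBearerToken",
                   "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {},  # source and sink omitted
}

# Step 2: ForEach that runs both inner activities once per file.
for_each = {
    "name": "ForEachFile",
    "type": "ForEach",
    "dependsOn": [{"activity": "LookupFiles",
                   "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "items": {"value": "@activity('LookupFiles').output.value",
                  "type": "Expression"},
        "activities": [get_token, copy_file],
    },
}

pipeline = {
    "name": "CopySharePointFilesToBlob",
    "properties": {"activities": [lookup, for_each]},
}

print(json.dumps(pipeline, indent=2))
```

The `dependsOn` entry on the ForEach is what makes it wait for the Lookup, and the `items` expression is what hands the Lookup output to the loop.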