In this article we are going to discuss about how to pick and delete only specific files from the ADLS storage container by passing filenames taken from a excel/csv file column value.
File deletion:
Recently I came across a requirement for file deletion in ADLS. Azure Data Factory’s delete activity is enough to complete this task, but the tricky part is the activity should delete only the objects that are present in the excel File_Name column. Let us see how this was achieved with a practical demo.
For this demo I am going to upload few sample files including the ones which I am trying to delete into a folder in my ADLS Gen2 directory. Apart from this I must create a excel file which includes the filenames to be deleted present in column B; like below. One common question that might arrive here is how we can pick exactly the file name column if there are multiple columns present in the excel sheet. This can be achieved using the range option during creation of the source dataset which we will see in the next section.


Steps:
Click and drag Lookup activity into your ADF pipeline canvas which is going to read out the file names that you want to be deleted.

Next go the settings tab and create the dataset which contain the excel file with the filename column. In the source dataset option click on +New and set file type as excel, then you will arrive at the following step.

- Path of the excel file which will function as source with all the filenames to be deleted.
- Select the sheet name if you excel sheet has more than one.
- Range is the one which you can mention to pick the cells starting and ending including the row and column name. In my case if you could scroll up the excel sheet screenshot you can see the filenames are present from column B cell 2 through 10, hence I have marked it as a range. In this way it will read only the specific data ignore all other which we do not need.
In range field, I have limited It to certain rows only as I know which column and up until which row the file names would be and that it will not be changing. In case you are going to append or delete the file names into the excel in future you can leave the range field open as ‘B2’ instead of ‘B2:B9’.
Now after the dataset is set, we are going to pass the output of the lookup activity into the foreach activity. Once under the foreach ‘settingsàitems’ tab enter the following expression for passing the output from lookup activity.
@activity('SourceNameFileLookup').output.value

Inside the foreach activity, create a delete activity and set the dataset pointing towards the folder where you want to delete the files from like below.

So far we have setup pipeline but did not pass the file name from the foreach to the delete activity. With the current setup the delete activity will delete the entire folder containing the files that we are searching for. Hence we have to make sure we have passed the filename variable to the filename parameter in the delete activity to make sure only the files are deleted.
In the dataset used by the delete activity, create a string variable called ‘filename’ and pass it on to the filename field of the dataset File path.

In the source settings of the delete activity, I am calling out the column name of the excel file that we are looking up initially. The column name is ‘Prop_0’ which I chose to leave it as a system generated one, in case your lookup file has headers you can mention in the dataset and call out here.
The pipeline is now ready, lets run it to see if we are getting the files deleted.

The pipeline has been triggered and completed successfully.

We can see it has deleted the files compared to the input present in the excel column and below are the only files those are left after the pipeline activity run is successful.

Most or almost all of the projects has this requirement to clean-up specific stale files for reduce cloud cost incurred from the storage blob. Removing them based on name/type by referring them directly from a file rather than input them manually is time saving and can be automated as well. I hope these steps would come handy for such scenarios, let me know for any questions.