Data is often called the new oil, but it is more than that: the projections and insights generated from data can make or break a company's prospects. Every organization faces challenges, in some form, in one or more of the following activities:
- Acquiring data / data procurement
- Storing and archiving data / warehousing
- Transforming data into insights / ETL
These three are the core, basic responsibilities of any database/BI team in a company. The data the team receives comes from disparate sources, so it must be integrated and meaningfully transformed. The visual insights obtained after transformation help management decide on strategies and set achievable goals.
How is this useful in a real-world scenario?
For example, suppose HSK Ltd is one of the largest grocery retail chains. The company collects terabytes of data produced by purchases in its stores and wants to analyze that data to gain insights into customer preferences, demographics, and behavior. This helps it offer products tailored to its target audience, which drives business and keeps customers happy. To do so, the company needs to cross-reference data: customer details stored in an on-premises data store must be combined with the log data collected in a cloud data store. Then, to gain the actual insights, it must process the joined data, publish the transformed data using other Azure services such as Azure Synapse Analytics and HDInsight, and build reports on top of it. All of this can be scheduled to run on a daily basis as well.
Azure Data Factory (ADF) can take care of all of this easily. Best of all, it can be done without writing any code: ADF offers code-free ETL as a service, and it is serverless!
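That said, everything you build in the ADF user interface can also be managed programmatically. Below is a minimal sketch, assuming the azure-identity and azure-mgmt-datafactory Python packages; the tenant, subscription, resource group, and factory names are hypothetical placeholders you would replace with your own:

```python
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Authenticate with a service principal (placeholder values).
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Create (or update) a data factory in an existing resource group.
df = adf_client.factories.create_or_update(
    "hsk-resource-group",   # hypothetical resource group name
    "hsk-data-factory",     # hypothetical factory name
    Factory(location="eastus"),
)
print(df.provisioning_state)
```

The same adf_client is reused in the snippets later in this article.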

ADF Functions:
- Collect – The first step in building any data system is acquiring data from the different sources that need to be processed.
- Transform – Once the data is available in a cloud data store, transformation begins. You can write your own code or use Azure tools to build and maintain data flows.
- Monitor – Scheduled activities and pipelines can be monitored through built-in support for Azure Monitor, PowerShell, Azure Monitor logs, and more, as shown in the sketch below.
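
To make the Monitor step concrete, here is a minimal sketch reusing the adf_client from the snippet above; run_id is a hypothetical placeholder for the identifier returned when a pipeline run is created (see the last snippet in this article):

```python
from datetime import datetime, timedelta

from azure.mgmt.datafactory.models import RunFilterParameters

run_id = "<pipeline-run-id>"  # placeholder: returned by pipelines.create_run

# Check the overall status of one pipeline run.
run = adf_client.pipeline_runs.get(
    "hsk-resource-group", "hsk-data-factory", run_id
)
print(f"Pipeline run {run.run_id}: {run.status}")

# List the individual activity runs inside that pipeline run,
# restricted to the last 24 hours.
filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(minutes=1),
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    "hsk-resource-group", "hsk-data-factory", run.run_id, filter_params
)
for activity in activity_runs.value:
    print(activity.activity_name, activity.status)
```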

Top Level Concepts:
- Pipelines – A pipeline is a logical grouping of activities that together perform a unit of work (a task)
- Activities – In simple words, an activity is a processing step we configure to complete a task; activities are the actual actions we expect to run. An activity can take datasets as input and produce our desired outcome as an output dataset
- Datasets – A dataset points to the data we want an activity to use as input or output
- Linked services – Linked services are similar to connection strings. They hold the connection information Azure Data Factory needs to connect to external sources
- Triggers – A trigger acts like a scheduler, making sure a pipeline execution is kicked off at the right time. The sketch below shows how all five concepts fit together
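
Here is a minimal sketch that again reuses the adf_client from the first snippet; the connection string, folder paths, and resource names are hypothetical placeholders. It wires a linked service into two datasets, puts a copy activity into a pipeline, and starts a run on demand (a ScheduleTrigger with a daily recurrence could start it automatically instead):

```python
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureStorageLinkedService, BlobSink, BlobSource,
    CopyActivity, DatasetReference, DatasetResource, LinkedServiceReference,
    LinkedServiceResource, PipelineResource, SecureString,
)

rg, df = "hsk-resource-group", "hsk-data-factory"

# Linked service: the connection information for an Azure Storage account.
adf_client.linked_services.create_or_update(
    rg, df, "StorageLinkedService",
    LinkedServiceResource(properties=AzureStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>"))),
)
ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="StorageLinkedService")

# Datasets: pointers to the input and output data in blob storage.
adf_client.datasets.create_or_update(
    rg, df, "InputLogs",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=ls_ref, folder_path="raw/logs",
        file_name="input.csv")),
)
adf_client.datasets.create_or_update(
    rg, df, "CuratedLogs",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=ls_ref, folder_path="curated/logs")),
)

# Activity: a single copy step from the input dataset to the output dataset.
copy_step = CopyActivity(
    name="CopyLogs",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="InputLogs")],
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="CuratedLogs")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Pipeline: the unit of work that groups the activities.
adf_client.pipelines.create_or_update(
    rg, df, "CopyLogsPipeline", PipelineResource(activities=[copy_step])
)

# Run the pipeline once on demand; the returned run_id is what the
# monitoring snippet earlier in this article checks on.
run = adf_client.pipelines.create_run(rg, df, "CopyLogsPipeline", parameters={})
print(run.run_id)
```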
Conclusion
This was an introduction to Azure Data Factory. In upcoming articles, we will look deeper into practical, real-world activities.