Azure Stream Analytics

Azure Stream Analytics is a fully managed Platform-as-a-Service (PaaS) real-time stream processing service from Microsoft. At its core is a complex event processing engine designed to analyze and process large volumes of real-time data, such as stock trades, credit card transactions for fraud detection, web clickstream analysis, and social media feeds. Data can be processed in real time as it arrives, or analyzed later in batch.

Like many other services in Azure, Stream Analytics is best used alongside other services as part of a larger end-to-end solution. A classic example: running real-time machine learning analysis on credit card transactions to detect fraud immediately and prevent further misuse of the card.

If you have a live stream of incoming data and want to store it, build a report dashboard on it with Power BI, or transform it to extract insights, Azure Stream Analytics is a strong choice. Because it is a PaaS offering, as mentioned earlier, there is no infrastructure to set up or manage, and you pay only for what you use.

Azure Stream Analytics – Overall Picture

Azure Stream Analytics use cases

  • Real-time dashboards with Power BI for monitoring purposes
  • Storing streaming data in a cold path, making it available to other services for further analysis, logging, reporting, etc.
  • Transforming and analyzing data in real time
  • Triggering workflows on conditions (for example, running Azure Functions from a Stream Analytics job)
  • Sending alerts
  • Machine learning scenarios such as risk analysis, predictive maintenance, fraud detection, and trend prediction
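To illustrate the alerting and workflow-trigger use cases above, here is a sketch of a job query that emits an event whenever a reading crosses a threshold; the output could then be wired to an Azure Function or an Event Hub that sends the alert. The alias names `[sensor-input]` and `[alert-output]` are placeholders for input and output aliases configured on the job, and the field names are assumed:

```sql
-- Emit an alert event for every reading above a threshold.
-- [sensor-input] and [alert-output] are hypothetical job aliases.
SELECT
    DeviceId,
    Temperature,
    System.Timestamp() AS AlertTime
INTO [alert-output]
FROM [sensor-input]
WHERE Temperature > 100
```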

The good news is that you don’t have to be an expert programmer, because a Stream Analytics job is purely declarative. Azure Stream Analytics accepts input data in Avro, JSON, or CSV format, and the application logic is written in a SQL-like query language.
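As a minimal sketch of what such a declarative job looks like, the following query counts events per device over 10-second tumbling windows. The aliases `[input-hub]` and `[output-blob]` and the fields `DeviceId` and `EventTime` are assumptions, standing in for whatever inputs, outputs, and payload fields your job defines:

```sql
-- Count events per device in 10-second tumbling windows.
-- [input-hub] and [output-blob] are hypothetical job aliases.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
INTO [output-blob]
FROM [input-hub] TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(second, 10)
```

`TIMESTAMP BY` tells the engine to use the event’s own timestamp rather than its arrival time when assigning it to a window.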

Ease of use

Azure Stream Analytics is very easy to get started with. It takes only a few clicks to connect multiple sources and sinks and create an end-to-end pipeline. Stream Analytics can connect to Azure Event Hubs and Azure IoT Hub for streaming data ingestion, and to Azure Blob storage to ingest historical data. You can route the output to many systems, such as Azure Blob storage, Azure SQL Database, Azure Data Lake, and Azure Cosmos DB. You can also run batch analytics on stream outputs with Azure Synapse Analytics or HDInsight, or send the output to another service, such as Event Hubs for downstream consumption or Power BI for real-time visualization.
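A single job can fan the same input out to several of these sinks with multiple SELECT ... INTO statements. The sketch below, with assumed aliases `[telemetry]`, `[sql-output]`, and `[archive-blob]`, sends hot readings to a SQL database while archiving everything to Blob storage:

```sql
-- Route filtered events to SQL and archive the full stream to Blob storage.
-- All aliases and field names are hypothetical.
SELECT DeviceId, Temperature, System.Timestamp() AS EventTime
INTO [sql-output]
FROM [telemetry]
WHERE Temperature > 90

SELECT *
INTO [archive-blob]
FROM [telemetry]
```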

Key Capabilities

Azure Stream Analytics is designed to be easy to use, flexible, reliable, and scalable to any job size. It is available across multiple Azure regions and can run on IoT Edge or Azure Stack.

Real-time dashboard

With Azure Stream Analytics, you can quickly stand up real-time dashboards and alerts. A simple solution ingests events from Event Hubs or IoT Hub and feeds a Power BI dashboard with a streaming dataset.
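A dashboard feed of this kind is often a windowed aggregate refreshed every second. In this sketch, `[iot-input]` and `[powerbi-dataset]` are placeholder aliases for an IoT Hub input and a Power BI output configured on the job:

```sql
-- Rolling average over the last 10 seconds, emitted every second,
-- to drive a Power BI streaming dataset. Aliases are hypothetical.
SELECT
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO [powerbi-dataset]
FROM [iot-input]
GROUP BY HoppingWindow(second, 10, 1)
```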

Reliability

Azure Stream Analytics guarantees exactly-once event processing and at-least-once delivery of events, so events are never lost. Exactly-once delivery is guaranteed only with selected outputs, as described in Event delivery guarantees. Azure Stream Analytics has built-in recovery capabilities in case event delivery fails, and built-in checkpointing to maintain the state of your job and produce repeatable results.

As a managed service, Stream Analytics guarantees event processing with a 99.9% availability at a minute level of granularity.

Performance

Azure Stream Analytics can seamlessly process millions of events per second with ultra-low latency. It allows you to scale up and scale out to handle large, complex real-time event processing applications. Stream Analytics achieves higher throughput through partitioning, which allows complex queries to be parallelized and executed across multiple streaming nodes.
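Partitioning is expressed directly in the query. In the sketch below, `PARTITION BY PartitionId` tells the engine that each input partition can be processed independently on its own streaming node; `[hub-input]` and `[output]` are assumed aliases for a partitioned Event Hubs input and any output:

```sql
-- Partition-aligned query: each input partition is processed in parallel.
-- Aliases and the PartitionId key are hypothetical.
SELECT
    PartitionId,
    COUNT(*) AS Events
INTO [output]
FROM [hub-input] PARTITION BY PartitionId
GROUP BY PartitionId, TumblingWindow(minute, 1)
```

This only parallelizes fully when the grouping key is aligned with the input’s partition key; otherwise data must be shuffled between nodes.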

Limitations

  • It only supports a SQL-like language (you are limited to transformations expressible in SQL)
  • Your input data must be Avro, JSON, or CSV
  • Static reference data can only be loaded from Blob storage
  • It only integrates with Azure services
  • Dynamic reference data joins are not supported (reference data is static)
  • There is no automatic scaling (you scale the job manually in the Azure portal)
