DP 900 - Azure Data Analytics

Azure Data Analytics - 27 Cards
Data Analytics
Goal - Convert raw data to intelligence
Data Analytics Approach
Ingest, Process, Store (in a data warehouse or data lake), Analyze
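The four steps above can be sketched end to end, with plain Python structures as hypothetical stand-ins for real Azure services:

```python
# Minimal ingest -> process -> store -> analyze sketch.
# The "store" is just a dict; in practice this would be a
# data lake or data warehouse.

# Ingest: capture raw readings (one has a bad value)
raw_events = [
    {"sensor": "a", "temp": 21.5},
    {"sensor": "b", "temp": None},   # bad reading, to be filtered out
    {"sensor": "a", "temp": 22.1},
]

# Process: clean and filter the raw data
clean = [e for e in raw_events if e["temp"] is not None]

# Store: group readings by sensor for easy retrieval
store = {}
for e in clean:
    store.setdefault(e["sensor"], []).append(e["temp"])

# Analyze: average temperature per sensor
averages = {s: sum(v) / len(v) for s, v in store.items()}
print(averages)  # average temp per sensor
```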
Data Ingestion
Capture raw data from various sources (stream or batch)
Data Processing
Clean, filter, aggregate, and transform data to prepare for analysis
Data Storage
Store data in a warehouse or lake for easy retrieval
Data Querying
Run queries to analyze the data and gain insights
Data Visualization
Create visualizations to help business spot trends, outliers, and patterns in data
Descriptive analytics
Monitor status and generate alerts based on historical and current data.
Diagnostic analytics
Take findings from descriptive analytics and dig deeper to understand why something is happening.
Predictive analytics
Predict the probability of future outcomes from historical data to mitigate risk and identify opportunities.
Prescriptive analytics
Use insights from predictive analytics to make informed, data-driven decisions.
Cognitive analytics
Combine traditional analytics techniques with AI and ML to build analytics tools that simulate human-like reasoning.
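The descriptive end of this spectrum is easy to sketch in code: monitor a current metric against a threshold and raise alerts. A minimal illustration, with made-up metric values and an illustrative threshold:

```python
# Descriptive analytics in miniature: report what is happening now
# and alert when a threshold is crossed. Values are invented.

cpu_history = [42, 55, 61, 88, 93]  # percent utilization over time
THRESHOLD = 90

alerts = [f"ALERT: CPU at {v}%" for v in cpu_history if v > THRESHOLD]
print(alerts)  # ['ALERT: CPU at 93%']
```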
Big Data - 3Vs
Volume, Variety, Velocity
Data warehouse
Petabytes of storage and compute, data stored after it is processed, often runs on specialized hardware - Azure Synapse Analytics
Data lake
Retains raw data, typically uses object storage, supports ad-hoc analysis - Azure Data Lake Storage Gen2
Star Schema
Data warehouses organize data into fact and dimension tables. Denormalized and easier to query.
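A star schema can be sketched with a tiny in-memory SQLite database: one fact table joined to a dimension table, then aggregated. Table and column names here are illustrative, not from any real warehouse:

```python
import sqlite3

# Toy star schema: fact_sales (measures) joins dim_product (attributes).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, qty INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (1, 2, 20.0), (2, 1, 15.0), (1, 3, 30.0);
""")

# Typical warehouse query: aggregate facts grouped by a dimension attribute.
rows = con.execute("""
    SELECT d.name, SUM(f.amount) AS revenue
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.name ORDER BY d.name
""").fetchall()
print(rows)  # [('Gadget', 15.0), ('Widget', 50.0)]
```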
Azure Synapse Analytics
End-to-end analytics solutions with SQL and Spark pools
Azure Data Factory
Fully managed serverless service for ETL and data integration
Microsoft Power BI
Unify data and create BI reports & dashboards
Azure HDInsight
Managed Azure service for Apache Hadoop and related frameworks (Spark, Kafka, HBase, Hive)
Azure Databricks
Managed Apache Spark service
Massively Parallel Processing (MPP)
Split processing across multiple compute nodes - Apache Spark, Azure Synapse Analytics, etc.
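The MPP idea can be illustrated in miniature: partition the data, let several workers compute partial results, then combine them. Here local threads stand in for the separate compute nodes of a real engine:

```python
from concurrent.futures import ThreadPoolExecutor

# Each worker handles one partition, like a compute node in an MPP engine.
def partial_sum(chunk):
    return sum(chunk)

data = list(range(1_000_000))
partitions = [data[i::4] for i in range(4)]  # round-robin split across 4 workers

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, partitions))

# Combine partial results, as an MPP coordinator node would.
total = sum(partials)
print(total == sum(data))  # True
```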
Batch Pipelines
Buffering and processing data in groups. Read from storage (e.g., Azure Data Lake Storage) and process.
Streaming Pipelines
Process data continuously, in near real time, as it arrives
Apache Parquet
Open-source columnar storage format with high compression.
ETL
Extract, Transform, and Load - Retrieve data, process and store it
ELT
Extract, Load, and Transform - Raw data is loaded into storage before it is transformed
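The difference between the two patterns can be sketched on a toy record set, where plain lists stand in for the warehouse and the lake (names are illustrative, not real Azure APIs):

```python
# ETL vs ELT on a toy record set.
raw = ['  Alice,30 ', 'Bob,25', 'carol,41']

def transform(rec):
    # Clean up whitespace and casing, parse the age.
    name, age = rec.strip().split(',')
    return {"name": name.strip().title(), "age": int(age)}

# ETL: transform in flight, then load the cleaned rows into the warehouse.
warehouse = [transform(r) for r in raw]

# ELT: load raw rows into the lake first; transform later, on demand.
lake = list(raw)
cleaned_later = [transform(r) for r in lake]

# Both routes end with the same cleaned data.
assert warehouse == cleaned_later
print(warehouse[0])  # {'name': 'Alice', 'age': 30}
```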