AI-200: Developing AI Cloud Solutions on Azure for Python Developers
Build production-ready Python AI back-ends on Azure across 12 hands-on labs: containers (ACR, App Service, Container Apps, AKS), data services with vector search (Cosmos DB, PostgreSQL pgvector, Managed Redis), eventing (Service Bus, Event Grid, Functions), and security + observability (Key Vault, App Configuration, OpenTelemetry, KQL). Capstone integrates everything into the Northwind Logistics back-end. Prepares you for Microsoft Exam AI-200.
Exam Preparation Included
Practice with real exam-style questions for the AI-200 certification. AI-powered feedback helps you understand every answer.
About This Course
Course Curriculum
25 Lessons
AI-200 Course Intro: Architecture & Python on Azure
Welcome to AI-200. In this opening lesson you meet Northwind Logistics — a fictional global freight company that you, the new AI platform engineer, will modernize across the entire course. You learn the AI-200 exam blueprint (the four skill domains and their weightings), the canonical Northwind AI back-end architecture you'll build piece by piece, the Azure-Python SDK landscape (azure-identity, azure-keyvault-secrets, azure-cosmos, azure-servicebus, azure-eventgrid, azure-monitor-opentelemetry, etc.), and the development pattern that runs through every lab (DefaultAzureCredential vs ManagedIdentityCredential, async clients, retry-and-backoff). This is a teaching-only lesson — no exercise lab follows.
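The retry-and-backoff pattern named above can be sketched in a few lines of standard-library Python. This is a hand-rolled illustration of exponential backoff with full jitter, not the retry policy the Azure SDK clients apply internally; the function name, the default delays, and the injectable `sleep` parameter are all assumptions made for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay ceiling doubles each attempt (0.5, 1, 2, ...), capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount below the ceiling to spread out retries.
            sleep(random.uniform(0, ceiling))
            attempt += 1
```

A real service would normally retry only on errors known to be transient (throttling, timeouts) rather than on every exception, as this sketch does for brevity.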
Build, Tag & Push Container Images with Azure Container Registry
Master the first AI-200 containers objective: build, store, version, and manage container images with Azure Container Registry. You learn the ACR resource model (registry → repository → manifest), tagging strategies (immutable digests vs floating tags), the difference between `docker push` and `az acr build` (ACR Tasks), how to wire ACR Tasks to a Git source so every commit triggers a build, multi-arch base images, and content trust. You also learn the ways to authenticate to ACR: admin user, token, service principal, and managed identity with AcrPull/AcrPush, which is the recommended choice in production.
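To make the digest-versus-tag distinction concrete, here is a small sketch that splits an image reference into its parts and reports whether it is pinned by digest. The registry and repository names in the usage note are hypothetical, and the parser is deliberately simplified; it is not a full implementation of the container image reference grammar.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]

    @property
    def is_immutable(self) -> bool:
        # A digest names exact content; a tag can be moved to another manifest later.
        return self.digest is not None


def parse_image_ref(ref: str) -> ImageRef:
    """Split '<registry>/<repository>[:tag][@digest]' into its components."""
    registry, _, remainder = ref.partition("/")
    if not remainder:
        raise ValueError(f"expected <registry>/<repository>, got {ref!r}")
    # The digest, if present, follows '@'; the tag, if present, follows ':' in the name.
    name, _, digest = remainder.partition("@")
    repository, _, tag = name.partition(":")
    return ImageRef(registry, repository, tag or None, digest or None)
```

For example, `parse_image_ref("northwindacr.azurecr.io/shipment-classifier:latest").is_immutable` is `False`, because `latest` is a floating tag, while a reference ending in `@sha256:<hex>` would report `True`.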
Build, Tag & Push Container Images with ACR - Lab Exercises
Containerize the Northwind Logistics shipment-classifier FastAPI service, build it locally and again via ACR Tasks, push it to your container registry, pin it by immutable digest, and pull it back in a quick smoke test. You finish the lab by setting up a build trigger so that every push to the starter repo rebuilds the image and re-tags `latest`.
Deploy Containerized AI APIs to Azure App Service
Learn how to deploy a container image to Azure App Service for Linux — when App Service is the right host (single container, web-scale HTTPS, sticky sessions, slot-based deploys), the configuration surface (`WEBSITES_PORT`, `WEBSITES_ENABLE_APP_SERVICE_STORAGE`, `DOCKER_REGISTRY_SERVER_URL`), the difference between App Settings and Connection Strings, how to inject Key Vault secrets via `@Microsoft.KeyVault(...)` references, and how to flip between staging and production with deployment slots without dropping traffic.
Deploy Containerized AI APIs to Azure App Service - Lab Exercises
Deploy the Northwind shipment-classifier image from your ACR to an Azure App Service for Linux, wire it up with managed-identity ACR pull, inject configuration via App Settings + a Key Vault reference, then ship a new version into a staging slot and swap it into production without dropping traffic.
Azure Container Apps + KEDA Event-Driven Scaling
Learn Azure Container Apps — the serverless container platform built on Kubernetes + Dapr + KEDA — when ACA is the right choice (microservices, event-driven workers, scale-to-zero), the resource model (Environment → App → Revision), how revisions and traffic-splits work, Dapr sidecars, ingress configuration, and (the AI-200 focus) **KEDA event-driven autoscaling** — HTTP scalers, Service Bus queue scalers, Event Grid scalers, custom scaler authentication, and minReplicas/maxReplicas tuning.
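The interaction between queue depth and minReplicas/maxReplicas can be approximated with one formula: divide the backlog by a per-replica target and clamp the result. The sketch below assumes a queue-length scaler with a per-replica message target; the function and parameter names are hypothetical, and it ignores the polling intervals, cooldown periods, and stabilization behavior of the real autoscaler.

```python
import math


def desired_replicas(
    queue_length: int,
    messages_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 20,
) -> int:
    """Approximate the replica count a queue-length scaler would ask for."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    if queue_length <= 0:
        # Empty queue: fall back to the floor, which is zero for scale-to-zero apps.
        return min_replicas
    # One replica per `messages_per_replica` queued messages, rounded up.
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 5 messages per replica, a backlog of 12 asks for 3 replicas, a backlog of 1,000 is capped at the maximum of 20, and an empty queue returns to the minimum.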
Azure Container Apps + KEDA Event-Driven Scaling - Lab Exercises
Deploy the Northwind shipment-classifier as an Azure Container App with HTTP autoscaling, then add a second container app — a Service Bus queue worker that scales 0→20→0 based on queue depth using a KEDA Service Bus scaler. Use a load-generator script to verify cold-start to peak to scale-down.
AKS Manifest Deploys & Troubleshooting
Learn how to deploy and manage applications on Azure Kubernetes Service using manifest files: Pod/Deployment/Service/Ingress YAML, ConfigMap and Secret injection, the AKS authentication chain (kubelogin + Entra ID), Workload Identity for reaching Cosmos DB and Service Bus without stored keys or secrets, image pull from ACR via AcrPull, and the troubleshooting surface (`kubectl describe pod`, `kubectl logs --previous`, events, end-to-end connectivity probes). You also see how the AI-200 exam frames questions such as "why is the pod in ImagePullBackOff?" or "why does my service return 502?".
AKS Manifest Deploys & Troubleshooting - Lab Exercises
Deploy the Northwind shipment-classifier to an AKS cluster using Deployment + Service + Ingress manifests, attach a Workload Identity that pulls a secret from Key Vault, then deliberately break the deployment three times and use the kubectl troubleshooting toolkit (describe, logs, events, exec) to diagnose ImagePullBackOff, CrashLoopBackOff, and a 502 ingress failure.
Cosmos DB for NoSQL: Queries, Vector Search & Change Feed
Master the Cosmos DB for NoSQL fundamentals AI-200 expects you to know cold: the resource hierarchy (account → database → container → item), how to connect and run queries with the Python SDK, partition-key design, the RU/s pricing model, how to read and optimize query RU consumption with indexing policies (included/excluded paths, composite indexes), and how to pick the right consistency level (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual) for your workload.
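An indexing policy with included paths, excluded paths, and a composite index might look like the following sketch, expressed as a Python dictionary that serializes to the JSON policy document. The property names `/rawPayload`, `/customerId`, and `/createdAt` are hypothetical Northwind fields chosen for illustration, not part of the course material.

```python
import json

# Index everything by default, skip a large payload that is never filtered on,
# and add a composite index for a two-property sort.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [
        {"path": "/rawPayload/*"},  # hypothetical bulky field, excluded to cut write RUs
        {"path": '/"_etag"/?'},     # system property, commonly excluded
    ],
    "compositeIndexes": [
        [
            {"path": "/customerId", "order": "ascending"},
            {"path": "/createdAt", "order": "descending"},
        ]
    ],
}

policy_json = json.dumps(indexing_policy, indent=2)
print(policy_json)
```

The composite index here is shaped for a query that sorts by `customerId` ascending and then `createdAt` descending; excluding paths that are never queried is one common way to lower the RU cost of writes.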
Cosmos DB for NoSQL: Queries, Vector Search & Change Feed - Lab Exercises
Build a Python shipment-lookup service against Cosmos DB for NoSQL. Run a poorly-indexed query, inspect its RU cost, fix it with an indexing policy change and verify the RU drop. Then flip the account between Session and Strong consistency and measure the latency impact on a multi-region read pattern.
Azure Database for PostgreSQL + pgvector for RAG
Master the AI-200 PostgreSQL track. You learn how to connect from Python with psycopg + pgvector, how to model schemas and pick data types for AI workloads (jsonb for metadata, vector(1536) for embeddings, tsvector for hybrid keyword+vector), indexing strategies (HNSW vs IVFFlat, when each wins), how to size Flexible Server compute/memory/storage for a vector workload, and the RAG pattern with metadata filters (`WHERE tenant = $1 AND embedding <=> $2 < 0.3`). Connection optimization (PgBouncer transaction pooling, async psycopg, prepared statements) closes the lesson.
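To see what the metadata-filtered query computes, the sketch below reproduces its logic in plain Python: keep rows for one tenant, compute cosine distance (the quantity pgvector's `<=>` operator returns), drop anything at or beyond the threshold, and return the closest matches. It is an in-memory illustration only; the row layout and function names are assumptions, and a real retriever would let PostgreSQL and its vector index do this work.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, matching the meaning of pgvector's <=> operator."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def retrieve(
    rows: List[Dict],
    tenant: str,
    query: Sequence[float],
    max_distance: float = 0.3,
    top_k: int = 5,
) -> List[str]:
    """Emulate: WHERE tenant = $1 AND embedding <=> $2 < 0.3 ORDER BY distance LIMIT k."""
    scored = [
        (cosine_distance(row["embedding"], query), row["id"])
        for row in rows
        if row["tenant"] == tenant  # metadata filter is applied before ranking
    ]
    hits = sorted((d, i) for d, i in scored if d < max_distance)
    return [row_id for _, row_id in hits[:top_k]]
```

Filtering by tenant before ranking is what keeps one customer's articles out of another customer's results, regardless of how similar the embeddings are.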
Azure Database for PostgreSQL + pgvector for RAG - Lab Exercises
Build a multi-tenant pgvector RAG retriever for Northwind. Provision the pgvector extension, create the schema with a metadata-filtered HNSW index, embed and ingest 500 customer-service knowledge-base articles, then write a Python RAG handler that retrieves by tenant + cosine distance and hands top-k chunks to Azure OpenAI for the final answer.
Azure Managed Redis — Caching + Vector Indexing
Learn Azure Managed Redis end-to-end. Cache fundamentals (GET/SET, TTL with EX, atomic INCR, the cache-aside vs read-through patterns, expiration vs eviction, manual invalidation), then the AI-200 vector angle — RediSearch FT.CREATE with VECTOR fields, HNSW vs FLAT, and KNN search via FT.SEARCH. Wrap up with sizing (Enterprise vs Memory-Optimized tiers), high availability, and the right Python client (redis-py with hiredis).
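The cache-aside pattern with a TTL and manual invalidation can be sketched without a Redis server. The class below uses an in-memory dictionary as a stand-in for the cache, so the names `CacheAside`, `load`, and `clock` are illustrative assumptions; against Redis, the lookup would correspond to GET, the store to SET with EX, and the invalidation to DEL.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside: the application checks the cache first and loads on a miss."""

    def __init__(
        self,
        load: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load        # slow source of truth, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # hit: entry exists and has not expired
        value = self._load(key)  # miss or expired: go to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Manual invalidation: call this when the underlying record changes.
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached value can get, while the invalidation hook removes an entry immediately when the application knows the source data has changed.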
Azure Managed Redis — Caching + Vector Indexing - Lab Exercises
Cache Northwind shipment-status lookups in Azure Managed Redis with a 60-second TTL and a manual invalidation hook, then add a second Redis use case — a vector index over recent driver chat messages — that returns the top three semantically similar messages with FT.SEARCH KNN.
Service Bus & Azure Functions
Learn Azure Service Bus for back-end messaging: queues vs topics and subscriptions, the Standard vs Premium tiers, plain messages vs sessions, peek-lock vs receive-and-delete, the duplicate-detection window, and scheduled and deferred messages. The lesson then turns to the **dead-letter queue**, which is heavily tested on AI-200: how messages end up there, how to read them, how to resubmit them, and how to set up a dead-letter exception filter on a subscription so poison messages never re-enter the main queue.
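How a poison message reaches the dead-letter queue can be simulated in memory. The sketch below models peek-lock receive, where a failed handler abandons the message and raises its delivery count until the limit is reached. The `Message` class and `drain` function are hypothetical stand-ins, not Service Bus SDK types, and the simulation re-queues abandoned messages at the back of the queue for simplicity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Message:
    body: str
    delivery_count: int = 0


def drain(
    queue: Deque[Message],
    handler: Callable[[Message], None],
    max_delivery_count: int = 10,
) -> List[Message]:
    """Process every message; return those that ended up dead-lettered."""
    dead_letter: List[Message] = []
    while queue:
        msg = queue.popleft()       # receive under a lock (peek-lock)
        msg.delivery_count += 1     # each delivery attempt is counted
        try:
            handler(msg)            # returning normally stands for "complete"
        except Exception:
            if msg.delivery_count >= max_delivery_count:
                dead_letter.append(msg)  # attempts exhausted: move to the DLQ
            else:
                queue.append(msg)        # "abandon": make it available again
    return dead_letter
```

A resubmit tool would read from the returned dead-letter list, repair or inspect each message, and send a fresh copy to the main queue.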
Service Bus & Azure Functions - Lab Exercises
Wire a Service Bus topic into the Northwind shipment pipeline. Producers publish `shipment-events` with `eventType` properties; two subscriptions filter on `eventType = 'DELAYED'` and `eventType = 'DELIVERED'` respectively. Deliberately publish a poison message, watch it land in the DLQ after MaxDeliveryCount delivery attempts, then write a small DLQ-resubmit tool.
Event Grid — Filters, Custom Events & Retries
Learn Azure Event Grid for event-driven workflows. The producer side: system topics (Blob, Cosmos, Resource Manager), custom topics, and the CloudEvents 1.0 envelope. The router side: subject/event-type filters, advanced JSON-path filters (`data.severity > 3`), and dead-lettering to a Storage container. The handler side: webhook delivery, retry policy (exponential backoff up to 24h), event acknowledgment, and Event Grid's at-least-once delivery guarantee.
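The router-side condition `data.severity > 3` can be illustrated with a small evaluator over a filter shaped like an Event Grid advanced filter (an operator type, a dotted key, and a value). This is a local sketch of the matching logic only, covering two numeric operators; the helper names and the sample event are assumptions, and the real service evaluates filters for you.

```python
from typing import Any, Dict, List


def lookup(event: Dict[str, Any], key: str) -> Any:
    """Resolve a dotted key such as 'data.severity'; return None if any part is missing."""
    current: Any = event
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches(event: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    value = lookup(event, flt["key"])
    operator = flt["operatorType"]
    if operator == "NumberGreaterThan":
        return _is_number(value) and value > flt["value"]
    if operator == "NumberLessThanOrEquals":
        return _is_number(value) and value <= flt["value"]
    raise ValueError(f"operator not covered by this sketch: {operator}")


def matches_all(event: Dict[str, Any], filters: List[Dict[str, Any]]) -> bool:
    # Multiple advanced filters are combined with AND.
    return all(matches(event, flt) for flt in filters)
```

An event whose `data.severity` is 4 passes the filter `{"operatorType": "NumberGreaterThan", "key": "data.severity", "value": 3}`, while an event with severity 3, or with no severity field at all, does not.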
Event Grid — Filters, Custom Events & Retries - Lab Exercises
Publish Northwind custom events (`Northwind.Shipment.Reweighed`) to an Event Grid topic, set up two subscriptions with subject + advanced filters, force a handler failure and observe retry + dead-letter delivery to a Storage container, then fix the handler and replay the dead-lettered events.
Secure Secrets & Config — Key Vault + App Configuration
Lock down the Northwind AI back-end. Key Vault: secret vs key vs certificate, soft-delete and purge protection, RBAC vs access policies (Azure recommends RBAC), `SecretClient` from azure-keyvault-secrets, secret rotation strategies (manual, Event Grid-driven, managed rotation for storage/SQL), and the App Service `@Microsoft.KeyVault` reference pattern. App Configuration: the central config store, labels (dev/test/prod), feature flags, dynamic refresh with `AzureAppConfigurationProvider`, and the integration with Key Vault references.
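The `@Microsoft.KeyVault` reference is just a specially formatted app setting value, so it can be generated with a small helper. The sketch below assumes the public-cloud `vault.azure.net` DNS suffix and uses hypothetical vault and secret names in the usage note; the helper itself is an illustration, not part of any Azure SDK.

```python
def key_vault_reference(vault_name: str, secret_name: str, version: str = "") -> str:
    """Build an App Service app setting value that points at a Key Vault secret."""
    if not vault_name or not secret_name:
        raise ValueError("vault_name and secret_name are required")
    # Assumes the public Azure cloud; sovereign clouds use a different DNS suffix.
    uri = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
    if version:
        uri = f"{uri}/{version}"  # pin to one specific secret version
    return f"@Microsoft.KeyVault(SecretUri={uri})"
```

For example, `key_vault_reference("northwind-kv", "openai-api-key")` returns `@Microsoft.KeyVault(SecretUri=https://northwind-kv.vault.azure.net/secrets/openai-api-key)`. Leaving the version off lets the app follow the latest version of the secret, which fits the rotation strategies covered in the lesson.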
Secure Secrets & Config — Key Vault + App Configuration - Lab Exercises
Move every secret out of the Northwind classifier code into Azure Key Vault. Wire the App Service slot up with a system-assigned managed identity that has `Key Vault Secrets User` role, switch app settings to `@Microsoft.KeyVault(...)` references, rotate a secret on-demand, then put feature flags + per-environment config in Azure App Configuration with dynamic refresh.
Monitor & Troubleshoot — OpenTelemetry + KQL
Master the AI-200 observability track. **OpenTelemetry SDK for Python**: the one-line `configure_azure_monitor()` call from `azure-monitor-opentelemetry`, automatic instrumentation for FastAPI / requests / azure-* SDKs, manual span creation with `tracer.start_as_current_span`, propagating context across Service Bus and HTTP, and shipping to Application Insights. **KQL**: the `traces`, `requests`, `dependencies`, `exceptions` tables, joining by `operation_Id`, the time-window operators, `summarize percentile(...)`, and how to write the canonical AI-200 KQL queries, such as "p95 latency by endpoint", "which dependency call is slowest", "error rate by status code", and "which user saw the most 500s".
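Context propagation comes down to carrying one small header from caller to callee. The sketch below parses and re-issues a W3C Trace Context `traceparent` value using only the standard library, to show what actually travels between services. In practice the OpenTelemetry propagators handle this for you; the class and function names here are assumptions made for the illustration.

```python
import re
import secrets
from typing import NamedTuple

# Format: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str

    def header(self) -> str:
        return "-".join(self)


def parse_traceparent(value: str) -> TraceParent:
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    parsed = TraceParent(*match.groups())
    # All-zero identifiers and version "ff" are invalid per the specification.
    if parsed.version == "ff" or parsed.trace_id == "0" * 32 or parsed.parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {value!r}")
    return parsed


def child_of(parent: TraceParent) -> TraceParent:
    """Keep the trace id, mint a new span id for the outgoing call."""
    return parent._replace(parent_id=secrets.token_hex(8))
```

Because the trace id stays constant while each hop gets a new span id, every record produced along the way can later be joined on the same operation identifier.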
Monitor & Troubleshoot — OpenTelemetry + KQL - Lab Exercises
Instrument the Northwind shipment-classifier with OpenTelemetry, propagate trace context across an HTTP call into a Service Bus consumer, then write five KQL queries in App Insights that answer the canonical AI-200 troubleshooting questions (p95 latency by endpoint, slowest dependency, error rate by status code, failed-trace replay, custom dimension breakdown).
Capstone — Northwind Logistics End-to-End AI Cloud Solution
Capstone teaching brief. You consolidate every AI-200 domain into a single architectural narrative — the Northwind Logistics AI cloud back-end. The teaching agent walks you through the target architecture, the rubric, the three feature requests Northwind's product team is making, and the exam-style design decisions you'll have to defend ("why Container Apps for X but App Service for Y", "why pgvector for support-articles but Cosmos vector for incidents", "why Service Bus for command flow but Event Grid for fan-out").
Capstone — Northwind Logistics End-to-End AI Cloud Solution - Lab Exercises
Build the end-to-end Northwind AI back-end: a Container Apps-hosted FastAPI service that classifies shipment events with Azure OpenAI, persists them to Cosmos DB with vector embeddings, retrieves prior similar incidents with pgvector, caches hot lookups in Managed Redis, fans out events through Service Bus + Event Grid, runs an Azure Function for after-hours summarization, locks every secret in Key Vault, and ships OpenTelemetry traces to Azure Monitor. The agent reviews your final solution against the AI-200 rubric.