AI Instructor Live Labs Included

GCP: Google Cloud AI APIs

Use Google Cloud's pre-trained AI APIs — Vision, Natural Language, Speech-to-Text, and Translation — with simple Python REST calls. No ML knowledge required.

Beginner
9h 45m
6 Lessons

About This Course

Learn to use Google Cloud's suite of pre-trained AI APIs (Vision, Natural Language, Speech-to-Text, and Translation) through simple Python REST calls. No machine learning knowledge is required. You'll make real API calls, build individual wrappers for each service, and combine them into a multi-API content analysis pipeline.

Course Curriculum

6 Lessons
01
AI Lesson

GCP: Introduction to Google Cloud AI APIs

1h 0m

Discover Google Cloud's suite of pre-trained AI APIs — Vision, Natural Language, Speech-to-Text, and Translation. Learn what each API does, when to use it versus a custom model, how REST calls are structured, and the authentication and quota model.
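The REST call structure the lesson covers can be sketched as follows. This is a minimal illustration, not the course's code: the endpoint follows the public Vision `images:annotate` convention, `YOUR_API_KEY` is a placeholder, and the image URL is invented for the example.

```python
import json

# The Vision API (like the other Cloud AI APIs) accepts a key-authenticated
# POST of a JSON body. API-key auth via a ?key= query parameter is the
# simplest scheme; production code typically uses service-account OAuth.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_vision_request(image_url, features, max_results=10):
    """Build the JSON body for a Vision images:annotate call."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_url}},
                "features": [{"type": f, "maxResults": max_results} for f in features],
            }
        ]
    }

payload = build_vision_request(
    "https://example.com/photo.jpg", ["LABEL_DETECTION", "TEXT_DETECTION"]
)
url = VISION_ENDPOINT + "?key=YOUR_API_KEY"
# To send it: requests.post(url, json=payload).json()
print(json.dumps(payload, indent=2))
```

The same pattern (one endpoint, one JSON body, one key parameter) repeats across the Natural Language, Speech-to-Text, and Translation APIs, which is what makes them approachable without any ML background.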

02
Lab Exercise

GCP: Vision and Natural Language APIs in Python

2h 5m · 4 Exercises

Make real API calls to the Cloud Vision and Natural Language APIs using Python requests. You'll detect labels and extract text from images, analyze sentiment and entities in text, and combine both APIs to build an image content analyzer.

  • Detect Labels and Extract Text with the Vision API. Make your first Cloud Vision API calls using Python requests. You'll detect labels in an image, extract text with OCR, and learn how to parse the nested JSON response structure. ~20 min
  • Analyze Sentiment and Extract Entities with the Natural Language API. Call the Natural Language API to analyze the sentiment of product reviews and extract named entities from news text. Learn how to interpret score vs. magnitude and sort entities by salience. ~20 min
  • Detect Language and Translate Text. Use the Translation API to detect the language of text samples and translate them to English. Learn the difference between standalone detection and translation-with-detection, and handle already-English inputs efficiently. ~15 min
  • Combine APIs to Analyze Image Content End to End. Chain the Vision and Natural Language APIs together: extract labels and OCR text from an image, detect the text language, translate if needed, then analyze sentiment. Build a function that returns a complete content analysis report. ~25 min
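Parsing the nested responses is the core skill in this lab. A small sketch of what that looks like, using field names from the public response schemas (`labelAnnotations`, `textAnnotations`, `salience`); the sample response dict below is invented for illustration:

```python
def summarize_vision_response(resp):
    """Pull labels and OCR text out of a Vision images:annotate response."""
    result = resp["responses"][0]
    labels = [
        (ann["description"], ann["score"])
        for ann in result.get("labelAnnotations", [])
    ]
    # By convention the first textAnnotation holds the full extracted text.
    text_anns = result.get("textAnnotations", [])
    full_text = text_anns[0]["description"] if text_anns else ""
    return {"labels": labels, "text": full_text}

def top_entities(nl_resp, n=3):
    """Return the n most salient entity names from an analyzeEntities response."""
    entities = sorted(
        nl_resp.get("entities", []), key=lambda e: e["salience"], reverse=True
    )
    return [e["name"] for e in entities[:n]]

# Invented sample response, shaped like a real Vision reply:
sample = {
    "responses": [{
        "labelAnnotations": [{"description": "Coffee", "score": 0.97}],
        "textAnnotations": [{"description": "Grand Opening"}],
    }]
}
print(summarize_vision_response(sample))
```

Chaining the two services, as in the final exercise, is then just feeding the extracted `text` into the Natural Language request body.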
03
AI Lesson

GCP: Speech-to-Text and Translation APIs

1h 0m

Deep dive into the Speech-to-Text and Translation APIs — how audio encoding works, transcription confidence scores, speaker diarization, language detection, and batch translation. Understand when to use each API tier and how to handle multi-language content.
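The audio-encoding detail matters because the synchronous `speech:recognize` endpoint takes the audio inline as base64. A hedged sketch of the request body; the field names (`config`, `audio`, `sampleRateHertz`, `languageCode`) follow the public REST schema, and the two-byte audio sample is a stand-in, not real audio:

```python
import base64

def build_speech_request(audio_bytes, language_code="en-US",
                         encoding="LINEAR16", sample_rate_hz=16000):
    """Build the body for the synchronous speech:recognize endpoint.

    Audio is inlined as base64. The synchronous endpoint is limited to
    short clips (roughly under a minute); longer audio needs the
    asynchronous long-running recognize endpoint instead.
    """
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }

body = build_speech_request(b"\x00\x01" * 8)
# POST to https://speech.googleapis.com/v1/speech:recognize?key=...
```

The `encoding` and `sampleRateHertz` values must match the actual audio file, which is why the lesson spends time on how audio encoding works before the lab.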

04
Lab Exercise

GCP: Building a Multi-API Content Analyzer

2h 30m · 4 Exercises

Build a complete content analysis pipeline that combines the Vision, Natural Language, Speech-to-Text, and Translation APIs. You'll implement individual API wrappers, chain them together, and create a unified analyzer that handles multilingual image and audio content.

  • Build the VisionAPIClient Wrapper. Implement a clean VisionAPIClient class wrapping all Vision API features: label detection, OCR, object localization, and safe search. The class handles authentication and request building so callers only deal with parsed results. ~25 min
  • Build the NaturalLanguageClient Wrapper. Implement NaturalLanguageClient covering sentiment analysis (with per-sentence breakdown), entity extraction (with Wikipedia URLs), and content classification. All methods share a common _post() helper. ~25 min
  • Implement Speech-to-Text and Translation Wrappers. Implement transcribe_audio_url() using base64-encoded audio and the synchronous Speech-to-Text endpoint. Also build detect_language() and translate_to_english() with a short-circuit for already-English text. ~25 min
  • Build the ContentAnalyzer Multi-API Pipeline. Assemble all three API wrappers into ContentAnalyzer: a class that takes an image URL, runs Vision (labels, objects, safe search, OCR), Translation (language detect + translate), and Natural Language (sentiment + entities), and returns a structured report with a flagged field for inappropriate content. ~30 min
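The shared `_post()` helper pattern the lab uses can be sketched like this. This is one possible shape, not the lab's reference solution: the injectable transport, stub classes, and `FAKE_KEY` are illustration-only additions that let the example run without a network or a real key.

```python
# Injectable HTTP transport keeps the class testable without the network.
try:
    import requests
    _default_post = requests.post
except ImportError:
    _default_post = None

class NaturalLanguageClient:
    """Thin wrapper: every public method funnels through one _post() helper."""

    BASE = "https://language.googleapis.com/v1"

    def __init__(self, api_key, post=None):
        self.api_key = api_key
        self._do_post = post or _default_post

    def _post(self, method, body):
        url = f"{self.BASE}/documents:{method}?key={self.api_key}"
        resp = self._do_post(url, json=body)
        resp.raise_for_status()
        return resp.json()

    def _doc(self, text):
        return {"document": {"type": "PLAIN_TEXT", "content": text},
                "encodingType": "UTF8"}

    def analyze_sentiment(self, text):
        return self._post("analyzeSentiment", self._doc(text))

    def analyze_entities(self, text):
        return self._post("analyzeEntities", self._doc(text))

# Offline demo with a stub transport (no real network call):
class _StubResp:
    def raise_for_status(self):
        pass
    def json(self):
        return {"documentSentiment": {"score": 0.8, "magnitude": 0.8}}

seen = []
def _stub_post(url, json=None):
    seen.append(url)
    return _StubResp()

client = NaturalLanguageClient("FAKE_KEY", post=_stub_post)
result = client.analyze_sentiment("Great coffee!")
```

Because authentication and request building live in one place, adding a new method (content classification, say) is a two-line change, which is the point of the wrapper exercises.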
05
AI Lesson

GCP: AI Image Generation — Nano Banana 2 and Pro

45m

Learn how Google's Nano Banana image generation models work — Nano Banana 2 (gemini-3.1-flash-image-preview) for fast, high-volume generation and Nano Banana Pro (gemini-3-pro-image-preview) for maximum quality with built-in thinking. Understand the image generation API, response_modalities, ImageConfig parameters, and when to choose each model.
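The request shape for image generation can be sketched as plain JSON. Treat the exact field casing and nesting (`responseModalities`, `imageConfig`, `aspectRatio`) as assumptions to verify against the current API reference: this is a preview API, and the prompt below is invented for the example.

```python
MODEL = "gemini-3.1-flash-image-preview"  # Nano Banana 2, per the lesson

def build_image_gen_request(prompt, aspect_ratio="1:1"):
    """Sketch of a generateContent body that asks for image output.

    response modalities tell the model it may return IMAGE parts in
    addition to TEXT; imageConfig carries generation parameters.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"aspectRatio": aspect_ratio},
        },
    }

body = build_image_gen_request("A watercolor fox in a forest", "16:9")
# POST to .../models/{MODEL}:generateContent on the Gemini API endpoint;
# the google-genai SDK builds this same structure for you.
```

Switching to Nano Banana Pro is, at the request level, just a different model name; the quality and cost trade-offs are what the lesson digs into.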

06
Lab Exercise

GCP: Hands-On Image Generation with Nano Banana

2h 25m · 4 Exercises

Use the Gemini image generation API in VS Code to generate images from text prompts, edit reference images, build a multi-turn image editing session, and compare the output quality of Nano Banana 2 versus Nano Banana Pro. All exercises use the new google-genai SDK.

  • Generate Images from Text with Nano Banana 2. Use the Gemini image generation API with Nano Banana 2 (gemini-3.1-flash-image-preview) to generate images from text prompts. You'll install the dependencies, set up your API key, generate your first image, experiment with aspect ratios and sizes, and save images to disk. ~25 min
  • Edit Images with Reference Input. Pass a reference image alongside an edit instruction to Nano Banana 2. You'll load a downloaded reference image, send it to the model with various editing prompts (background change, style transfer, object addition), and compare the edited outputs. ~25 min
  • Multi-Turn Image Editing with Chat Sessions. Use client.chats.create() with an image generation model to build a multi-turn editing workflow. Generate an initial scene, then iteratively refine it across three conversation turns, with each turn building on the model's memory of the previous image. ~25 min
  • Nano Banana Pro: High-Fidelity Generation and Quality Comparison. Switch to Nano Banana Pro (gemini-3-pro-image-preview) and generate the same prompts used in Exercise 1. Compare the outputs side-by-side to see the quality difference, use 4K resolution (Pro-only), and understand when the quality uplift justifies the cost. ~25 min
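Saving generated images to disk comes down to decoding base64 parts from the response. A hedged sketch: the `inlineData`/`mimeType` field names follow the generateContent REST response (the Python SDK surfaces them as `inline_data`), and the fake parts below are stand-ins, not real model output:

```python
import base64
import pathlib
import tempfile

def save_inline_images(response_parts, out_dir="."):
    """Write any base64 inlineData parts of a generateContent response to disk."""
    saved = []
    for i, part in enumerate(response_parts):
        blob = part.get("inlineData")
        if not blob:
            continue  # text parts carry no image payload
        ext = blob.get("mimeType", "image/png").split("/")[-1]
        path = pathlib.Path(out_dir) / f"generated_{i}.{ext}"
        path.write_bytes(base64.b64decode(blob["data"]))
        saved.append(str(path))
    return saved

# Demo with invented parts shaped like a model reply:
fake_parts = [
    {"text": "Here is your image."},
    {"inlineData": {"mimeType": "image/png",
                    "data": base64.b64encode(b"\x89PNG-fake").decode()}},
]
saved_paths = save_inline_images(fake_parts, tempfile.mkdtemp())
```

In the multi-turn exercise the same helper applies to each turn's response, since every refined image arrives as a fresh inline part.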

This course includes:

  • 24/7 AI Instructor Support
  • Live Lab Environments
  • 3 Hands-on Lessons
  • 6 Months Access
  • Certificate of Completion
Skill Level: Beginner
Total Duration: 9h 45m