Chapter 13: Vertex AI (Google's Veo 3) Integration for Text-to-Video¶

Video: Watch this chapter on YouTube (2:43:10)

Overview¶

This chapter demonstrates how to call Google's Veo 3 video generation model directly through Vertex AI on Google Cloud Platform, as an alternative to third-party platforms such as WaveSpeed. Calling Vertex AI directly removes the intermediary and exposes every request parameter.

Detailed Summary¶

Why Direct Vertex AI Access?¶

While platforms like WaveSpeed AI provide convenient unified APIs, direct Vertex AI access offers:

Direct Google infrastructure: No intermediary
Full API control: All parameters available
Potentially lower costs: No platform markup (though still expensive)
GCS integration: Output directly to Google Cloud Storage

Prerequisites¶

Google Cloud Platform account
Billing enabled
Vertex AI API enabled
OAuth2 credentials configured

Workflow Architecture¶

Manual Trigger → VO3 POST (Vertex AI) → Wait → Poll (GET result) → If Loop → Convert → Output

Step 1: Manual Trigger¶

Start with a simple Manual Trigger for testing the API connection.

Step 2: Understanding the API Endpoint¶

From Vertex AI documentation (cloud.google.com/vertex-ai), the endpoint format is:

https://us-central1-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/us-central1/publishers/google/models/{MODEL_ID}:predictLongRunning

Parameters to Configure¶

PROJECT_ID: Your Google Cloud project ID (not project number)
MODEL_ID: e.g., veo-3.0-fast-generate-001 (check the Vertex AI model list for the current ID)
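The endpoint format above can be assembled programmatically, which helps catch the common mistake of pasting a numeric project number where the project ID belongs. The following is a minimal sketch, not Google's client library: `build_veo_endpoint` is a hypothetical helper, and the default `method` value of `predictLongRunning` is an assumption taken from Google's published Veo REST reference that should be checked against the current documentation.

```python
def build_veo_endpoint(project_id, model_id,
                       location="us-central1",
                       method="predictLongRunning"):
    """Assemble the Vertex AI publisher-model URL for a video request.

    `method` is an assumption (see the lead-in); pass another value if
    the documentation you are following names a different method.
    """
    # A purely numeric value is almost certainly the project *number*,
    # which this chapter says must not be used here.
    if project_id.isdigit():
        raise ValueError("expected a project ID, got what looks like a project number")

    host = f"https://{location}-aiplatform.googleapis.com"
    path = (f"/v1/projects/{project_id}/locations/{location}"
            f"/publishers/google/models/{model_id}")
    return f"{host}{path}:{method}"
```

In n8n the resulting string would simply be pasted into the HTTP Request node's URL field.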

Step 3: OAuth2 Authentication Setup¶

Vertex AI requires Google OAuth2, not simple API keys.

Configuring Credentials in n8n¶

HTTP Request node → Authentication
Select Predefined Credential Type
Choose Google OAuth2 API

Creating OAuth2 Credential¶

Create new credential
Need from Google Cloud Console:
Client ID
Client Secret

Important: Add scope for cloud platform access:

https://www.googleapis.com/auth/cloud-platform
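n8n builds the Google consent URL for you, but seeing where the scope ends up makes it clearer why a missing scope produces permission errors later. The sketch below is illustrative only: `build_consent_url`, the client ID, and the redirect URI are hypothetical placeholders, and the code mirrors the standard Google OAuth2 authorization-code request rather than n8n's internal implementation.

```python
from urllib.parse import urlencode, urlparse, parse_qs

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def build_consent_url(client_id, redirect_uri, scopes=(CLOUD_PLATFORM_SCOPE,)):
    """Return the URL a user visits to grant the requested scopes."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",      # authorization-code flow
        "scope": " ".join(scopes),    # multiple scopes are space-separated
        "access_type": "offline",     # ask Google for a refresh token
    })
    return f"{GOOGLE_AUTH_ENDPOINT}?{query}"


# Hypothetical values, for illustration only.
url = build_consent_url("my-client-id.apps.googleusercontent.com",
                        "https://n8n.example.com/rest/oauth2-credential/callback")
granted = parse_qs(urlparse(url).query)["scope"]
```

If the cloud-platform scope is absent from that `scope` parameter, the token Google issues will not be accepted by Vertex AI even though the sign-in itself succeeds.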

Enabling Vertex AI API¶

Go to Google Cloud Console
Search "Vertex AI API"
Click Enable

Complete OAuth Flow¶

Click "Sign in with Google"
Select account
Grant permissions
See "Connection successful"

Step 4: Configure POST Request¶

URL Configuration¶

Full URL with your project ID:

https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/publishers/google/models/veo-3.0-fast-generate-001:predictLongRunning

JSON Body¶

Use raw JSON body format:

{
  "instances": [
    {
      "prompt": "A serene walk down the beach at sunset, gentle waves lapping the shore"
    }
  ],
  "parameters": {
    "aspectRatio": "16:9",
    "durationSeconds": 8,
    "sampleCount": 1
  }
}

Notes:

durationSeconds must be 8 (Veo 3 requirement)
Leave out the Cloud Storage output URI parameter to receive the video as Base64 in the response instead of having it written to GCS
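Outside n8n, the same request can be sketched with the Python standard library. This is a sketch under stated assumptions: `build_generate_body` and `build_post_request` are hypothetical helpers, the access token is assumed to have been obtained through the OAuth2 flow already, and the duration check only encodes this chapter's note that the value must be 8.

```python
import json
import urllib.request


def build_generate_body(prompt, aspect_ratio="16:9",
                        duration_seconds=8, sample_count=1):
    """Build the JSON body shown above, with no storage location set."""
    if duration_seconds != 8:
        # Mirrors the chapter's note; not a statement about every model version.
        raise ValueError("this setup expects durationSeconds to be 8")
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "aspectRatio": aspect_ratio,
            "durationSeconds": duration_seconds,
            "sampleCount": sample_count,
        },
    }


def build_post_request(url, body, access_token):
    """Wrap the body in an authenticated POST request (not sent here)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access or credentials.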

Step 5: Wait Node¶

Add Wait node
Set 15 seconds
Video generation takes time

Step 6: Poll for Results¶

Understanding the Response¶

The POST returns an operation name that must be polled.

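Poll Request Configuration¶

Add HTTP Request node
Method: POST (Vertex AI checks this operation with a POST, not a GET)
URL: the model's long-running operation endpoint, ending in :fetchPredictOperation
Body: the operation name returned by the initial POST

{
  "operationName": "{{ $node['VO3 POST'].json.name }}"
}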

Same OAuth2 Authentication¶

Use the same Google OAuth2 credential.
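The polling call can be derived entirely from the operation name that the first POST returns. The sketch below assumes the `fetchPredictOperation` method and `operationName` body key from Google's Veo REST reference, and assumes the operation name has the form `projects/.../models/MODEL_ID/operations/OPERATION_ID`; `build_poll_request` is a hypothetical helper, so verify these details against the response you actually receive.

```python
def build_poll_request(operation_name, location="us-central1"):
    """Return (url, body) for checking a video generation operation."""
    # Everything before "/operations/" is the model resource path.
    model_path, separator, _operation_id = operation_name.partition("/operations/")
    if not separator:
        raise ValueError("operation name does not contain '/operations/'")

    url = (f"https://{location}-aiplatform.googleapis.com/v1/"
           f"{model_path}:fetchPredictOperation")
    body = {"operationName": operation_name}
    return url, body
```

The same Bearer token used for the initial request goes on this one as well.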

Step 7: If Loop for Status Check¶

Add If node
Condition type: Boolean
Check: done equals true
True branch: Continue to processing
False branch: Wait 15s → Loop back to poll
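The Wait, Poll, and If nodes together form a standard polling loop. A minimal sketch of that loop follows; `poll_until_done` is a hypothetical helper, `fetch_status` stands in for whatever function performs the polling request and returns the parsed JSON, and the attempt limit is an addition of this sketch (the workflow as described loops without one).

```python
import time


def poll_until_done(fetch_status, interval_seconds=15,
                    max_attempts=40, sleep=time.sleep):
    """Wait, poll, and repeat until the operation reports done: true."""
    for _ in range(max_attempts):
        sleep(interval_seconds)          # Wait node
        status = fetch_status()          # Poll node
        # Long-running operations may omit "done" entirely while running,
        # so a missing key is treated the same as false.
        if status.get("done") is True:   # If node, true branch
            if "error" in status:
                raise RuntimeError(f"operation failed: {status['error']}")
            return status
        # If node, false branch: fall through and wait again.
    raise TimeoutError(f"operation not done after {max_attempts} polls")
```

Injecting `sleep` keeps the function testable without real delays; in normal use the default `time.sleep` applies.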

Step 8: Handling Base64 Response¶

Without a GCS output location, Vertex AI returns the video as Base64 in the operation response.

Extract to Field¶

Add Set/Edit Fields node
Create field b64
Value: The Base64 video data from response

Convert Base64 to File¶

Add Convert node
Operation: Move Base64 to file
Input: The b64 field
Output: Binary file data
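The two nodes above amount to decoding a Base64 string and writing the bytes to disk. The sketch below shows that step in isolation. The response shape it reads (`response.videos[0].bytesBase64Encoded`) is an assumption about the completed operation's JSON, and `save_first_video` is a hypothetical helper; adjust the lookup to match the field you mapped into b64.

```python
import base64
from pathlib import Path


def save_first_video(operation_result, output_path):
    """Decode the first returned video and write it to output_path.

    Returns the number of bytes written.
    """
    # Assumed location of the Base64 payload in the finished operation.
    videos = operation_result["response"]["videos"]
    b64 = videos[0]["bytesBase64Encoded"]

    video_bytes = base64.b64decode(b64)
    Path(output_path).write_bytes(video_bytes)
    return len(video_bytes)
```

Because Base64 inflates the payload by roughly a third, even a short clip makes for a large JSON response, which is one reason the GCS route is suggested for production.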

Step 9: Download/Deliver¶

Options for the final output:

Download directly: For testing
Upload to Drive: Store for later use
Send via Telegram/Slack: Direct delivery
Email with attachment: Gmail with binary attachment

GCS Alternative¶

For production, using Google Cloud Storage is recommended:

Create GCS bucket

Include the storage URI inside the request's parameters object:

{
  "parameters": {
    "storageUri": "gs://your-bucket/output/"
  }
}

Video saves directly to GCS
Retrieve via GCS API or public URL
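When the video lands in a bucket, the operation result refers to it by a gs:// URI, which has to be split into bucket and object name before it can be fetched. The sketch below does that and builds the public-URL form. Both function names are hypothetical, and the resulting HTTPS URL only works if the object has been made publicly readable; private objects need an authenticated GCS API call instead.

```python
from urllib.parse import quote


def split_gcs_uri(gcs_uri):
    """Split gs://bucket/path/to/object into (bucket, object_name)."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError("expected a URI starting with gs://")
    bucket, _, object_name = gcs_uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("bucket name is missing")
    return bucket, object_name


def public_url(gcs_uri):
    """Return the storage.googleapis.com URL for a publicly readable object."""
    bucket, object_name = split_gcs_uri(gcs_uri)
    # Keep "/" so nested paths stay readable; escape everything else.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"
```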

Cost Comparison: Direct vs WaveSpeed¶

Factor	Vertex AI Direct	WaveSpeed
Setup complexity	Higher	Lower
Control	Full	Limited
Multi-model access	Google only	Many providers
Pricing	Google rates	Platform markup
Integration ease	OAuth required	API key

When to Use Each Approach¶

Use Vertex AI Direct when:

Already in Google Cloud ecosystem
Need maximum control
Want GCS integration
Building enterprise solutions

Use WaveSpeed when:

Need multiple model providers
Want simple API key auth
Rapid prototyping
Cost comparison shopping

Key Takeaways¶

Direct Vertex AI access is possible: n8n can call Google's APIs directly.
OAuth2 is required: Unlike platforms that accept a simple API key, Vertex AI needs Google OAuth2 authentication.
Scope configuration is critical: The cloud-platform scope must be added.
Base64 without GCS: Remove storage URI to receive video as Base64 data.
8-second duration required: Veo 3 text-to-video requires durationSeconds to be 8.
Polling pattern still applies: Operation name returned, poll for completion.
Boolean status check: Vertex AI uses done: true instead of "completed".
Conversion needed for Base64: Additional nodes required to handle Base64 output.
GCS simplifies production: Direct storage avoids Base64 handling.
Platform choice depends on use case: Direct access vs unified platform is a tradeoff.

Conclusion¶

Direct Vertex AI integration demonstrates n8n's flexibility in connecting to any API, even complex OAuth-authenticated Google Cloud services. While this approach requires more setup than using a platform like WaveSpeed, it provides complete control and native Google Cloud integration. The pattern of OAuth authentication, async polling, and result handling applies to many Google Cloud AI services beyond video generation. For organizations already invested in Google Cloud, this direct approach may be preferable; for others, the simplicity of unified platforms justifies any additional cost. Both approaches produce the same high-quality Veo 3 output, so the choice depends on infrastructure preferences and integration requirements.