Chapter 10: Text-to-Image Workflow with AI Prompt Generation¶

Video: Watch this chapter on YouTube (2:03:40)

Overview¶

This chapter guides you through building a complete text-to-image workflow. The workflow accepts simple text descriptions, uses an AI agent to create optimized image prompts, sends requests to WaveSpeed AI's image generation API, and delivers results via email.

Detailed Summary¶

Workflow Architecture¶

The complete workflow consists of:

Chat Trigger: Accepts user's image description
Image Prompt Agent: OpenAI-powered prompt engineer
WaveSpeed POST: Sends image generation request
Wait Node: Allows processing time
WaveSpeed GET: Retrieves generated image
If Loop: Ensures image is ready
Gmail: Delivers image link

About WaveSpeed AI¶

WaveSpeed AI is a platform hosting various video and image generation models: - Supports models like VO3, Seedance, Flux, and more - Paid platform: Minimum $10 top-up required for API access - Provides unified API for multiple generation models

Step 1: Chat Trigger Setup¶

Add Chat Trigger node
Type a simple image description:
Example: "Create an image of a cat flying through hoops of rainbows"
Execute to populate trigger data

Step 2: Image Prompt Generation Agent¶

Users provide simple descriptions, but image models need detailed prompts. An AI agent bridges this gap.

Node Setup¶

Add OpenAI node (Message Model)
Rename: "Image Prompt Generation AI"
Connect OpenAI account
Select model: GPT-4.1

User Prompt Configuration¶

Drag chatInput from trigger into prompt field
This passes the user's simple description

System Prompt (AI-Generated)¶

Use ChatGPT to create an effective system prompt for a prompt engineer:

You are an expert text-to-image generation prompt engineer working inside an n8n automation.

Your only task is to generate clear, vivid, and effective prompts to be passed to an image generation API.

Guidelines:
- Describe visual elements in detail (subject, action, setting)
- Include artistic style (digital art, photography, illustration)
- Specify lighting and mood
- Add composition details
- Keep prompts concise but descriptive
- Output ONLY the prompt, no explanations

Example Transformation¶

User input: "Create an image of a cat flying through hoops of rainbows"

AI-generated prompt: "A playful cat soaring gracefully through vibrant glowing hoops made of rainbows suspended in the sky, surrounded by fluffy clouds, dynamic motion, bright and whimsical lighting, magical atmosphere, highly detailed digital art, fantasy illustration style"

Step 3: WaveSpeed API POST Request¶

Selecting the Model¶

Go to WaveSpeed AI → Explore Models
Filter by "text-to-image"
Select model (e.g., "Seedream by ByteDance")
Go to API documentation

Configuring the HTTP Request¶

Add HTTP Request node
Rename: "WaveSpeed Post"
Copy cURL command from WaveSpeed docs
Click Import cURL

Authentication Setup¶

Authentication → Generic Credential Type → Header Auth
Create new credential:
Name: "WaveSpeed Credential Demo"
Header Name: Authorization
Value: Bearer [API_KEY] (get from WaveSpeed dashboard)
Save and toggle off manual headers

Body Configuration¶

In the JSON body, replace the prompt field: 1. Drag the content output from Image Prompt Agent 2. Other settings (aspect ratio, size) can be left as defaults

Execute and Pin¶

Execute step to send request
Important: Pin the data to avoid repeated API charges

Step 4: Wait Node¶

Add Wait node
Set: 15 seconds
Purpose: Allow image generation to complete

Step 5: WaveSpeed API GET Request¶

Setup¶

Add another HTTP Request node
Rename: "WaveSpeed Get"
Import GET cURL from WaveSpeed docs

Configure Request ID¶

Toggle URL to expression mode
Keep base URL, add / then drag the id from Wait node
This retrieves the specific image request

Authentication¶

Use the same WaveSpeed credential created earlier.

Execute and Review¶

Output includes: - status: "completed" when ready - output: URL link to the generated image

Step 6: If Loop for Status Check¶

Images may take longer than 15 seconds. An If loop prevents errors.

Setup¶

Add If node
Condition: Drag status from GET request
Type: String
Operator: Equals
Value: completed

True Branch (Image Ready)¶

Connect to output node (Gmail in this case).

False Branch (Still Processing)¶

Add Wait node (15 seconds)
Rename: "Wait another 15s"
Connect back to WaveSpeed GET node

This creates a loop that keeps checking until the image is ready.

Step 7: Gmail Output¶

Configuration¶

Add Gmail node to True branch
Connect Gmail account
Settings:
To: Your email address
Subject: "Image Generated" + timestamp variable
Email Type: Text
Message: Drag output (image URL) from If node

Remove Attribution¶

Under Options: 1. Add "Append n8n Attribution" 2. Toggle OFF

Test¶

Execute the Gmail step
Check email inbox
Click the link to view generated image

Complete Workflow Flow¶

Chat Trigger (user input)
     ↓
Image Prompt Agent (enhance prompt)
     ↓
WaveSpeed POST (request generation)
     ↓
Wait 15 seconds
     ↓
WaveSpeed GET (check result)
     ↓
If (status == completed)
   ├── True → Gmail (send image link)
   └── False → Wait 15s → Loop to GET

Production Considerations¶

Pin POST data: Avoid regenerating during testing
Adjust wait times: Based on model speed
Error handling: Add additional status checks
Cost awareness: Each generation consumes API credits
Output flexibility: Replace Gmail with Telegram, Slack, etc.

Key Takeaways¶

AI prompt engineering improves results: Transform simple descriptions into detailed, effective prompts.
System prompts are crucial: Use LLMs to generate specialized prompt engineering instructions.
WaveSpeed provides unified API: One platform for multiple image generation models.
Async pattern applies here: POST to start, GET to retrieve, loop until complete.
Pin data saves money: Image generation has real costs—pin during development.
If loops prevent failures: Never assume the image is ready—always check status.
Wait nodes are essential: Give external APIs time to process.
Output nodes are interchangeable: Gmail, Slack, Telegram, or any other channel works.
Dynamic variables in requests: Use expressions to pass prompt content and request IDs.
Loop back for retries: Connect false branch to repeat the check until successful.

Conclusion¶

The text-to-image workflow demonstrates the power of combining AI prompt engineering with image generation APIs. The pattern established here—prompt enhancement, async API calls, status polling, and conditional delivery—applies to many creative automation scenarios. The AI prompt agent transforms casual descriptions into optimized prompts that significantly improve image quality. This same architecture extends to video generation (covered next), with longer wait times and additional parameters. Understanding this workflow prepares learners for increasingly complex multimodal automations involving text, images, and video generation.