Chapter 10: Text-to-Image Workflow with AI Prompt Generation¶
Video: Watch this chapter on YouTube (2:03:40)
Overview¶
This chapter guides you through building a complete text-to-image workflow. The workflow accepts simple text descriptions, uses an AI agent to create optimized image prompts, sends requests to WaveSpeed AI's image generation API, and delivers results via email.
Detailed Summary¶
Workflow Architecture¶
The complete workflow consists of:
- Chat Trigger: Accepts user's image description
- Image Prompt Agent: OpenAI-powered prompt engineer
- WaveSpeed POST: Sends image generation request
- Wait Node: Allows processing time
- WaveSpeed GET: Retrieves generated image
- If Loop: Ensures image is ready
- Gmail: Delivers image link
About WaveSpeed AI¶
WaveSpeed AI is a platform hosting various video and image generation models: - Supports models like VO3, Seedance, Flux, and more - Paid platform: Minimum $10 top-up required for API access - Provides unified API for multiple generation models
Step 1: Chat Trigger Setup¶
- Add Chat Trigger node
- Type a simple image description:
- Example: "Create an image of a cat flying through hoops of rainbows"
- Execute to populate trigger data
Step 2: Image Prompt Generation Agent¶
Users provide simple descriptions, but image models need detailed prompts. An AI agent bridges this gap.
Node Setup¶
- Add OpenAI node (Message Model)
- Rename: "Image Prompt Generation AI"
- Connect OpenAI account
- Select model: GPT-4.1
User Prompt Configuration¶
- Drag
chatInputfrom trigger into prompt field - This passes the user's simple description
System Prompt (AI-Generated)¶
Use ChatGPT to create an effective system prompt for a prompt engineer:
You are an expert text-to-image generation prompt engineer working inside an n8n automation.
Your only task is to generate clear, vivid, and effective prompts to be passed to an image generation API.
Guidelines:
- Describe visual elements in detail (subject, action, setting)
- Include artistic style (digital art, photography, illustration)
- Specify lighting and mood
- Add composition details
- Keep prompts concise but descriptive
- Output ONLY the prompt, no explanations
Example Transformation¶
User input: "Create an image of a cat flying through hoops of rainbows"
AI-generated prompt: "A playful cat soaring gracefully through vibrant glowing hoops made of rainbows suspended in the sky, surrounded by fluffy clouds, dynamic motion, bright and whimsical lighting, magical atmosphere, highly detailed digital art, fantasy illustration style"
Step 3: WaveSpeed API POST Request¶
Selecting the Model¶
- Go to WaveSpeed AI → Explore Models
- Filter by "text-to-image"
- Select model (e.g., "Seedream by ByteDance")
- Go to API documentation
Configuring the HTTP Request¶
- Add HTTP Request node
- Rename: "WaveSpeed Post"
- Copy cURL command from WaveSpeed docs
- Click Import cURL
Authentication Setup¶
- Authentication → Generic Credential Type → Header Auth
- Create new credential:
- Name: "WaveSpeed Credential Demo"
- Header Name:
Authorization - Value:
Bearer [API_KEY](get from WaveSpeed dashboard) - Save and toggle off manual headers
Body Configuration¶
In the JSON body, replace the prompt field:
1. Drag the content output from Image Prompt Agent
2. Other settings (aspect ratio, size) can be left as defaults
Execute and Pin¶
- Execute step to send request
- Important: Pin the data to avoid repeated API charges
Step 4: Wait Node¶
- Add Wait node
- Set: 15 seconds
- Purpose: Allow image generation to complete
Step 5: WaveSpeed API GET Request¶
Setup¶
- Add another HTTP Request node
- Rename: "WaveSpeed Get"
- Import GET cURL from WaveSpeed docs
Configure Request ID¶
- Toggle URL to expression mode
- Keep base URL, add
/then drag theidfrom Wait node - This retrieves the specific image request
Authentication¶
Use the same WaveSpeed credential created earlier.
Execute and Review¶
Output includes:
- status: "completed" when ready
- output: URL link to the generated image
Step 6: If Loop for Status Check¶
Images may take longer than 15 seconds. An If loop prevents errors.
Setup¶
- Add If node
- Condition: Drag
statusfrom GET request - Type: String
- Operator: Equals
- Value:
completed
True Branch (Image Ready)¶
Connect to output node (Gmail in this case).
False Branch (Still Processing)¶
- Add Wait node (15 seconds)
- Rename: "Wait another 15s"
- Connect back to WaveSpeed GET node
This creates a loop that keeps checking until the image is ready.
Step 7: Gmail Output¶
Configuration¶
- Add Gmail node to True branch
- Connect Gmail account
- Settings:
- To: Your email address
- Subject: "Image Generated" + timestamp variable
- Email Type: Text
- Message: Drag
output(image URL) from If node
Remove Attribution¶
Under Options: 1. Add "Append n8n Attribution" 2. Toggle OFF
Test¶
- Execute the Gmail step
- Check email inbox
- Click the link to view generated image
Complete Workflow Flow¶
Chat Trigger (user input)
↓
Image Prompt Agent (enhance prompt)
↓
WaveSpeed POST (request generation)
↓
Wait 15 seconds
↓
WaveSpeed GET (check result)
↓
If (status == completed)
├── True → Gmail (send image link)
└── False → Wait 15s → Loop to GET
Production Considerations¶
- Pin POST data: Avoid regenerating during testing
- Adjust wait times: Based on model speed
- Error handling: Add additional status checks
- Cost awareness: Each generation consumes API credits
- Output flexibility: Replace Gmail with Telegram, Slack, etc.
Key Takeaways¶
-
AI prompt engineering improves results: Transform simple descriptions into detailed, effective prompts.
-
System prompts are crucial: Use LLMs to generate specialized prompt engineering instructions.
-
WaveSpeed provides unified API: One platform for multiple image generation models.
-
Async pattern applies here: POST to start, GET to retrieve, loop until complete.
-
Pin data saves money: Image generation has real costs—pin during development.
-
If loops prevent failures: Never assume the image is ready—always check status.
-
Wait nodes are essential: Give external APIs time to process.
-
Output nodes are interchangeable: Gmail, Slack, Telegram, or any other channel works.
-
Dynamic variables in requests: Use expressions to pass prompt content and request IDs.
-
Loop back for retries: Connect false branch to repeat the check until successful.
Conclusion¶
The text-to-image workflow demonstrates the power of combining AI prompt engineering with image generation APIs. The pattern established here—prompt enhancement, async API calls, status polling, and conditional delivery—applies to many creative automation scenarios. The AI prompt agent transforms casual descriptions into optimized prompts that significantly improve image quality. This same architecture extends to video generation (covered next), with longer wait times and additional parameters. Understanding this workflow prepares learners for increasingly complex multimodal automations involving text, images, and video generation.