Running Multiple Agents in Antigravity

Founder of Picasso IA

June 3, 2026 - 1:40 AM

Running multiple agents in Antigravity sounds simple until the third agent silently crashes and you spend forty minutes wondering why the output is half-baked. Parallel execution is one of the most powerful features in Antigravity, but it requires a specific mental model. This article covers what actually works in production, how real teams structure their agent pipelines, and how to avoid the traps that look harmless at first glance.

What Antigravity Does Differently

Most agent frameworks serialize by default. You define a task, an agent picks it up, finishes it, and only then does the next one start. Antigravity inverts this. Its core scheduler is built around a concurrent event loop that treats agent tasks as first-class citizens, not afterthoughts bolted on with a thread pool.

The Loop That Runs Everything

At its core, Antigravity runs a reactive loop. When you register an agent, you are not spinning up a new process. You are registering a coroutine that the scheduler manages. This means latency adds up differently than in sequential or threaded systems. Ten agents running concurrently in Antigravity will typically finish sooner than ten sequential calls because the I/O wait, which is where most LLM calls spend roughly 80% of their time, overlaps across the pool instead of being paid once per call.
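To make the overlap concrete, here is a minimal sketch in plain asyncio. It does not use Antigravity's own API; fake_llm_call is a hypothetical stand-in whose sleep plays the role of the network wait.

```python
import asyncio
import time


async def fake_llm_call(name: str, wait_s: float = 0.1) -> str:
    # asyncio.sleep stands in for the network wait of a real model call.
    await asyncio.sleep(wait_s)
    return f"{name}: done"


async def run_sequential(names: list[str]) -> list[str]:
    # Each call starts only after the previous one has finished.
    return [await fake_llm_call(n) for n in names]


async def run_concurrent(names: list[str]) -> list[str]:
    # All calls wait at the same time; gather keeps the input order.
    return list(await asyncio.gather(*(fake_llm_call(n) for n in names)))


async def timed(coro) -> tuple[list[str], float]:
    # Await a coroutine and report how long it took in seconds.
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start
```

With five simulated agents, the sequential version waits about five times as long as the concurrent one, because the waits add up instead of overlapping.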

The core insight here: concurrent does not mean uncontrolled. Antigravity gives you the building blocks, but the coordination logic is entirely yours.

Why Single-Agent Limits Matter

Before adding more agents, it helps to know why the single-agent default exists. A single agent is predictable. It reads a context, acts on it, and returns a result. The moment you introduce a second agent reading the same context, you introduce the possibility of divergent outputs. Antigravity does not magically reconcile those. That is your job.

💡 Start with one agent, profile where it spends time waiting, and only then decide which waits are worth parallelizing.

Setting Up Multiple Agents

Setting up multiple agents in Antigravity requires three decisions upfront: how agents spawn, whether they share state, and how they pass results back to the orchestrator.

Spawning Agents in Parallel

The most direct approach is explicit spawning at the task definition level. Instead of calling agents sequentially, you define a batch of tasks and let the scheduler dispatch them simultaneously.

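import asyncio

async def summarize_all(agent, doc_a, doc_b, doc_c):
    # Create every task up front so the scheduler can dispatch them together.
    tasks = [
        agent.create_task("summarize", doc_a),
        agent.create_task("summarize", doc_b),
        agent.create_task("summarize", doc_c),
    ]
    # gather waits for all tasks and returns results in task order.
    results = await asyncio.gather(*tasks)
    return results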

The critical detail: asyncio.gather itself places no cap on concurrency; in Antigravity the cap comes from the agent pool size. If you set max_concurrent=3 and dispatch 10 tasks, the first three fire immediately and the remaining seven queue until a slot frees up. This is intentional rate control, not a bug.

Shared State vs. Isolated Tasks

This is the decision that breaks most multi-agent setups. There are three valid patterns:

Pattern	When to Use	Risk
Isolated	Each agent needs different data	Low: no conflicts
Shared Read	Agents need the same base context	Medium: memory bloat
Shared Write	Agents update a common object	High: race conditions

Isolated tasks are almost always the right default. If two agents need the same piece of data, pass a copy to each. The memory cost is worth the predictability.
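A minimal sketch of the copy-per-agent idea, using plain asyncio and the standard copy module. The tagging_agent function is a hypothetical agent that mutates its input, which is exactly the behavior that makes sharing risky.

```python
import asyncio
import copy


async def tagging_agent(context: dict, tag: str) -> dict:
    # This agent mutates its input, which is safe only because it owns the copy.
    context["tags"].append(tag)
    await asyncio.sleep(0)  # yield to the loop, as a real model call would
    return context


async def run_isolated(base_context: dict, tags: list[str]) -> list[dict]:
    # deepcopy gives every agent a private context, including nested lists.
    jobs = [tagging_agent(copy.deepcopy(base_context), t) for t in tags]
    return list(await asyncio.gather(*jobs))
```

A shallow copy would not be enough here, since the nested tags list would still be shared between agents.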

Shared write state should be treated as a last resort, not a convenience. If you absolutely need it, use Antigravity's built-in StateManager with explicit locking rather than a plain Python dict.

Passing Context Between Agents

When Agent A's output feeds Agent B, you have a dependency. In Antigravity, dependencies break parallelism by definition. Two agents that depend on each other cannot run at the same time.

The cleanest pattern is explicit handoff:

summary = await agent_a.run(document)
analysis = await agent_b.run(summary)

For large pipelines with mixed dependencies, use a dependency graph approach. Define which tasks are independent (run in parallel) and which are sequential (run in order), and let Antigravity's scheduler handle the rest.
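One way to sketch the dependency graph approach outside of Antigravity is with the standard library's graphlib. This is an illustration under assumptions, not Antigravity's scheduler: run_graph, deps, and run_task are hypothetical names introduced for the example.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_graph(deps, run_task):
    """Run tasks so each starts as soon as everything it depends on is done.

    deps maps a task name to the set of task names it depends on.
    run_task(name, inputs) is an async callable; inputs maps each
    dependency name to that dependency's result.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results = {}
    running = {}  # asyncio.Task -> task name

    while sorter.is_active():
        # Start every task whose dependencies are already satisfied.
        for name in sorter.get_ready():
            inputs = {d: results[d] for d in deps.get(name, ())}
            running[asyncio.create_task(run_task(name, inputs))] = name
        # Wait for at least one task to finish, then unlock its dependents.
        done, _ = await asyncio.wait(
            set(running), return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            name = running.pop(task)
            results[name] = task.result()
            sorter.done(name)
    return results
```

Given deps = {"draft": {"research_a", "research_b"}, "edit": {"draft"}}, the two research tasks run in parallel, and draft and edit follow in order.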

Patterns That Work at Scale

Three patterns handle roughly 90% of multi-agent use cases in Antigravity. They are not exotic. They are boring in the best possible way.

The Fan-Out Pattern

The fan-out pattern is the most common pattern for batch AI work. One input, many parallel agents, one collection point.

How it works:

The orchestrator receives a batch of items, such as 20 documents
The orchestrator spawns one agent per item (or per chunk)
All agents run concurrently
The orchestrator collects all results once they finish

This pattern shines when tasks are embarrassingly parallel: no agent needs to know what any other agent is doing. Image generation, document summarization, classification tasks, and translation all fit this shape perfectly.

💡 For very large batches, add a semaphore to control max concurrency: creating one asyncio.Semaphore(10) and acquiring it around every agent call ensures that no more than 10 agents run at once, which protects downstream API rate limits.
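The fan-out steps and the semaphore tip can be sketched together in plain asyncio. The fan_out helper and its worker argument are hypothetical names, not part of Antigravity.

```python
import asyncio


async def fan_out(items, worker, max_concurrent=10):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(item):
        # Calls beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return list(await asyncio.gather(*(bounded(item) for item in items)))
```

Note that every coroutine is created up front; the semaphore limits how many are inside the worker call at once, which is the part that hits the API.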

The Pipeline Chain

The pipeline chain is the fan-out's counterpart: a strictly sequential flow where each agent builds on the previous one's output.

Agent 1 (Research) → Agent 2 (Draft) → Agent 3 (Edit) → Agent 4 (Format)

Best for: tasks where quality depends on progressive refinement. Writing, code generation, and multi-step reasoning benefit from pipeline chains because later agents can correct earlier agents' mistakes.

The risk with pipeline chains is error propagation. If Agent 1 returns a flawed output, Agents 2 through 4 will confidently build on that flaw. Add validation between stages, even if it is just a simple length or schema check.
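A minimal sketch of validation between stages, assuming each stage is an async callable paired with a simple check function. The names run_pipeline and StageValidationError are made up for this example.

```python
import asyncio


class StageValidationError(Exception):
    """Raised when a stage's output fails its check."""


async def run_pipeline(stages, initial_input):
    """Run (name, agent, validate) stages in order, checking each output."""
    value = initial_input
    for name, agent, validate in stages:
        value = await agent(value)
        if not validate(value):
            # Stop here so later stages never build on a bad output.
            raise StageValidationError(f"stage {name!r} produced invalid output")
    return value
```

The validate function can be as small as a length check or a schema check; what matters is that a failure stops the chain at the stage that caused it.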

The Supervisor Model

The supervisor model is the most sophisticated of the three. One agent, the supervisor, orchestrates a pool of worker agents. The supervisor does not do the actual work. It plans, delegates, reviews, and decides whether to retry.

The supervisor's responsibilities:

Decompose the original task into subtasks
Assign subtasks to the appropriate worker agents
Validate each result before passing it downstream
Handle failures by retrying, reassigning, or escalating
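The four responsibilities above can be sketched as a small orchestration function. This is an illustration only: in a real setup decompose and validate would likely be calls to the supervisor model, whereas here they are plain hypothetical callables passed in by the caller.

```python
import asyncio


async def supervise(task, decompose, workers, validate, max_attempts=3):
    """Split a task, delegate subtasks to workers, and retry invalid results.

    decompose(task) returns a list of (worker_name, subtask) pairs.
    workers maps worker_name to an async callable.
    validate(result) returns True when a result is acceptable.
    """

    async def run_subtask(worker_name, subtask):
        for _ in range(max_attempts):
            result = await workers[worker_name](subtask)
            if validate(result):
                return result
        # Escalate instead of passing a bad result downstream.
        raise RuntimeError(
            f"{worker_name} failed {subtask!r} after {max_attempts} attempts"
        )

    plan = decompose(task)
    # Subtasks are independent here, so they run concurrently.
    return list(await asyncio.gather(*(run_subtask(w, s) for w, s in plan)))
```

The supervisor never produces content itself in this sketch; it only plans, delegates, checks, and decides whether to retry.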

For this pattern, Kimi K2.6 and GPT 5.1 are excellent supervisor models. Both are purpose-built for agent orchestration tasks, with GPT 5.1 specifically designed for building AI agents. As workers, lighter models like Claude 4.5 Haiku or GPT 4.1 Mini cut costs significantly without sacrificing quality on well-scoped tasks.

Role	Recommended Model	Reason
Supervisor	Kimi K2.6	Strong reasoning, built for agents
Supervisor	GPT 5.1	Native agent building capabilities
Worker	Claude 4.5 Haiku	Fast, cost-efficient
Worker	GPT 4.1 Mini	Low latency, reliable output
Reasoner	Deepseek R1	Deep reasoning for complex sub-tasks

Where Things Break

Most multi-agent failures in Antigravity fall into two categories. Both are avoidable once you know what to watch for.

Race Conditions in Shared Resources

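A race condition occurs when two agents write to the same resource at overlapping times and neither knows the other exists. Because agents are coroutines on a single event loop, the danger window is any read-modify-write sequence that spans an await: another agent can run in that gap. In Antigravity, this typically surfaces as: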

Two agents updating the same file: the second write overwrites the first silently
Two agents calling the same API endpoint: rate limits trip without warning
Two agents updating a shared dict: one update disappears

The fix is to never write to a shared resource without a lock. In practice, this means:

import asyncio

lock = asyncio.Lock()

async def safe_write(lock, data, destination):
    # Only one coroutine at a time can hold the lock and append.
    async with lock:
        destination.append(data)

For external resources like files or APIs, serialize writes through a single dedicated writer agent. Other agents pass their outputs to the writer; the writer is the only one that touches the resource.
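A minimal sketch of the dedicated writer idea using asyncio.Queue. A plain list stands in for the file or API, and writer, producer, and run_with_single_writer are hypothetical names.

```python
import asyncio


async def writer(queue: asyncio.Queue, destination: list) -> None:
    # The only coroutine that touches destination; None is the stop signal.
    while True:
        item = await queue.get()
        if item is None:
            break
        destination.append(item)


async def producer(queue: asyncio.Queue, name: str, count: int) -> None:
    for i in range(count):
        await asyncio.sleep(0)  # stands in for real agent work
        await queue.put(f"{name}-{i}")


async def run_with_single_writer(names, per_agent=3):
    queue = asyncio.Queue()
    destination = []
    writer_task = asyncio.create_task(writer(queue, destination))
    await asyncio.gather(*(producer(queue, n, per_agent) for n in names))
    await queue.put(None)  # all producers are done; tell the writer to stop
    await writer_task
    return destination
```

No lock is needed in this design because there is exactly one code path that writes.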

Token Budget Collisions

This one is less obvious. When you run multiple agents simultaneously, every agent's requests count against the same rate limit on the model endpoint. If you are not managing your total concurrent token spend, you will hit rate limits at unpredictable intervals.

The pattern here is token budgeting at the orchestrator level. Before spawning agents, estimate total token requirements for the batch. If the estimate exceeds your rate limit window, introduce a delay or reduce batch size.
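A rough sketch of orchestrator-level budgeting. It assumes a crude estimate of about four characters per token for English text, which is only a heuristic; a real tokenizer gives better numbers. The function names and the tokens_per_window parameter are hypothetical.

```python
def estimate_tokens(text: str, max_output_tokens: int) -> int:
    # Rough heuristic (an assumption): about 4 characters per input token,
    # plus the full output allowance for the response.
    return len(text) // 4 + max_output_tokens


def plan_batches(prompts, tokens_per_window, max_output_tokens=500):
    """Group prompts so each batch's estimated tokens fit one rate-limit window."""
    batches, current, used = [], [], 0
    for prompt in prompts:
        cost = estimate_tokens(prompt, max_output_tokens)
        if cost > tokens_per_window:
            raise ValueError("a single prompt exceeds the window budget")
        if used + cost > tokens_per_window:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(prompt)
        used += cost
    if current:
        batches.append(current)
    return batches
```

The orchestrator would then dispatch one batch per rate-limit window, waiting between batches instead of firing everything at once.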

💡 For heavy parallel workloads, Llama 4 Maverick Instruct and Deepseek v3.1 offer high-throughput options with generous rate limits, making them practical choices for volume-heavy agent pipelines.

Common symptoms of token budget collisions:

Agents finish, but some results are truncated
Intermittent 429 errors with no clear pattern
Total output quality drops as batch size increases
Some agents return empty or partial responses

If you see any of these, check your concurrent token consumption before assuming a bug in your agent logic.

Pairing Agents with AI Image Generation

Multi-agent setups become especially interesting when you mix text and image generation tasks. A common real-world use case: generate a batch of articles, then automatically produce images for each article in parallel.

One Agent Per Media Type

The cleanest approach assigns separate agents to separate media types. One agent handles all text generation. A separate agent pool handles all image generation requests. The two pools run concurrently but never interact directly.

Typical structure:

Text agent pool processes articles in parallel
Each finished article is passed to the image queue
Image agents pick up tasks from the queue and generate artwork
A collector agent aggregates text and image pairs for final output

This separation matters for a practical reason: text generation and image generation have very different latency profiles. Text for a 1000-word article might take 8 seconds. Image generation might take 15 to 25 seconds. If you mix them in the same agent pool, fast text tasks will queue behind slow image tasks. Keeping them separate maximizes throughput for both.
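The two-pool structure can be sketched with a queue between the pools. The model calls below are placeholders with made-up sleep times, and all function names are hypothetical.

```python
import asyncio


async def write_article(topic):
    await asyncio.sleep(0.01)  # placeholder for a text model call
    return f"article about {topic}"


async def make_image(article):
    await asyncio.sleep(0.03)  # placeholder for a slower image model call
    return f"image for {article}"


async def text_worker(topic, image_queue):
    article = await write_article(topic)
    await image_queue.put(article)  # hand off as soon as the text is ready


async def image_worker(image_queue, pairs):
    while True:
        article = await image_queue.get()
        try:
            pairs.append((article, await make_image(article)))
        finally:
            image_queue.task_done()


async def run_two_pools(topics, image_workers=2):
    image_queue, pairs = asyncio.Queue(), []
    pool = [
        asyncio.create_task(image_worker(image_queue, pairs))
        for _ in range(image_workers)
    ]
    await asyncio.gather(*(text_worker(t, image_queue) for t in topics))
    await image_queue.join()  # wait until every queued article has an image
    for task in pool:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*pool, return_exceptions=True)
    return pairs
```

The text pool never waits on image generation; it only drops finished articles into the queue and moves on.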

Using LLMs as Coordinators on PicassoIA

For workflows that involve both writing and visual production, using an LLM as the coordinator and dedicated image models as workers is a highly effective architecture. The LLM handles task decomposition, prompt refinement, and quality review. The image models handle the actual generation.

On PicassoIA, this maps to a natural split:

Coordinator: Claude Opus 4.7 or GPT 5 for orchestration, planning, and review
Image workers: text-to-image models for parallel image generation
Quality check: Kimi K2 Instruct or Gemini 3 Pro for validating outputs against defined criteria

💡 When building multi-modal agent pipelines, treat prompt engineering as a first-class task. Assign one LLM agent solely to refining image prompts from raw article content before passing them to image models. Output quality improves substantially with this step.

Agent State Management Done Right

State is the silent killer of multi-agent systems. Agents that carry too much state become unpredictable. Agents that carry no state become useless. The sweet spot is stateless agents with explicit context injection.

What this means in practice:

Each agent receives everything it needs in a single input object
Agents do not maintain internal memory across calls
All state lives in the orchestrator, not in individual agents

This pattern, sometimes called the message-passing style, makes agents dramatically easier to test, debug, and scale. You can replace any agent without updating others, because no agent holds state the others depend on.
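A small sketch of explicit context injection, using a frozen dataclass as the single input object. AgentInput and stateless_agent are hypothetical names, and the agent body is a placeholder for a real model call.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentInput:
    # Everything the agent needs arrives in one immutable object.
    task: str
    document: str
    instructions: str = ""


async def stateless_agent(agent_input: AgentInput) -> str:
    # No attributes, globals, or history: output depends only on the input.
    await asyncio.sleep(0)  # placeholder for the model call
    return f"[{agent_input.task}] {agent_input.document[:20]}"
```

Because the input is frozen, an agent cannot quietly modify what the orchestrator handed it, and the same input can be replayed in a test.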

Anti-patterns to avoid:

Anti-Pattern	What Goes Wrong
Global shared dict	Write conflicts, silent data loss
Agent "memory" between runs	State drift, unpredictable outputs
Full conversation history to every agent	Token bloat, slower responses
Hardcoded model names inside agents	Inflexible, hard to swap models

The moment you find yourself debugging why an agent's output changed even though its code did not, state drift is almost always the culprit.

Retry Logic and Fault Tolerance

Production multi-agent systems fail. Network calls time out, model APIs return errors, and occasionally an agent produces output that fails your validation rules. Building retry logic into the orchestrator from day one, rather than adding it later, is the difference between a reliable pipeline and a brittle one.

A practical retry strategy:

Classify failures: transient (retry immediately), rate-limit (wait and retry), or fatal (escalate to human)
Set max retries per task: 3 is a sensible default for most workloads
Implement exponential backoff: wait 1s, then 2s, then 4s between retries
Log every failure with context: include task ID, agent ID, error type, and input hash
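The strategy above can be sketched as a small retry wrapper. The two exception classes are hypothetical; in practice you would map your model client's real error types onto them. Any other exception is treated as fatal and propagates to the caller.

```python
import asyncio
import logging

log = logging.getLogger("orchestrator")


class TransientError(Exception):
    """Hypothetical: a failure worth retrying immediately."""


class RateLimitError(Exception):
    """Hypothetical: the provider asked us to slow down."""


async def run_with_retry(task, task_id, max_retries=3, base_delay=1.0):
    """Retry transient and rate-limit failures; let anything else propagate."""
    for attempt in range(max_retries + 1):
        try:
            return await task()
        except (TransientError, RateLimitError) as exc:
            log.warning(
                "task=%s attempt=%d error=%s",
                task_id, attempt + 1, type(exc).__name__,
            )
            if attempt == max_retries:
                raise  # retries exhausted: escalate to the caller
            if isinstance(exc, RateLimitError):
                # Exponential backoff: 1s, 2s, 4s with the default base_delay.
                await asyncio.sleep(base_delay * 2 ** attempt)
```

The log line here carries only the task ID and error type; agent ID and an input hash can be added the same way.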

For reasoning-heavy tasks where an agent produces a logically wrong answer rather than a technical error, Deepseek R1 is worth using as a fallback validator. Its step-by-step reasoning makes it well-suited for catching logical errors that other models miss.

Retry vs. Fallback:

Not every failure warrants a retry with the same model. Consider a fallback pool where failed tasks are reassigned to a different model. A task that times out on GPT 5 Pro might finish successfully on Claude 4.5 Sonnet, especially if the timeout was caused by reasoning depth rather than connectivity.
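A minimal sketch of a fallback pool, assuming each model is wrapped in an async callable. The run_with_fallback helper and the (name, call) pairs are hypothetical; the timeout uses the standard asyncio.wait_for.

```python
import asyncio


async def run_with_fallback(task_input, models, timeout_s=30.0):
    """Try each (name, call) model in order; move on when one times out or errors."""
    errors = []
    for name, call in models:
        try:
            result = await asyncio.wait_for(call(task_input), timeout=timeout_s)
            return name, result
        except Exception as exc:  # includes the timeout raised by wait_for
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all models failed: {errors}")
```

Returning the model name alongside the result makes it easy to log which model actually completed the task.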

Build Your First Multi-Agent Workflow

Running multiple agents in Antigravity is not about adding more models to a script. It is about thinking in pipelines where each stage has a clear input, a clear output, and a clear failure mode.

The teams getting the most out of multi-agent Antigravity setups follow a simple progression:

Get one agent working perfectly on a single task
Identify the bottleneck (almost always I/O wait)
Parallelize exactly the tasks that are waiting
Add a supervisor if coordination complexity grows
Monitor, retry, and validate at every stage

PicassoIA's collection of large language models gives you the full range of models needed for every role in this architecture: fast workers, capable supervisors, and deep reasoners. Whether you are building a content production pipeline, an automated research tool, or a multi-modal creative workflow, the building blocks are all available.

Try composing your first fan-out pipeline on PicassoIA today. Pick a batch task you currently handle manually, assign it to three parallel agents using Kimi K2.6 or GPT 5.1, and time the difference. The first result usually changes how you think about automation permanently.
