HyperFlow
Home
Guide
Examples
GitHub

Basic concepts

HyperFlow is a self-improving agent framework. Instead of manually tuning an AI agent, you let another AI agent do it automatically.

The core idea comes from evolutionary computation and Quality-Diversity-style archives: keep many agent versions, score them, and use strong ancestors as parents for the next mutation (performed by the MetaAgent, which edits code). The workflow diagrams below follow the same narrative, so you can trace the whole process in one place.

Overview

Workflow diagrams

Evolutionary loop (outer)

One generation (sequence)

Use the participant id Main (not Loop): Mermaid reserves loop for control blocks.
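The note above can be illustrated with a minimal sequence-diagram sketch (the participants and messages are placeholders, not the real diagram):

```mermaid
sequenceDiagram
    %% participant id is "Main", not "Loop": loop is a control-block keyword
    participant Main
    participant Meta as MetaAgent
    Main->>Meta: request mutation
    Meta-->>Main: patch
```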

TaskAgent vs MetaAgent (programs)

Execution mode

The two agents

TaskAgent — the worker

The TaskAgent solves domain-specific tasks. It receives a formatted prompt, optionally uses tools, and returns a prediction.

  • Input: A task description (formatted by the domain harness).
  • Output: A prediction.
  • Tools: Domain-specific, optional.

MetaAgent — the improver

The MetaAgent is the mutation operator. HyperFlow treats an agent as a computable program, so the MetaAgent can refine logic, prompts, tools, and strategies on disk (metacognitive self-modification).

  • Input: Repo path, evaluation results, parent score context.
  • Output: Patches / modified source files.
  • Tools: Built-in bash and editor.
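In code, the contrast between the two agents comes down to their signatures. The sketch below is entirely illustrative (function names, parameters, and the edit target are assumptions, not the real API):

```python
from typing import Optional

def task_agent(prompt: str, tools: Optional[dict] = None) -> str:
    """Worker: formatted prompt in, prediction out (model call stubbed)."""
    return f"prediction for: {prompt}"

def meta_agent(repo_path: str, results: dict, parent_score: float) -> list:
    """Improver: reads evaluation results, returns the files it patched."""
    patched = []
    if parent_score < 1.0:  # skip needless edits when the parent already passes
        patched.append(f"{repo_path}/task_agent.py")  # hypothetical edit target
    return patched
```

The asymmetry is the point: the TaskAgent never sees its own source, while the MetaAgent's whole job is operating on the repository.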

How they cooperate

The evolutionary loop

The loop (see src/Core/GenerateLoop.php) runs generations until max_generations is reached or an early-stop condition fires. Each generation typically:

  1. Select parent from the archive.
  2. Set up executor (currently LocalExecutor in PHP).
  3. Run MetaAgent — produce a new modification from failures and context.
  4. Run TaskAgent through the harness.
  5. Evaluate — domain scores predictions; reports under the output directory.
  6. Update archive — append a JSONL snapshot with scores and logic changes.
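The six steps above can be sketched as a single loop. All names here are made up for illustration (the real loop lives in src/Core/GenerateLoop.php), and "best" parent selection is shown for brevity:

```python
def evolve(archive, domain, meta_agent, max_generations=5):
    """Illustrative outer loop: one iteration = one generation."""
    for _ in range(max_generations):
        parent = max(archive, key=lambda s: s["score"])      # 1. select parent ("best" shown)
        code = dict(parent["code"])                          # 2. fresh workspace copy
        code = meta_agent(code, parent)                      # 3. MetaAgent mutates the code
        preds = [domain["solve"](code, t) for t in domain["tasks"]]  # 4. harness run
        score = domain["evaluate"](preds)                    # 5. domain scores predictions
        archive.append({"id": len(archive),                  # 6. append a snapshot
                        "parent_id": parent["id"],
                        "code": code, "score": score})
        if score >= 1.0:                                     # early stop on a perfect score
            break
    return archive
```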

The archive

The archive is an append-only JSONL file: each line is a full snapshot. Read the last line for current state. Lineage is a tree: parent_id points to the real ancestor, not necessarily the latest id.

Why JSONL?

|                  | JSON                          | JSONL               |
|------------------|-------------------------------|---------------------|
| Structure        | One object per file           | One object per line |
| Append           | Rewrite file                  | Append line         |
| Latest state     | Parse all                     | Read last line      |
| Typical use here | report.json, predictions.json | archive.jsonl       |

Parent selection strategies

Chosen once in config for the whole run (select_parent.py):

| Strategy         | Behavior                                      |
|------------------|-----------------------------------------------|
| random           | Uniform over valid parents (max exploration)  |
| latest           | Most recent valid parent (simple chain)       |
| best             | Highest score (pure exploitation)             |
| score_prop       | Random, weighted by score                     |
| score_child_prop | Score-weighted with child penalty (default)   |

Why not always best? Pure exploitation can get stuck in a local maximum. The child penalty weights each candidate as weight = (score + 0.01) / (1 + num_children).
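A minimal sketch of that weighting (the formula is from the docs; the sampling helper and snapshot fields are illustrative):

```python
import random

def score_child_prop(snapshots):
    """Weight each candidate by score, discounted by how many children it has."""
    weights = [(s["score"] + 0.01) / (1 + s["num_children"]) for s in snapshots]
    return random.choices(snapshots, weights=weights, k=1)[0]
```

The +0.01 keeps zero-score parents selectable, and the child penalty spreads exploration across ancestors that have not yet been mutated much.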

Domains and evaluation

A Domain defines your benchmark: load tasks, format input, evaluate predictions, and report aggregates. Evaluators in evaluators.py include static_evaluator, llm_judge_evaluator, and human_feedback_evaluator. The harness (harness.py) runs the TaskAgent over tasks.
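The shape of a domain can be sketched as a protocol. The method names below are assumptions based on the description above, not the real API:

```python
from typing import Protocol

class Domain(Protocol):
    """Illustrative shape of a domain."""
    def load_tasks(self) -> list: ...                # benchmark inputs
    def format_input(self, task: dict) -> str: ...   # prompt for the TaskAgent
    def evaluate(self, task: dict, prediction: str) -> float: ...  # 0..1 score
    def report(self, scores: list) -> dict: ...      # aggregate stats
```

A toy implementation only needs those four pieces; everything else (harness, archive) treats the domain as a black box.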

The harness

Predictions vs scores

|               | Score                    | Prediction                   |
|---------------|--------------------------|------------------------------|
| What          | Number from 0 to 1       | Model output string          |
| Typical files | report.json              | predictions.json             |
| Used for      | Parent selection, ranking | User-facing output, debugging |

Executors

executor.py provides LocalExecutor (fast, for local development) and DockerExecutor (sandboxed, via docker.py).
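The two executors share one interface; only the sandboxing differs. The class names follow executor.py and docker.py, but the bodies below are my own sketch, not the real implementation:

```python
import subprocess

class LocalExecutor:
    """Runs commands directly on the host: fast, but unsandboxed (dev only)."""
    def run(self, cmd: list) -> str:
        return subprocess.run(cmd, capture_output=True, text=True).stdout

class DockerExecutor:
    """Same interface, but each command is wrapped in a container (sketch)."""
    def __init__(self, image: str = "python:3.12-slim"):  # image is illustrative
        self.image = image
    def run(self, cmd: list) -> str:
        wrapped = ["docker", "run", "--rm", self.image, *cmd]
        return subprocess.run(wrapped, capture_output=True, text=True).stdout
```

Because the interface is identical, the generation loop can switch executors from config without other changes.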

Self-referential improvement (prompts_dir)

Editable prompt files let the MetaAgent change its own instructions over generations:

  • meta_agent.txt
  • task_agent.txt

Early termination

  • A best archive score of 1.0 stops the loop early.
  • The MetaAgent receives score context, so it avoids needless edits when the agent is already passing.
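The stop condition amounts to a one-line check over the archive (illustrative helper, not the real code):

```python
def should_stop(archive: list, threshold: float = 1.0) -> bool:
    """Stop once the best score in the archive reaches the threshold."""
    return any(s["score"] >= threshold for s in archive)
```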

Examples overview

| Example    | Focus              |
|------------|--------------------|
| Bash       | Command generation |
| Calculator | Tool code fixes    |
| Fact-check | Classification     |

See Examples for commands.

Glossary

| Term      | Definition                               |
|-----------|------------------------------------------|
| Archive   | JSONL history of generations and scores  |
| Domain    | Task suite + evaluation                  |
| Evaluator | static / LLM judge / human               |
| Executor  | Local or Docker workspace per generation |
| Harness   | Runs TaskAgent over domain tasks         |
| MetaAgent | Edits code to improve TaskAgent          |
| Parent    | Archive node used as base for a child    |
| Patch     | Diff from MetaAgent                      |

Next steps

  • Advanced concepts
  • Limitations
Last Updated: 4/29/26, 5:05 AM
Contributors: Muhammad Umer Farooq