PPLayouts

Reference

Static reference docs — architecture, strategy specs, ops runbook, metric definitions, glossary



Architecture

The factory is a framework for rapidly spinning up, paper-trading, and evaluating Polymarket strategies. It now has explicit runtime environments for research, paper, and live execution.

```text
You (idea)
  → factory/strategies/my_strategy.py
  → STRATEGIES registry
      ↓
runner.py (paper / live / research env)
├── fetch 100 top markets (Gamma API)
├── each strategy: scan → signal → env policy decides action
├── paper/live env: open + resolve positions in scoped broker
├── research env: log signals only, never open positions
└── WhatsApp summary → Polymarket Signals group
```

Stack

  • Python 3.11+ · uv
  • Gamma API — top markets feed
  • DDGS — news search
  • Claude API — LLM reasoning passes
  • OpenClaw — WhatsApp messaging
  • SQLite — trade state, runs, signals, decisions

Key files

| Path | Purpose |
|---|---|
| factory/runner.py | Main run loop: orchestrates all strategies per cycle |
| factory/environment.py | Runtime environment policy: research / paper / live gating |
| factory/strategies/*.py | Individual strategy implementations |
| factory/broker.py | Trade opening/closing, portfolio state |
| factory/live_broker.py | Real-money execution path scoped to live trades only |
| factory/db.py | SQLite schema, queries, migrations |
| factory/feed.py | Gamma API market fetching + formatting |
| factory/claude.py | Claude API wrapper for strategy reasoning |
| factory/notify.py | WhatsApp/alert dispatch |
| factory/models.py | Signal, Trade, Run dataclasses |
| eval/report.py | Weekly evaluation report generator |
| scripts/export_dashboard_data.py | Exports JSON snapshot for this dashboard |
| scripts/build_replay_benchmark.py | Builds strategy-level replay benchmark summaries from logged signals, execution checks, and resolved outcomes |
| scripts/update_wiki.py | Generates wiki/*.md from DB via Claude (Karpathy pattern) |
| data/factory.sqlite3 | Live database (gitignored) |

Runtime environments

| Environment | Behavior |
|---|---|
| research | Scans and logs signals only. Never opens or resolves positions. |
| paper | Paper-only trading path. Opens and resolves only paper trades. |
| live | Real-money path. Only strategies with explicit mode="live" plus live_ready=True can execute. |
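The gating in this table can be sketched as a single predicate. This is a minimal sketch and not the contents of factory/environment.py. The StrategyFlags dataclass and the may_open_positions function are hypothetical. The sketch makes two assumptions. First, the paper environment only opens strategies whose mode is "paper", which is one way carry_rewards could end up blocked from paper. Second, alert_only and trading_enabled are checked before the environment-specific rules.

```python
from dataclasses import dataclass


@dataclass
class StrategyFlags:
    # Hypothetical stand-in for the gating attributes on a Strategy
    name: str
    mode: str             # "paper" | "live"
    alert_only: bool
    trading_enabled: bool
    live_ready: bool


def may_open_positions(environment: str, s: StrategyFlags) -> bool:
    """Return True if the runner may open positions for `s` in `environment`."""
    if environment == "research":
        return False  # research scans and logs signals only
    if s.alert_only or not s.trading_enabled:
        return False  # alert-only strategies are logged and reported, never traded
    if environment == "paper":
        return s.mode == "paper"  # assumption: live-only strategies are blocked here
    if environment == "live":
        return s.mode == "live" and s.live_ready
    raise ValueError(f"unknown environment: {environment!r}")
```

Under these assumptions, a live-only strategy with live_ready=True is refused in paper and allowed in live. An alert-only strategy is refused everywhere.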

WhatsApp messaging policy

  • 09:00 Europe/Madrid — full general summary
  • other scheduled runs — opened/closed delta update + alert snippet if relevant

Dashboard pipeline

```bash
uv run python scripts/update_wiki.py            # regenerate wiki/*.md from DB
uv run python scripts/export_dashboard_data.py  # write JSON to dashboard-data/
uv run python scripts/build_dashboard.py        # bundle into dashboard-dist/
uv run python scripts/publish_dashboard.py ~/path/to/repo --commit --push
```
The dashboard is snapshot-based, not live. Data freshness is shown in the top-right snapshot-age pill.

Strategies

Strategy interface

```python
from abc import ABC

from factory.models import Signal, Trade


class Strategy(ABC):
    name: str
    mode: str                 # preferred execution env: "paper" | "live"
    max_position_usdc: float
    min_ev_pp: float
    alert_only: bool          # if True, log/report only; runner will not open positions
    trading_enabled: bool     # explicit runner gate; keep False for alert-only
    promotable: bool          # candidate for later graduation
    live_ready: bool          # prerequisite for the live environment, not the whole policy

    # portfolio taxonomy
    edge_type: str
    time_window: str          # super_short | intraday | short | medium | long
    target_hold_min_days: float
    target_hold_max_days: float
    scan_frequency: str

    # Implemented by each concrete strategy
    def scan(self, markets) -> list[Signal]: ...
    def size(self, signal: Signal) -> float: ...
    def should_exit(self, trade: Trade, price: float) -> bool: ...
```

Active strategies (paper trading)

ev_news

active · information · medium · 3×/day

Thesis: Recent news contains information not yet priced into prediction markets.

Method: Claude scans top markets + news headlines, picks topics with likely EV, then estimates p̂ per market from news snippets.

| Parameter | Value |
|---|---|
| max_position_usdc | $15 |
| min_ev_pp | 10 pp |
| hold window | 7–30 days |
| n_topics per run | 3 |
| min_volume | $10,000 |
| days_to_close | 7–60 days |
| max_trades_per_run | 3 |
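The min_ev_pp gate can be illustrated with a small edge calculation. This sketch rests on three assumptions. EV pp is taken as the gap between p̂ and the entry price, in percentage points. Buying NO is priced at 1 − YES price. Fees and slippage are ignored. The function names are hypothetical; only the thresholds come from the table above.

```python
# Hypothetical sketch of the ev_news entry gate; not the factory's code.
MIN_EV_PP = 10.0          # min_ev_pp from the table above
MIN_VOLUME = 10_000       # min_volume (USDC)
DAYS_TO_CLOSE = (7, 60)   # days_to_close window


def ev_pp(p_hat: float, yes_price: float) -> tuple[str, float]:
    """Return the side with non-negative edge and that edge in percentage points.

    Assumes NO costs (1 - yes_price) and has true probability (1 - p_hat),
    so the NO edge is simply the negated YES edge.
    """
    yes_edge = (p_hat - yes_price) * 100
    if yes_edge >= 0:
        return "YES", yes_edge
    return "NO", -yes_edge


def passes_gate(p_hat: float, yes_price: float, volume: float, days_to_close: float) -> bool:
    """True if the market clears the EV, volume, and time-to-close filters."""
    _, edge = ev_pp(p_hat, yes_price)
    lo, hi = DAYS_TO_CLOSE
    return edge >= MIN_EV_PP and volume >= MIN_VOLUME and lo <= days_to_close <= hi
```

Two examples at a YES price of 0.50. With p̂ = 0.62, the YES edge is 12 pp, which clears the 10 pp gate. With p̂ = 0.55, the edge is 5 pp, which does not.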

spread_arb

active · structural · medium · daily

Thesis: In multi-outcome markets the sum of all YES prices should be ~1.0. When the sum is significantly below 1.0, buying all outcomes locks in basket EV.

Method: Scan multi-outcome events, compute basket sum, filter for clean legs, score by gap and volume.

| Parameter | Value |
|---|---|
| max_position_usdc | $8 per leg |
| arb_threshold | ≤ 0.90 |
| min_outcomes | 3 |
| min_volume | $15,000 |
| days_to_close | 7–30 days |
| max_new_baskets_per_run | 3 |
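The basket arithmetic behind spread_arb can be sketched as follows. The sketch assumes the outcomes are mutually exclusive and exhaustive, so exactly one leg pays $1. It also ignores fees and slippage. The function names are hypothetical, and the "clean legs" filter and the gap/volume scoring are not shown.

```python
# Hypothetical sketch of the spread_arb basket check; not the factory's code.
ARB_THRESHOLD = 0.90   # arb_threshold: basket sum must be at or below this
MIN_OUTCOMES = 3
MIN_VOLUME = 15_000


def basket_gap(yes_prices: list[float]) -> float:
    """Profit per full set: exactly one leg pays $1, the set costs sum(prices)."""
    return 1.0 - sum(yes_prices)


def basket_return(yes_prices: list[float]) -> float:
    """Return on cost for one share of every outcome (fees and slippage ignored)."""
    cost = sum(yes_prices)
    return (1.0 - cost) / cost


def is_arb_candidate(yes_prices: list[float], volume: float) -> bool:
    """Apply the min_outcomes, min_volume, and arb_threshold filters."""
    return (
        len(yes_prices) >= MIN_OUTCOMES
        and volume >= MIN_VOLUME
        and sum(yes_prices) <= ARB_THRESHOLD
    )
```

Consider YES prices of 0.30, 0.25, 0.20, and 0.10. The full set costs 0.85 and returns 0.15 of profit, about 17.6% on cost. The payout is only locked when the same number of shares is held on every leg. Spending equal dollars per leg buys more shares of the cheaper outcomes, so the payout then depends on which outcome wins.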

stale_market

active · information · short · 3×/day

Thesis: Liquid markets sometimes lag relevant news and do not reprice quickly enough.

Method: Filters liquid near-term markets, fetches recent news, uses Claude to judge whether the market appears stale. Dedupes by topic cluster.

| Parameter | Value |
|---|---|
| max_position_usdc | $10 |
| days_to_close | 3–45 days |
| price range | 0.10–0.85 |
| min_volume | $8,000 |
| max_trades_per_run | 2 |
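The deterministic part of the stale_market method (the filters in the table) can be sketched as a pre-filter. It would run before the news fetch, the Claude staleness judgment, and the topic-cluster dedupe, none of which are shown. The Market field names and the most-liquid-first ordering are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Market:
    # Hypothetical minimal market record; the real feed's field names may differ
    question: str
    yes_price: float
    volume: float
    days_to_close: float


def is_eligible(m: Market) -> bool:
    """Deterministic pre-filter applied before any news fetch or Claude call."""
    return (
        3 <= m.days_to_close <= 45
        and 0.10 <= m.yes_price <= 0.85
        and m.volume >= 8_000
    )


def candidates(markets: list[Market]) -> list[Market]:
    """Eligible markets, most liquid first (the ordering is an assumption)."""
    return sorted((m for m in markets if is_eligible(m)), key=lambda m: m.volume, reverse=True)
```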

correlated_pairs

active · logical_inconsistency · medium · daily

Thesis: Some market pairs violate basic logical consistency (prerequisite vs downstream, broader vs narrower).

Method: Heuristic pair discovery by keyword clustering, then a Claude pass to classify the relationship and identify the cheaper implication.

| Parameter | Value |
|---|---|
| max_position_usdc | $10 |
| min_ev_pp | 10 pp |
| relationship_gap_pp | 10 pp |
| days_to_close | ≤ 120 days |
| max_trades_per_run | 2 |
| hold window | 3–30 days |
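The consistency rule this strategy relies on works as follows. Suppose market A implies market B, meaning A is narrower or downstream and B is broader or a prerequisite. Then P(A) ≤ P(B), so A trading above B is a violation. The sketch below flags such a gap against relationship_gap_pp. It assumes the relationship direction has already been classified, which is the job of the Claude pass described above. The function names are hypothetical.

```python
# Hypothetical sketch; assumes the implication direction is already classified.
RELATIONSHIP_GAP_PP = 10.0


def implication_gap_pp(p_narrow: float, p_broad: float) -> float:
    """How far the narrower market (A, which implies B) trades above the broader one (B).

    If A implies B then P(A) <= P(B), so a positive gap is a consistency violation.
    """
    return (p_narrow - p_broad) * 100


def violates(p_narrow: float, p_broad: float, min_gap_pp: float = RELATIONSHIP_GAP_PP) -> bool:
    """Flag the pair when the violation is at least `min_gap_pp` percentage points."""
    return implication_gap_pp(p_narrow, p_broad) >= min_gap_pp
```

For example, suppose "Team X wins the final" trades at 0.40 and "Team X reaches the final" trades at 0.25. The gap is 15 pp, so the pair is flagged.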

celebrity_tabloid

active · information · short

Thesis: Gossip/tabloid coverage directionally corroborates celebrity event markets before the crowd reprices.

Method: Screener for celebrity event markets (pregnancy, romance, scandal). Fails closed unless tabloid coverage corroborates the market side.

Active blocker: top-100 Gamma feed does not typically surface celebrity/tabloid markets. 0 eligible candidates seen to date.

Live-only strategies

carry_rewards

live-only · structural · long · daily

Thesis: Binary Polymarket markets can offer holding-yield carry by buying a market-neutral full set (YES + NO) and collecting rewards.

Method: Scan binary markets with enough duration and liquidity, rank by carry yield, and only execute in the live environment.

mode="live" · live_ready=True · blocked from paper by environment policy

Alert-only strategies

correlated_laggard

alert-only · promotable · logical_inconsistency

Thesis: Liquid leader / laggard divergences across obviously related markets create short-lived arbitrage windows.

Method: Finds markets correlated by keyword, compares prices of leader vs laggard, alerts when divergence exceeds threshold.

trading_enabled=False · promotable=True · live_ready=False

esport48

alert-only · promotable · super_short

Thesis: Esport markets expiring within 48 hours with strong liquidity/price signals can be identified with deterministic filters.

Method: Screener using deterministic liquidity/price filters + subtype tagging. No LLM pass — pure heuristic.

trading_enabled=False · promotable=True · live_ready=False

Paused / killed strategies

resolution_hunter

killed · resolution_lag · short

Thesis: Markets sometimes stay open after real-world resolution, creating free EV.

Kill reason: -92.3% ROI on 12 closed trades. Conclusive failure — paused=True, trading_enabled=False, exposure cap set to 0.

fade_certainty

killed · mean_reversion

Thesis: Markets at extreme prices (>93% or <7%) are systematically overconfident and can be faded.

Kill reason: 0% win rate, -100% ROI on 6 closed trades. Too blunt — no category filtering, no news validation, static fade amounts.

Revival only as narrow subtype: political outrights only, no sports/novelty, with stale-price evidence required.

weather_edge

paused · quantitative

Thesis: Open-Meteo ensemble probabilities can beat Polymarket crowd pricing on daily temperature bucket markets.

Pause reason: 45% win rate, -19.5% ROI on 82 closed trades. Too many correlated bets per city/day, and the EV threshold was too low for noisy bucket outcomes.

v2 salvage path: trade only 1–2 strongest buckets per city, raise EV threshold, avoid same-day markets.

Planned strategies

  • polling_vs_market — compare polling data to Polymarket political prices
  • base_rate — use historical base rates to identify mispriced recurring events
  • crypto_options_basis — cross-venue crypto options implied vol vs Polymarket
  • pre_event_drift — detect systematic price drift in the hours before scheduled events

Adding a new strategy

1. Create factory/strategies/my_strategy.py implementing the Strategy base class.
2. Register it in the STRATEGIES dict in factory/strategies/__init__.py.
3. Set trading_enabled=False and alert_only=True for the first paper-eval period.
4. Run:
   uv run python -c "from factory.runner import run; run(dry_run=True)"
5. Inspect with:
   uv run python scripts/strategy_checks.py my_strategy --limit 10
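Steps 1 to 3 might look like the following self-contained sketch. Several parts are stand-ins or assumptions:

- Signal and Strategy stand in for factory.models.Signal and the real base class; only a subset of the documented attributes is mirrored.
- The markets-as-dicts input and its "p_hat" key are hypothetical.
- Keying STRATEGIES by name to a class is an assumption about the registry's shape.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Signal:
    # Stand-in for factory.models.Signal; the real fields are not documented here
    market_id: str
    side: str
    ev_pp: float


class Strategy(ABC):
    """Stand-in for the documented base class (subset of attributes)."""
    name: str
    mode: str
    max_position_usdc: float
    min_ev_pp: float
    alert_only: bool
    trading_enabled: bool
    promotable: bool
    live_ready: bool

    @abstractmethod
    def scan(self, markets) -> list[Signal]: ...

    @abstractmethod
    def size(self, signal: Signal) -> float: ...

    @abstractmethod
    def should_exit(self, trade, price: float) -> bool: ...


class MyStrategy(Strategy):
    name = "my_strategy"
    mode = "paper"
    max_position_usdc = 5.0
    min_ev_pp = 10.0
    alert_only = True         # step 3: alert-only for the first paper-eval period
    trading_enabled = False   # step 3: runner will not open positions
    promotable = False
    live_ready = False

    def scan(self, markets) -> list[Signal]:
        # Hypothetical rule: markets are dicts with "id", "yes_price", "p_hat"
        signals = []
        for m in markets:
            edge = (m["p_hat"] - m["yes_price"]) * 100
            if edge >= self.min_ev_pp:
                signals.append(Signal(m["id"], "YES", edge))
        return signals

    def size(self, signal: Signal) -> float:
        return self.max_position_usdc  # flat sizing in this sketch

    def should_exit(self, trade, price: float) -> bool:
        return False  # hold to resolution in this sketch


# Step 2: registry keyed by strategy name (the real dict's shape is assumed)
STRATEGIES = {MyStrategy.name: MyStrategy}
```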

Time Window Taxonomy

Time windows drive operational scheduling — faster buckets run every cycle, slower ones skip midday churn.

| Label | Duration | Runner cadence | Current strategies |
|---|---|---|---|
| super_short | < 1 hour | Every cycle | esport48 |
| intraday | 1h – 24h | Every cycle | |
| short | 1–7 days | Every cycle | stale_market, celebrity_tabloid |
| medium | 8–30 days | Can skip midday | ev_news, spread_arb, correlated_pairs, correlated_laggard |
| long | 31+ days | Once/day | |

Open exposure is capped by both strategy-level limits and time-window-level portfolio limits.
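The duration buckets above can be expressed as a small lookup. This is a hypothetical helper, not the runner's scheduler. Its handling of the boundaries (for instance, 7.5 days falls into medium) is an assumption the table does not settle.

```python
# Hypothetical helper; boundary handling (e.g. between 7 and 8 days) is an assumption.
CADENCE = {
    "super_short": "every cycle",
    "intraday": "every cycle",
    "short": "every cycle",
    "medium": "can skip midday",
    "long": "once/day",
}


def time_window(hours: float) -> str:
    """Map a duration in hours to the taxonomy label from the table above."""
    if hours < 1:
        return "super_short"
    if hours < 24:
        return "intraday"
    days = hours / 24
    if days <= 7:
        return "short"
    if days <= 30:
        return "medium"
    return "long"
```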

Evaluation

Kill / keep thresholds

| Metric | Kill threshold | Keep threshold |
|---|---|---|
| Win rate | < 30% | > 50% |
| ROI | < -10% | > 0% |
| Min trades to evaluate | 5 closed trades | 5 closed trades |
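One way to apply these thresholds mechanically is sketched below. Several choices are assumptions rather than documented rules:

- Either kill condition is enough to kill, and both keep conditions are needed to keep.
- ROI is realized P&L divided by capital deployed.
- Anything in between is labeled "review".

```python
# Thresholds from the table above; how they combine is an assumption.
MIN_CLOSED_TRADES = 5


def verdict(wins: int, closed: int, realized_pnl: float, capital_deployed: float) -> str:
    """Return "insufficient", "kill", "keep", or "review" for a strategy's closed trades."""
    if closed < MIN_CLOSED_TRADES:
        return "insufficient"
    win_rate = wins / closed
    roi = realized_pnl / capital_deployed
    if win_rate < 0.30 or roi < -0.10:
        return "kill"    # either kill threshold is enough in this sketch
    if win_rate > 0.50 and roi > 0.0:
        return "keep"    # both keep thresholds must hold
    return "review"
```

For instance, 0 wins on 6 closed trades with -100% ROI (the fade_certainty record) comes out as "kill".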

Evaluation dimensions

  • By strategy
  • By time window
  • By edge type
  • Active vs legacy

Running the weekly eval

uv run eval/report.py

Edge types

| Edge type | Description |
|---|---|
| information | Faster / better news processing than the crowd (ev_news, stale_market, celebrity_tabloid) |
| structural | Mathematical inconsistency baked into market structure (spread_arb) |
| resolution_lag | Markets staying open after real-world outcome (resolution_hunter, killed) |
| logical_inconsistency | Cross-market logical violations (correlated_pairs, correlated_laggard) |
| quantitative | External data model beats crowd calibration (weather_edge, paused) |
| mean_reversion | Extreme-price markets revert (fade_certainty, killed) |

Replay Benchmark

The replay benchmark is a strategy-level score built from persisted signals, signal_execution_checks, and resolved-trade labels where available. It is intended as a keep/discard gate for alert-only and generated strategies, not as a replacement for realized P&L review.

Current dashboard scope

  • The Overview page renders the alert-only replay benchmark.
  • Generated-strategy benchmark rows only appear once generated strategies have actual signal evidence.
  • Benchmark rows are currently aggregated per strategy, not per market or per signal family.

Inputs

  • Directional evidence — resolved outcomes where the signal can be labeled as correct/incorrect
  • Execution realism: EV after slippage at $10 / $50 from Phase A fill proxies
  • Capacity — max size with positive EV and source-confidence quality
  • Uniqueness — overlap penalty for duplicate same-market / same-side signals
  • Coverage — sample-size floor so tiny lucky runs do not dominate

Score shape

benchmark_score = 0.45 × directional_score + 0.25 × slippage_score + 0.15 × capacity_score + 0.10 × uniqueness_score + 0.05 × coverage_score
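The weighted sum can be computed directly. This sketch assumes each component has already been normalized to [0, 1], since the normalization itself is not documented here. The names WEIGHTS and benchmark_score are illustrative and do not come from build_replay_benchmark.py.

```python
# Weights from the score shape above; components assumed pre-normalized to [0, 1].
WEIGHTS = {
    "directional": 0.45,
    "slippage": 0.25,
    "capacity": 0.15,
    "uniqueness": 0.10,
    "coverage": 0.05,
}


def benchmark_score(components: dict[str, float]) -> float:
    """Weighted sum of the five component scores."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weight * components[name] for name, weight in WEIGHTS.items())
```

Because the weights sum to 1.0, a strategy scoring 1.0 on every component gets 1.0. Directional evidence alone can move the score by up to 0.45.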

How to build it

```bash
uv run python scripts/build_replay_benchmark.py --scope alert-only
uv run python scripts/build_replay_benchmark.py --scope generated
uv run python scripts/export_dashboard_data.py
uv run python scripts/build_dashboard.py
```
The replay benchmark is a research/control metric. It should not automatically change strategy code, scheduling, or live eligibility without a separate review gate.

Alert-Only Graduation Checklist

Promote an alert-only strategy to paper trading only after all of the following are true:

  1. At least 10 live runs reviewed with persisted detail-table evidence.
  2. At least 15 alerts or 30 candidate checks inspected — not anecdotal.
  3. Top alerts look directionally sensible on manual replay, no duplicate/cluster spam.
  4. Liquidity and fillability plausible for intended size — not relying on dead books.
  5. Logged reasons explain why alert fired and why weaker candidates were skipped.
  6. Initial paper cap small enough to fail safely on first promotion.

Promotion workflow

  1. Keep trading_enabled = False while paper-eval checklist is open.
  2. Mark strategy record with clear keep/promote decision.
  3. Flip trading_enabled = True only after checklist complete.
  4. Leave live_ready = False until a separate live-broker checklist exists.

Current graduation status

| Strategy | Status | Promotable | Blocker |
|---|---|---|---|
| correlated_laggard | alert-only | Yes | Paper-eval checklist open; see EX-20260401-006 |
| esport48 | alert-only | Yes | Paper-eval checklist open; see EX-20260401-007 |
| celebrity_tabloid | paper trading | | Feed coverage: top-100 Gamma rarely surfaces celebrity markets |

Phase A Execution Checks

Phase A metrics are fill proxies, not actual fills. They are a market-microstructure snapshot taken at signal time, not a live trade confirmation.

What Phase A captures

At the moment a signal fires, the runner records:

  • The quote price (current market YES price)
  • Best bid / best ask from the CLOB
  • Estimated fill price at $10 and $50 notional
  • EV after slippage at $10 and $50
  • Max size with positive EV
  • Source confidence label (direct quote vs heuristic fallback)

How to interpret

  • EV @ $10 / $50 pp — rough comparative metric. Do not treat as realized return.
  • Max +EV size — proxy capacity summary, not a live guarantee.
  • Source confidence: how much of the Phase A layer is grounded in direct quote fields vs fallback heuristics.

Inspecting execution checks

```bash
uv run python scripts/signal_execution_checks.py --limit 20
uv run python scripts/signal_execution_checks.py --strategy spread_arb --limit 20
```

Metric Definitions

Status vocabularies

Run status:

| Value | Meaning |
|---|---|
| ok | Run completed successfully without fatal errors |
| warning | Run completed but warnings/errors exceeded threshold |
| error | Run failed or ended in a clearly broken state |
| unknown | Status cannot be determined from stored data |

Strategy status:

| Value | Meaning |
|---|---|
| active | Currently part of the active strategy stack |
| paused | Intentionally disabled but still in current-era reporting context |
| legacy | Historical strategy, no longer part of current active stack |
| unknown | Cannot classify with confidence |

Experiment status:

| Value | Meaning |
|---|---|
| active | Currently in progress |
| planned | Defined but not yet active |
| review_due | Has reached or passed a stated review point |
| completed | Reached a documented conclusion |
| archived | Retained for history, not current focus |

Exposure metrics

| Field | Definition |
|---|---|
| open_exposure_active | Total open exposure from active strategies (absolute, not signed) |
| open_exposure_legacy | Total open exposure from legacy/paused strategies |
| open_position_count_active | Count of open positions from active strategies |
| open_position_count_legacy | Count of open positions from legacy strategies |

PnL metrics

| Field | Definition |
|---|---|
| realized_pnl_30d | Realized P&L from closed positions in the last 30 days |
| realized_pnl_all_time | Realized P&L across all available history |
Realized and unrealized P&L are never combined into one headline metric.

Phase A execution fields

| Field | Definition |
|---|---|
| execution_checks_30d | Count of signal execution checks in the last 30 days |
| strategies_with_execution_checks_30d | Distinct strategies with at least 1 check in the last 30 days |
| avg_ev_after_slippage_50_pp_30d | Average EV after $50 slippage across checks (30d) |
| avg_max_size_positive_ev_30d | Average max +EV size (USD) across checks (30d) |
| benchmark_top_strategy_alert_only | Best current alert-only strategy by replay benchmark score |
| benchmark_top_score_alert_only | Replay benchmark score of the top alert-only strategy |
| benchmark_signal_count_alert_only | Total signals included in the current alert-only replay benchmark snapshot |

Null / unknown policy

  • Use null for absent scalar values
  • Use unknown for enum-like status fields
  • Use [] for genuinely empty collections
  • Never substitute 0 for unknown, or empty string for unknown status
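A snapshot builder that follows these rules might look like this. build_snapshot and its three fields are hypothetical and do not come from export_dashboard_data.py. Passing None through for an unavailable collection, rather than emitting [], is one reading of the policy.

```python
import json

RUN_STATUSES = {"ok", "warning", "error", "unknown"}


def build_snapshot(run_status, realized_pnl_30d, open_positions):
    """Hypothetical snapshot builder applying the null / unknown policy."""
    return {
        # Enum-like status: missing or unrecognized values become "unknown",
        # never null or an empty string
        "run_status": run_status if run_status in RUN_STATUSES else "unknown",
        # Absent scalar stays None (JSON null); never coerce with `or 0`
        "realized_pnl_30d": realized_pnl_30d,
        # [] only when the collection is genuinely empty; None (unavailable)
        # is passed through so "not known" stays distinguishable from "none"
        "open_positions": open_positions,
    }


if __name__ == "__main__":
    print(json.dumps(build_snapshot(None, None, [])))
```

Note that a real zero P&L is kept as 0.0. Only an absent value becomes null.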

Operations Runbook

Common commands

```bash
# Manual full run
uv run python -m factory.runner

# Research-only run
FACTORY_ENV=research uv run python -m factory.runner

# Live run
FACTORY_ENV=live uv run python -m factory.runner

# Safe dry run (no writes, no closes, no sends)
uv run python -c "from factory.runner import run; run(environment='paper', dry_run=True)"

# Fast safe dry run (skips ev_news, trims workloads)
uv run python -c "from factory.runner import run; run(environment='paper', dry_run=True, fast_dry_run=True)"

# Safe dry run of live policy
uv run python -c "from factory.runner import run; run(environment='live', dry_run=True, send=False, fast_dry_run=True)"

# Open book
uv run python scripts/open_positions.py
uv run python scripts/open_positions.py --top-oldest 5
uv run python scripts/open_positions.py --strategy ev_news
uv run python scripts/open_positions.py --time-window medium

# Latest run summary
uv run python scripts/latest_run.py -n 1

# Inspect decisions
uv run python scripts/inspect_decisions.py --limit 20

# Strategy-specific checks
uv run python scripts/strategy_checks.py stale_market --limit 10
uv run python scripts/strategy_checks.py correlated_laggard --limit 10
uv run python scripts/strategy_checks.py esport48 --limit 10

# Run analytics
uv run python scripts/run_analytics.py --runs 20

# Active experiments
uv run python scripts/active_experiments.py

# Weekly evaluation
uv run eval/report.py

# Replay benchmark
uv run python scripts/build_replay_benchmark.py --scope alert-only
uv run python scripts/build_replay_benchmark.py --scope generated

# Regenerate wiki from DB
uv run python scripts/update_wiki.py

# Local DB backup (keep 14 days)
uv run python scripts/backup_db.py --keep 14
```

Launchd schedule

| Job | Schedule |
|---|---|
| com.polymarket.factory | Every 2 hours at :00 (paper environment) |
| com.polymarket.factory.live | 19:30 daily (live environment) |
| com.polymarket.factory.aggressive | 10:30 / 22:30 daily |
| com.polymarket.factory.backup | 03:45 daily |
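The first two schedules can be expressed as launchd job dictionaries and serialized with Python's standard plistlib. The real plists are not shown in this reference, so several values are hypothetical placeholders:

- ProgramArguments, WorkingDirectory, and the FACTORY_ENV value for the paper job.
- launchd starts jobs with a minimal PATH, so an absolute path to uv is usually safer than /usr/bin/env.

StartCalendarInterval accepts either a single dict or an array of dicts. An array expresses "every 2 hours at :00".

```python
import plistlib

# Hypothetical command; launchd's minimal PATH may not include uv, so an
# absolute path to the uv binary is usually safer in a real plist.
RUNNER = ["/usr/bin/env", "uv", "run", "python", "-m", "factory.runner"]


def job(label: str, schedule, env: dict[str, str]) -> dict:
    """Build a launchd job dict; `schedule` is one calendar dict or a list of them."""
    return {
        "Label": label,
        "ProgramArguments": RUNNER,
        "WorkingDirectory": "/path/to/repo",  # placeholder, not the real path
        "EnvironmentVariables": env,
        "StartCalendarInterval": schedule,
    }


# Every 2 hours at :00 -> one calendar entry per even hour
paper = job(
    "com.polymarket.factory",
    [{"Hour": h, "Minute": 0} for h in range(0, 24, 2)],
    {"FACTORY_ENV": "paper"},
)

# 19:30 daily
live = job("com.polymarket.factory.live", {"Hour": 19, "Minute": 30}, {"FACTORY_ENV": "live"})

if __name__ == "__main__":
    print(plistlib.dumps(paper).decode())
```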

SQLite database

Stored at data/factory.sqlite3 (gitignored). Tables:

  • runs: one row per runner execution
  • signals: strategy signals generated each run
  • decisions: open/close/skip decisions per signal
  • signal_execution_checks: Phase A fill proxies at signal time
  • run_logs: log entries per run
  • trades: includes a mode column so paper and live positions are tracked separately
  • Trade state is SQLite-backed; data/trades.csv is exported during the migration period
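The mode column makes it possible to split the open book by paper and live with plain SQL through Python's standard sqlite3 module. Only the mode column is documented. The status and size_usdc columns are hypothetical, and the in-memory demo schema exists only to make the query runnable.

```python
import sqlite3


def open_exposure_by_mode(conn: sqlite3.Connection) -> dict[str, tuple[int, float]]:
    """Count and absolute size of open trades, split by the trades.mode column.

    Only `mode` is documented; `status` and `size_usdc` are hypothetical column names.
    """
    rows = conn.execute(
        """
        SELECT mode, COUNT(*), COALESCE(SUM(ABS(size_usdc)), 0)
        FROM trades
        WHERE status = 'open'
        GROUP BY mode
        ORDER BY mode
        """
    ).fetchall()
    return {mode: (count, total) for mode, count, total in rows}


def demo_connection() -> sqlite3.Connection:
    """In-memory DB with a hypothetical trades schema, for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (id INTEGER PRIMARY KEY, strategy TEXT, mode TEXT, "
        "status TEXT, size_usdc REAL)"
    )
    conn.executemany(
        "INSERT INTO trades (strategy, mode, status, size_usdc) VALUES (?, ?, ?, ?)",
        [
            ("ev_news", "paper", "open", 15.0),
            ("stale_market", "paper", "open", 10.0),
            ("carry_rewards", "live", "open", 8.0),
            ("ev_news", "paper", "closed", 15.0),
        ],
    )
    return conn
```

Against the real database, the call would be open_exposure_by_mode(sqlite3.connect("data/factory.sqlite3")). That only works if the assumed columns exist.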

Dashboard publishing

```bash
uv run python scripts/update_wiki.py
uv run python scripts/export_dashboard_data.py
uv run python scripts/build_dashboard.py
uv run python scripts/publish_dashboard.py ~/path/to/dashboard-repo --commit --push
```

Key resources

  • Gamma API: https://gamma-api.polymarket.com/markets
  • CLOB API: https://clob.polymarket.com

Glossary

| Term | Definition |
|---|---|
| p̂ | Estimated true probability for a market outcome, derived by a strategy's reasoning pass |
| EV | Expected value: the edge a trade offers relative to the current market price, in percentage points |
| EV pp | EV expressed in percentage points (e.g. 12 pp EV = 12% expected edge) |
| Phase A | Signal-time execution check: a fill-proxy snapshot of market microstructure when a signal fires |
| fill proxy | An estimate of what the fill price would have been, based on CLOB bid/ask data. Not an actual fill. |
| basket EV | In spread_arb: the guaranteed return from buying all legs in a multi-outcome market when sum < 1.0 |
| alert-only | Strategy mode where signals are logged and reported but no positions are opened |
| paper trading | Strategy mode where positions are opened in the simulator but no real money is deployed |
| research | Runner environment that scans and logs signals only, with no position opening or resolution |
| live_ready | Strategy prerequisite for the live environment; not sufficient on its own without environment-policy approval |
| promotable | Flag indicating a strategy is a valid candidate for promotion once evidence is sufficient |
| source_confidence | Label indicating whether Phase A data came from a direct CLOB quote or a heuristic fallback |
| replay benchmark | Strategy-level composite score built from directional labels, execution realism, capacity, uniqueness, and coverage |
| Gamma API | Polymarket's market data API; provides the top-100 active markets feed used by all strategies |
| CLOB | Central Limit Order Book: Polymarket's order book (off-chain matching, on-chain settlement), used for execution checks |
| DDGS | DuckDuckGo Search, used by news-based strategies to fetch recent headlines |
| OpenClaw | Internal tool for WhatsApp messaging via the runner summary dispatch |
| launchd | macOS daemon scheduler; runs the paper, live, aggressive, and backup jobs on their configured calendars |
| Karpathy pattern | Auto-generating living documentation from DB data via LLM, used by update_wiki.py |