PPLayouts Dashboard · Reference


Architecture

The factory is a framework for rapidly spinning up, paper-trading, and evaluating Polymarket strategies. It now has explicit runtime environments for research, paper, and live execution.

You (idea) → factory/strategies/my_strategy.py → STRATEGIES registry
                                                                  ↓
                                                    runner.py (paper / live / research env)
                                                    ├── fetch 100 top markets (Gamma API)
                                                    ├── each strategy: scan → signal → env policy decides action
                                                    ├── paper/live env: open + resolve positions in scoped broker
                                                    ├── research env: log signals only, never open positions
                                                    └── WhatsApp summary → Polymarket Signals group

Stack

Python 3.11+ · uv
Gamma API — top markets feed
DDGS — news search
Claude API — LLM reasoning passes
OpenClaw — WhatsApp messaging
SQLite — trade state, runs, signals, decisions

Key files

Path	Purpose
factory/runner.py	Main run loop — orchestrates all strategies per cycle
factory/environment.py	Runtime environment policy: research / paper / live gating
factory/strategies/*.py	Individual strategy implementations
factory/broker.py	Trade opening/closing, portfolio state
factory/live_broker.py	Real-money execution path scoped to live trades only
factory/db.py	SQLite schema, queries, migrations
factory/feed.py	Gamma API market fetching + formatting
factory/claude.py	Claude API wrapper for strategy reasoning
factory/notify.py	WhatsApp/alert dispatch
factory/models.py	Signal, Trade, Run dataclasses
eval/report.py	Weekly evaluation report generator
scripts/export_dashboard_data.py	Exports JSON snapshot for this dashboard
scripts/build_replay_benchmark.py	Builds strategy-level replay benchmark summaries from logged signals, execution checks, and resolved outcomes
scripts/update_wiki.py	Generates wiki/*.md from DB via Claude (Karpathy pattern)
data/factory.sqlite3	Live database (gitignored)

Runtime environments

Environment	Behavior
research	Scans and logs signals only. Never opens or resolves positions.
paper	Paper-only trading path. Opens and resolves only paper trades.
live	Real-money path. Only strategies with explicit `mode="live"` and `live_ready=True` can execute, and only if the environment policy also approves.
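
The table above can be read as a small gating function. The sketch below is illustrative only: `may_open_position` and its argument names are hypothetical, not the contents of factory/environment.py. It assumes that alert-only or trading-disabled strategies never open positions, and that the paper environment accepts any enabled strategy. The real live policy may apply further checks beyond `mode` and `live_ready`.

```python
def may_open_position(environment: str, mode: str, live_ready: bool,
                      trading_enabled: bool = True, alert_only: bool = False) -> bool:
    """Return True if a strategy may open a position in the given environment.

    Hypothetical helper; a simplified reading of the environment table.
    """
    if environment not in {"research", "paper", "live"}:
        raise ValueError(f"unknown environment: {environment!r}")
    # Assumption: strategy-level gates apply in every environment.
    if alert_only or not trading_enabled:
        return False
    if environment == "research":
        return False  # research logs signals only
    if environment == "paper":
        return True   # assumption: any enabled strategy may paper-trade
    # live: both the explicit mode and the live_ready prerequisite are required
    return mode == "live" and live_ready
```

For example, `may_open_position("live", "paper", True)` is False under these assumptions, because the strategy never asked for live execution.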

WhatsApp messaging policy

09:00 Europe/Madrid — full general summary
other scheduled runs — opened/closed delta update + alert snippet if relevant

Dashboard pipeline

uv run python scripts/update_wiki.py                        # regenerate wiki/*.md from DB
uv run python scripts/export_dashboard_data.py              # write JSON to dashboard-data/
uv run python scripts/build_dashboard.py                    # bundle into dashboard-dist/
uv run python scripts/publish_dashboard.py ~/path/to/repo --commit --push

The dashboard is snapshot-based, not live. Data freshness is shown in the top-right snapshot-age pill.

Strategies

Strategy interface

from abc import ABC

from factory.models import Signal, Trade


class Strategy(ABC):
    name: str
    mode: str               # preferred execution env: "paper" | "live"
    max_position_usdc: float
    min_ev_pp: float
    alert_only: bool        # if True, log/report only; runner will not open positions
    trading_enabled: bool   # explicit runner gate; keep False for alert-only
    promotable: bool        # candidate for later graduation
    live_ready: bool        # prerequisite for the live environment, not the whole policy

    # portfolio taxonomy
    edge_type: str
    time_window: str        # super_short | intraday | short | medium | long
    target_hold_min_days: float
    target_hold_max_days: float
    scan_frequency: str

    # method bodies are supplied by each strategy in factory/strategies/*.py
    def scan(self, markets) -> list[Signal]: ...
    def size(self, signal: Signal) -> float: ...
    def should_exit(self, trade: Trade, price: float) -> bool: ...

Active strategies (paper trading)

● ev_news

active information medium 3×/day

Thesis: Recent news contains information not yet priced into prediction markets.

Method: Claude scans top markets + news headlines, picks topics with likely EV, then estimates p̂ per market from news snippets.

max_position_usdc	$15
min_ev_pp	10 pp
hold window	7–30 days
n_topics per run	3
min_volume	$10,000
days_to_close	7–60 days
max_trades_per_run	3
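
To make the `min_ev_pp` filter concrete, here is a minimal sketch of the EV arithmetic. It assumes EV is simply the gap between p̂ and the current YES price, expressed in percentage points, and that a signal passes when the gap is at least the threshold. The function names are hypothetical and fees and slippage are ignored.

```python
def ev_pp(p_hat: float, yes_price: float, side: str = "YES") -> float:
    """Edge in percentage points for buying `side` at the current YES price."""
    if not (0.0 <= p_hat <= 1.0 and 0.0 <= yes_price <= 1.0):
        raise ValueError("p_hat and yes_price must be probabilities in [0, 1]")
    if side == "YES":
        edge = p_hat - yes_price
    elif side == "NO":
        # Buying NO costs (1 - yes_price) and wins with probability (1 - p_hat).
        edge = yes_price - p_hat
    else:
        raise ValueError(f"unknown side: {side!r}")
    return edge * 100.0


def passes_ev_filter(p_hat: float, yes_price: float, side: str = "YES",
                     min_ev_pp: float = 10.0) -> bool:
    """Assumption: the threshold is inclusive."""
    return ev_pp(p_hat, yes_price, side) >= min_ev_pp
```

With p̂ = 0.62 and a YES price of 0.50 the edge is about 12 pp, which clears the 10 pp floor. With p̂ = 0.55 it is about 5 pp and the signal is dropped.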

● spread_arb

active structural medium daily

Thesis: In multi-outcome markets the sum of all YES prices should be ~1.0. When the sum is significantly below 1.0, buying all outcomes locks in basket EV.

Method: Scan multi-outcome events, compute basket sum, filter for clean legs, score by gap and volume.

max_position_usdc	$8 per leg
arb_threshold	≤ 0.90
min_outcomes	3
min_volume	$15,000
days_to_close	7–30 days
max_new_baskets_per_run	3

● stale_market

active information short 3×/day

Thesis: Liquid markets sometimes lag relevant news and do not reprice quickly enough.

Method: Filters liquid near-term markets, fetches recent news, uses Claude to judge whether the market appears stale. Dedupes by topic cluster.

max_position_usdc	$10
days_to_close	3–45 days
price range	0.10–0.85
min_volume	$8,000
max_trades_per_run	2

● correlated_pairs

active logical_inconsistency medium daily

Thesis: Some market pairs violate basic logical consistency (prerequisite vs downstream, broader vs narrower).

Method: Heuristic pair discovery by keyword clustering, then a Claude pass to classify the relationship and identify the cheaper implication.

max_position_usdc	$10
min_ev_pp	10 pp
relationship_gap_pp	10 pp
days_to_close	≤ 120 days
max_trades_per_run	2
hold window	3–30 days

● celebrity_tabloid

active information short

Thesis: Gossip/tabloid coverage directionally corroborates celebrity event markets before the crowd reprices.

Method: Screener for celebrity event markets (pregnancy, romance, scandal). Fails closed unless tabloid coverage corroborates the market side.

Active blocker: top-100 Gamma feed does not typically surface celebrity/tabloid markets. 0 eligible candidates seen to date.

Demoted strategies (formerly live)

● carry_rewards

paper structural long daily

Thesis: Eligible political markets offer ~4% APY Holding Rewards via full-set (YES + NO) purchases.

Method: Scan eligible long-dated political/geopolitical markets, rank by carry yield. Demoted to paper 2026-04-14 (zero signals historically).

mode="paper" · live_ready=False · filters relaxed, eligibility keyword filter added

Alert-only strategies

◆ correlated_laggard

alert-only promotable logical_inconsistency

Thesis: Liquid leader / laggard divergences across obviously related markets create short-lived arbitrage windows.

Method: Finds markets correlated by keyword, compares prices of leader vs laggard, alerts when divergence exceeds threshold.

trading_enabled=False · promotable=True · live_ready=False

◆ esport48

alert-only promotable super_short

Thesis: Esport markets expiring within 48 hours with strong liquidity/price signals can be identified with deterministic filters.

Method: Screener using deterministic liquidity/price filters + subtype tagging. No LLM pass — pure heuristic.

trading_enabled=False · promotable=True · live_ready=False

Paused / killed strategies

✕ resolution_hunter

killed resolution_lag short

Thesis: Markets sometimes stay open after real-world resolution, creating free EV.

Kill reason: -92.3% ROI on 12 closed trades. Conclusive failure — paused=True, trading_enabled=False, exposure cap set to 0.

✕ fade_certainty

killed mean_reversion

Thesis: Markets at extreme prices (>93% or <7%) are systematically overconfident and can be faded.

Kill reason: 0% win rate, -100% ROI on 6 closed trades. Too blunt — no category filtering, no news validation, static fade amounts.

Revival only as narrow subtype: political outrights only, no sports/novelty, with stale-price evidence required.

⊘ weather_edge

paused quantitative

Thesis: Open-Meteo ensemble probabilities can beat Polymarket crowd pricing on daily temperature bucket markets.

Pause reason: 45% win rate, -19.5% ROI on 82 closed trades. Too many correlated bets per city/day, and the EV threshold was too low for noisy bucket outcomes.

v2 salvage path: trade only 1–2 strongest buckets per city, raise EV threshold, avoid same-day markets.

Planned strategies

polling_vs_market — compare polling data to Polymarket political prices
base_rate — use historical base rates to identify mispriced recurring events
crypto_options_basis — cross-venue crypto options implied vol vs Polymarket
pre_event_drift — detect systematic price drift in the hours before scheduled events

Adding a new strategy

Create factory/strategies/my_strategy.py implementing the Strategy base class
Register it in the STRATEGIES dict in factory/strategies/__init__.py
Set trading_enabled=False and alert_only=True for the first paper-eval period
Run: uv run python -c "from factory.runner import run; run(dry_run=True)"
Inspect with: uv run python scripts/strategy_checks.py my_strategy --limit 10
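
The first three steps above might look like the following skeleton. It is self-contained for illustration: `Signal` and `Strategy` here are stand-ins for the real classes in factory/models.py and the strategy base module, the `Signal` fields and the market dict keys (`id`, `price`) are assumptions, and the toy scan rule is not a real edge. How the real STRATEGIES dict is keyed is also an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Signal:  # stand-in for factory.models.Signal; fields are assumptions
    market_id: str
    side: str
    price: float


class Strategy(ABC):  # stand-in for the real base class
    @abstractmethod
    def scan(self, markets) -> list[Signal]: ...
    @abstractmethod
    def size(self, signal) -> float: ...
    @abstractmethod
    def should_exit(self, trade, price) -> bool: ...


class MyStrategy(Strategy):
    name = "my_strategy"
    mode = "paper"
    max_position_usdc = 5.0
    min_ev_pp = 10.0
    alert_only = True         # first paper-eval period: log/report only
    trading_enabled = False   # runner gate stays closed
    promotable = False
    live_ready = False
    edge_type = "information"
    time_window = "short"
    target_hold_min_days = 1.0
    target_hold_max_days = 7.0
    scan_frequency = "daily"

    def scan(self, markets) -> list[Signal]:
        # Toy rule for illustration only: flag markets priced under 0.20.
        return [Signal(m["id"], "YES", m["price"]) for m in markets if m["price"] < 0.20]

    def size(self, signal) -> float:
        return self.max_position_usdc

    def should_exit(self, trade, price) -> bool:
        return False


# Assumption: the registry maps strategy name to an instance.
STRATEGIES = {MyStrategy.name: MyStrategy()}
```

Starting with `alert_only = True` and `trading_enabled = False` means the strategy only produces logged signals until the graduation checklist is complete.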

Time Window Taxonomy

Time windows drive operational scheduling — faster buckets run every cycle, slower ones skip midday churn.

Label	Duration	Runner cadence	Current strategies
super_short	< 1 hour	Every cycle	esport48
intraday	1h – 24h	Every cycle	—
short	1–7 days	Every cycle	stale_market, celebrity_tabloid
medium	8–30 days	Can skip midday	ev_news, spread_arb, correlated_pairs, correlated_laggard
long	31+ days	Once/day	—

Open exposure is capped by both strategy-level limits and time-window-level portfolio limits.
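
As a rough illustration of the bucket boundaries in the table, the sketch below maps a horizon in hours to a label. `classify_time_window` is a hypothetical helper. The table leaves the ranges between 7 and 8 days and between 30 and 31 days unspecified, so the code assumes anything above 7 days is medium and anything above 30 days is long.

```python
def classify_time_window(horizon_hours: float) -> str:
    """Map an expected holding horizon (in hours) to a time-window label."""
    if horizon_hours < 0:
        raise ValueError("horizon_hours must be non-negative")
    if horizon_hours < 1:
        return "super_short"
    if horizon_hours < 24:
        return "intraday"
    days = horizon_hours / 24
    if days <= 7:
        return "short"
    if days <= 30:   # assumption: the 7 to 8 day gap falls into medium
        return "medium"
    return "long"    # assumption: the 30 to 31 day gap falls into long
```

For instance, a 72-hour horizon maps to `short` and a 10-day horizon maps to `medium`.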

Evaluation

Kill / keep thresholds

Metric	Kill threshold	Keep threshold
Win rate	< 30%	> 50%
ROI	< -10%	> 0%
Min trades to evaluate	5 closed trades minimum
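
One way to read the threshold table is sketched below. The table does not say how the two metrics combine, so this example assumes that breaching either kill threshold is enough to flag a strategy, that both keep thresholds must be met to keep it, and that everything in between goes to manual review. `evaluate` and its verdict labels are hypothetical. Rates are passed as fractions (0.30 for 30%).

```python
def evaluate(win_rate: float, roi: float, closed_trades: int,
             min_trades: int = 5) -> str:
    """Return a verdict label from win rate, ROI, and sample size."""
    if closed_trades < min_trades:
        return "insufficient_data"
    if win_rate < 0.30 or roi < -0.10:    # assumption: either breach flags a kill
        return "kill"
    if win_rate > 0.50 and roi > 0.0:     # assumption: both must hold to keep
        return "keep"
    return "review"
```

Applied to the fade_certainty figures (0% win rate, -100% ROI, 6 closed trades) this returns `kill`. A strategy with only 3 closed trades returns `insufficient_data` regardless of its numbers.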

Evaluation dimensions

By strategy
By time window
By edge type
Active vs legacy

Running the weekly eval

uv run eval/report.py

Edge types

Edge type	Description
information	Faster / better news processing than the crowd (ev_news, stale_market, celebrity_tabloid)
structural	Mathematical inconsistency baked into market structure (spread_arb)
resolution_lag	Markets staying open after real-world outcome (resolution_hunter — killed)
logical_inconsistency	Cross-market logical violations (correlated_pairs, correlated_laggard)
quantitative	External data model beats crowd calibration (weather_edge — paused)
mean_reversion	Extreme-price markets revert (fade_certainty — killed)

Replay Benchmark

The replay benchmark is a strategy-level score built from persisted signals, signal_execution_checks, and resolved-trade labels where available. It is intended as a keep/discard gate for alert-only and generated strategies, not as a replacement for realized P&L review.

Current dashboard scope

The Overview page renders the alert-only replay benchmark.
Generated-strategy benchmark rows only appear once generated strategies have actual signal evidence.
Benchmark rows are currently aggregated per strategy, not per market or per signal family.

Inputs

Directional evidence — resolved outcomes where the signal can be labeled as correct/incorrect
Execution realism — EV after slippage at $10 / $50 from Phase A fill proxies
Capacity — max size with positive EV and source-confidence quality
Uniqueness — overlap penalty for duplicate same-market / same-side signals
Coverage — sample-size floor so tiny lucky runs do not dominate

Score shape

benchmark_score =
45 × directional_score +
25 × slippage_score +
15 × capacity_score +
10 × uniqueness_score +
5 × coverage_score
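
A minimal sketch of the weighted sum follows. It assumes each component score is already normalized to the range 0 to 1, which would put the composite on a 0 to 100 scale. That normalization, the dict keys, and the function name are assumptions, not taken from scripts/build_replay_benchmark.py.

```python
WEIGHTS = {
    "directional": 45,
    "slippage": 25,
    "capacity": 15,
    "uniqueness": 10,
    "coverage": 5,
}


def benchmark_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


example_score = benchmark_score({
    "directional": 0.6,
    "slippage": 0.5,
    "capacity": 0.4,
    "uniqueness": 1.0,
    "coverage": 0.2,
})
```

The example works out to 27 + 12.5 + 6 + 10 + 1 = 56.5. Because directional evidence carries 45 of the 100 points, a strategy with no resolved outcomes cannot score above 55 under this reading.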

How to build it

uv run python scripts/build_replay_benchmark.py --scope alert-only
uv run python scripts/build_replay_benchmark.py --scope generated
uv run python scripts/export_dashboard_data.py
uv run python scripts/build_dashboard.py

The replay benchmark is a research/control metric. It should not automatically change strategy code, scheduling, or live eligibility without a separate review gate.

Alert-Only Graduation Checklist

Promote an alert-only strategy to paper trading only after all of the following are true:

At least 10 live runs reviewed with persisted detail-table evidence.
At least 15 alerts or 30 candidate checks inspected — not anecdotal.
Top alerts look directionally sensible on manual replay, no duplicate/cluster spam.
Liquidity and fillability plausible for intended size — not relying on dead books.
Logged reasons explain why alert fired and why weaker candidates were skipped.
Initial paper cap small enough to fail safely on first promotion.
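
Only the first two checklist items are numeric, and they can be pre-screened mechanically before the manual review. A small sketch, with a hypothetical function name and argument names:

```python
def evidence_floor_met(runs_reviewed: int, alerts_inspected: int,
                       candidate_checks_inspected: int) -> bool:
    """Check the numeric evidence floors from the graduation checklist.

    Covers only run count and alert/check count; the remaining checklist
    items (directional sense, fillability, logged reasons, cap size) still
    need human judgment.
    """
    enough_runs = runs_reviewed >= 10
    enough_samples = alerts_inspected >= 15 or candidate_checks_inspected >= 30
    return enough_runs and enough_samples
```

Passing this check is necessary but not sufficient: a strategy with 12 reviewed runs and 40 candidate checks clears the floor, yet still needs the qualitative items signed off.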

Promotion workflow

Keep trading_enabled = False while paper-eval checklist is open.
Mark strategy record with clear keep/promote decision.
Flip trading_enabled = True only after checklist complete.
Leave live_ready = False until a separate live-broker checklist exists.

Current graduation status

Strategy	Status	Promotable	Blocker
correlated_laggard	alert-only	Yes	Paper-eval checklist open — see EX-20260401-006
esport48	alert-only	Yes	Paper-eval checklist open — see EX-20260401-007
celebrity_tabloid	paper trading	—	Feed coverage — top-100 Gamma rarely surfaces celebrity markets

Phase A Execution Checks

Phase A metrics are fill proxies, not actual fills. They are a market-microstructure snapshot taken at signal time, not a live trade confirmation.

What Phase A captures

At the moment a signal fires, the runner records:

The quote price (current market YES price)
Best bid / best ask from the CLOB
Estimated fill price at $10 and $50 notional
EV after slippage at $10 and $50
Max size with positive EV
Source confidence label (direct quote vs heuristic fallback)

How to interpret

EV @ $10 / $50 pp — rough comparative metric. Do not treat as realized return.
Max +EV size — proxy capacity summary, not a live guarantee.
Source confidence — how much of the Phase A layer is grounded in direct quote fields vs fallback heuristics.

Inspecting execution checks

uv run python scripts/signal_execution_checks.py --limit 20
uv run python scripts/signal_execution_checks.py --strategy spread_arb --limit 20

Metric Definitions

Status vocabularies

Run status:

Value	Meaning
ok	Run completed successfully without fatal errors
warning	Run completed but warnings/errors exceeded threshold
error	Run failed or ended in a clearly broken state
unknown	Status cannot be determined from stored data

Strategy status:

Value	Meaning
active	Currently part of the active strategy stack
paused	Intentionally disabled but still in current-era reporting context
legacy	Historical strategy, no longer part of current active stack
unknown	Cannot classify with confidence

Experiment status:

Value	Meaning
active	Currently in progress
planned	Defined but not yet active
review_due	Has reached or passed a stated review point
completed	Reached a documented conclusion
archived	Retained for history, not current focus

Exposure metrics

Field	Definition
open_exposure_active	Total open exposure from active strategies (absolute, not signed)
open_exposure_legacy	Total open exposure from legacy/paused strategies
open_position_count_active	Count of open positions from active strategies
open_position_count_legacy	Count of open positions from legacy strategies

PnL metrics

Field	Definition
realized_pnl_30d	Realized P&L from closed positions in the last 30 days
realized_pnl_all_time	Realized P&L across all available history

Realized and unrealized P&L are never combined into one headline metric.

Phase A execution fields

Field	Definition
execution_checks_30d	Count of signal execution checks in the last 30 days
strategies_with_execution_checks_30d	Distinct strategies with at least 1 check in the last 30 days
avg_ev_after_slippage_50_pp_30d	Average EV after slippage at $50 notional, in pp, across checks (30d)
avg_max_size_positive_ev_30d	Average max +EV size (USD) across checks (30d)
benchmark_top_strategy_alert_only	Best current alert-only strategy by replay benchmark score
benchmark_top_score_alert_only	Replay benchmark score of the top alert-only strategy
benchmark_signal_count_alert_only	Total signals included in the current alert-only replay benchmark snapshot

Null / unknown policy

Use null for absent scalar values
Use unknown for enum-like status fields
Use [] for genuinely empty collections
Never substitute 0 for unknown, or empty string for unknown status
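
A short illustration of the policy in an exported snapshot. Apart from `realized_pnl_30d`, the field names here are hypothetical and chosen only to show one value of each kind.

```python
import json

snapshot = {
    "realized_pnl_30d": None,   # absent scalar: serialized as null, never 0
    "run_status": "unknown",    # enum-like status: explicit "unknown", never ""
    "open_positions": [],       # genuinely empty collection
}

encoded = json.dumps(snapshot, sort_keys=True)
```

Python's `None` serializes to JSON `null`, so a consumer can distinguish "no realized P&L recorded" from a realized P&L of exactly zero.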

Operations Runbook

Common commands

# Manual full run
uv run python -m factory.runner

# Research-only run
FACTORY_ENV=research uv run python -m factory.runner

# Live run
FACTORY_ENV=live uv run python -m factory.runner

# Safe dry run (no writes, no closes, no sends)
uv run python -c "from factory.runner import run; run(environment='paper', dry_run=True)"

# Fast safe dry run (skips ev_news, trims workloads)
uv run python -c "from factory.runner import run; run(environment='paper', dry_run=True, fast_dry_run=True)"

# Safe dry run of live policy
uv run python -c "from factory.runner import run; run(environment='live', dry_run=True, send=False, fast_dry_run=True)"

# Open book
uv run python scripts/open_positions.py
uv run python scripts/open_positions.py --top-oldest 5
uv run python scripts/open_positions.py --strategy ev_news
uv run python scripts/open_positions.py --time-window medium

# Latest run summary
uv run python scripts/latest_run.py -n 1

# Inspect decisions
uv run python scripts/inspect_decisions.py --limit 20

# Strategy-specific checks
uv run python scripts/strategy_checks.py stale_market --limit 10
uv run python scripts/strategy_checks.py correlated_laggard --limit 10
uv run python scripts/strategy_checks.py esport48 --limit 10

# Run analytics
uv run python scripts/run_analytics.py --runs 20

# Active experiments
uv run python scripts/active_experiments.py

# Weekly evaluation
uv run eval/report.py

# Replay benchmark
uv run python scripts/build_replay_benchmark.py --scope alert-only
uv run python scripts/build_replay_benchmark.py --scope generated

# Regenerate wiki from DB
uv run python scripts/update_wiki.py

# Local DB backup (keep 14 days)
uv run python scripts/backup_db.py --keep 14

Cron schedule (VM) / launchd (Mac)

Time	Job	Description
:30	Scan phase	Fetch 500 markets, run all strategies, cache signals
:00	Execute phase	Read cached signals, size, trade, resolve, notify
09:00	WhatsApp summary	Full summary (all strategies, portfolio stats)
every 2h	Combined run	Fast pass (observations on 1000 markets) + slow pass (strategies on 500 markets)
every 30m	Observer	Price snapshots for 1000 markets (no LLM)
every 30m	Trade fetcher	CLOB trade data from Polymarket Data API (no auth)
10:40 / 22:40	Strategy factory	Evaluation + auto-generate strategy proposals
19:30	Live run	Live environment (stale_market, price_move_fade via dual execution)
07:30	Research	Research-only scan (no trades)
03:45	DB backup	Local + GCS (gs://pplayouts-factory-backups, 30-day retention)
04:15	Retention cleanup	Prune rows older than 730 days

SQLite database

Live at data/factory.sqlite3 (gitignored). Tables:

runs — one row per runner execution
signals — strategy signals generated each run
decisions — open/close/skip decisions per signal
signal_execution_checks — Phase A fill proxies at signal time
run_logs — log entries per run
trades includes a mode column so paper and live positions are tracked separately
Trade state is SQLite-backed; data/trades.csv is still exported during the migration period
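
The effect of the `mode` column can be shown with an in-memory database. The schema below is a hypothetical stand-in: only the table name `trades` and the `mode` column come from this page, and the other columns and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades ("
    "id INTEGER PRIMARY KEY, strategy TEXT, mode TEXT, size_usdc REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO trades (strategy, mode, size_usdc, status) VALUES (?, ?, ?, ?)",
    [
        ("stale_market", "paper", 10.0, "open"),
        ("ev_news", "paper", 15.0, "open"),
        ("stale_market", "live", 10.0, "open"),
        ("spread_arb", "paper", 8.0, "closed"),
    ],
)

# Open exposure per mode: paper and live books never mix in one total.
rows = conn.execute(
    "SELECT mode, COUNT(*), SUM(size_usdc) FROM trades "
    "WHERE status = 'open' GROUP BY mode ORDER BY mode"
).fetchall()
conn.close()
```

Here `rows` holds one line per mode, which is the separation the `mode` column is meant to provide.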

Dashboard publishing

uv run python scripts/update_wiki.py
uv run python scripts/export_dashboard_data.py
uv run python scripts/build_dashboard.py
uv run python scripts/publish_dashboard.py ~/path/to/dashboard-repo --commit --push

Key resources

Gamma API: https://gamma-api.polymarket.com/markets
CLOB API: https://clob.polymarket.com

Glossary

Term	Definition
p̂	Estimated true probability for a market outcome, derived by a strategy's reasoning pass
EV	Expected value — the edge a trade offers relative to the current market price, in percentage points
EV pp	EV expressed in percentage points (e.g. p̂ = 0.62 against a market price of 0.50 is 12 pp of EV)
Phase A	Signal-time execution check — a fill-proxy snapshot of market microstructure when a signal fires
fill proxy	An estimate of what fill price would have been, based on CLOB bid/ask data. Not an actual fill.
basket EV	In spread_arb: the guaranteed return from buying all legs in a multi-outcome market when sum < 1.0
alert-only	Strategy mode where signals are logged and reported but no positions are opened
paper trading	Strategy mode where positions are opened in the simulator but no real money is deployed
research	Runner environment that scans and logs signals only, with no position opening or resolution
live_ready	Strategy prerequisite for the live environment; not sufficient on its own without environment-policy approval
promotable	Flag indicating a strategy is a valid candidate for promotion once evidence is sufficient
source_confidence	Label indicating whether Phase A data came from a direct CLOB quote or a heuristic fallback
replay benchmark	Strategy-level composite score built from directional labels, execution realism, capacity, uniqueness, and coverage
Gamma API	Polymarket's market data API — provides the top-100 active markets feed used by all strategies
CLOB	Central Limit Order Book — Polymarket's order book (off-chain matching, on-chain settlement), used for execution checks
DDGS	DuckDuckGo Search — used by news-based strategies to fetch recent headlines
OpenClaw	Internal tool for WhatsApp messaging via the runner summary dispatch
launchd	macOS daemon scheduler — runs the paper, live, strategy-factory, and backup jobs on their configured calendars
Karpathy pattern	Auto-generating living documentation from DB data via LLM — used by update_wiki.py
