Praeger Reuchlin
Runbooks

Operational procedures for Megadrive. Each runbook covers the trigger, steps, and how to verify resolution.

Deploy & Reload

Standard procedure for pushing code changes to production. Zero-downtime — bots continue running during reload.

Routine deploy (code changes only)

  1. Pull latest code:
    cd /root/megadrive && git pull
  2. Zero-downtime reload:
    pm2 reload ecosystem.config.js --env production
  3. Verify:
    pm2 status          # megadrive + litestream both online
    curl -s localhost:3000/api/health | jq .status

First deploy / after ecosystem.config.js changes

Use pm2 start instead — required any time a new process is added (e.g. litestream sidecar) or the ecosystem file itself changes.

pm2 start ecosystem.config.js --env production
pm2 save              # persist process list across reboots
pm2 reload vs pm2 restart: reload performs a zero-downtime rolling restart and re-reads the ecosystem file; restart kills and respawns without re-reading config. Neither command registers new processes, which is why pm2 start is required for those. Use reload ecosystem.config.js as the default for all deploys.

Start / Stop Bots

Bots are started and stopped via webhook — the process itself keeps running.

Start a bot

curl -X POST https://your-domain/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "regime": "CALM",
    "signal": "grid,fast_mm",
    "exchange": "deribit",
    "symbol": "BTC-PERPETUAL",
    "name": "hatamoto"
  }'

Stop a bot (cancel all orders)

curl -X POST https://your-domain/api/cancel-bot-orders \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"name": "hatamoto"}'

Stop all algo loops without cancelling orders

Send a regime-only payload with no signal field. Components keep their orders but stop the algo loop on next webhook.

Cancel all orders from a TradingView alert

Use the cancel_all signal. It cancels every open order on all exchanges and stops all algo loops. It does not change the regime, so send a regime update separately if you also want to enter BLACK.

{
  "signal": "cancel_all",
  "exchange": "deribit",
  "symbol": "BTC-PERPETUAL",
  "name": "hatamoto"
}

Clean restart — cancel then re-enter in one alert

Comma-separate signals to run them sequentially in a single payload. cancel_all awaits full exchange confirmation before start_bot fires, so new orders are never placed on top of old ones.

{
  "regime": "CALM",
  "anchorPrice": {{close}},
  "signal": "cancel_all,start_bot",
  "exchange": "deribit",
  "symbol": "BTC-PERPETUAL",
  "name": "hatamoto"
}

The regime and anchor are applied first (before any signals run), so the bot starts centred on the bar close. Works for partial restarts too — e.g. "cancel_all,grid,fast_mm" to reset only specific components.
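The ordering rules above can be modelled as a plan of steps. This is a hypothetical TypeScript sketch, not Megadrive's handler: planWebhook, the payload interface and the step strings are invented for illustration, and it assumes the server splits the signal field on commas and trims whitespace.

```typescript
// Hypothetical model of the documented webhook ordering (not Megadrive's code).
interface WebhookPayload {
  regime?: string;
  anchorPrice?: number;
  signal?: string;
  name: string;
}

// Returns the steps in the order this runbook says they run.
function planWebhook(p: WebhookPayload): string[] {
  const steps: string[] = [];
  // 1. Regime and anchor are applied before any signal runs.
  if (p.regime !== undefined) steps.push("set_regime:" + p.regime);
  if (p.anchorPrice !== undefined) steps.push("set_anchor:" + p.anchorPrice);
  // 2. Signals run strictly left to right; each one (e.g. cancel_all waiting
  //    for exchange confirmation) finishes before the next one starts.
  const signals = (p.signal === undefined ? "" : p.signal)
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  for (const s of signals) steps.push("run:" + s);
  return steps;
}
```

For the clean-restart payload with a close of 84250 substituted for {{close}}, the plan is set_regime:CALM, set_anchor:84250, run:cancel_all, run:start_bot, in that order.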

Webhook Signal Reference

All signals are sent as JSON to POST /trade. Multiple signals can be combined as a comma-separated list: "signal": "grid,fast_mm,dip_buyer".

Signal | Routes to | Notes
start_bot | All regime-gated components | Starts everything enabled in bot config
grid | Grid maker component | Self-centres on live mid at startup
main_mm | Main market maker | Primary spread capture
fast_mm | Fast market maker | Tight spread, high frequency
slow_mm | Slow market maker | Wide spread, patient fills
dip_buyer | Dip buyer component | Scaled bids below anchor
min_order | Min order component | Position-floor trailing bid; only buys when position < minSize
hedge | Options hedge component | Opens put or call on Deribit options
ladder | Scaled order | One-shot buy ladder, params in payload
tp / sell_ladder | Ratcheting sell ladder | Standalone TP ladder, distinct from the bot takeProfit component
threshold_tp | Threshold TP | Market sells X% when position > threshold
options_long_call | Options open (call) | Auto-resolves expiry + strike from spot
options_long_put | Options open (put) | Auto-resolves expiry + strike from spot
options_trim | Options partial close | Params: instrument (optional), contracts or pct. Limit sell at mark price.
cancel_all | Cancel all + stop algos | No regime change. Use for an emergency stop from a TV alert.
Regime-only update (no signal field): updates the stored regime and anchor price without starting or stopping any components. Use this to change regime multipliers without touching running bots.
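Before wiring a TradingView alert, it can help to check the signal string offline. A minimal sketch: checkSignalField is a hypothetical name, the accepted names are copied from the table above, and it assumes the server splits on commas, trims whitespace, and treats a blank field like a missing one. Whether Megadrive itself rejects unknown names is not documented here.

```typescript
// Hypothetical pre-flight check for the "signal" field. Names copied from the table above.
const KNOWN_SIGNALS: string[] = [
  "start_bot", "grid", "main_mm", "fast_mm", "slow_mm", "dip_buyer",
  "min_order", "hedge", "ladder", "tp", "sell_ladder", "threshold_tp",
  "options_long_call", "options_long_put", "options_trim", "cancel_all",
];

type SignalCheck =
  | { kind: "regime_only" }
  | { kind: "signals"; signals: string[] }
  | { kind: "invalid"; unknown: string[] };

function checkSignalField(signal: string | undefined): SignalCheck {
  // Missing or blank signal field: treated as a regime-only update.
  if (signal === undefined || signal.trim() === "") return { kind: "regime_only" };
  const signals = signal
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "");
  // Anything not in the reference table is reported rather than silently dropped.
  const unknown = signals.filter((s) => KNOWN_SIGNALS.indexOf(s) === -1);
  return unknown.length > 0 ? { kind: "invalid", unknown } : { kind: "signals", signals };
}
```

For example, checkSignalField("grid,fastmm") reports fastmm as unknown, a typo that is easy to miss inside an alert message.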

Bot Components Reference

Components are configured per bot on the Config page and started via start_bot. All are regime-gated — buy-side components respect the current buyMultiplier.

Component | What it does
grid | Symmetric buy/sell grid. Auto-reprices when price drifts beyond driftThresholdPct.
main / fast / slow | Market makers at different spread widths and order counts.
dipBuyer | Scaled buy ladder at a configurable % below anchor. Re-bids after fills.
minOrder | Position-floor trailing bid. Polls position every checkInterval seconds; only places a trailing limit when position is below minSize, buying the exact gap.
takeProfit | Trailing high-watermark TP. Arms when P&L ≥ triggerPct above avg entry; trails the HWM; fires a market sell on a trailOffset% pullback; re-arms after cooldownMins.
thresholdTp | Size-based trim. Market sells sellPct% when position exceeds threshold contracts. No price trigger.
hedge | Opens a Deribit put or call. Auto-resolves spot, expiry (nearest Friday ≥ 7 days out), and strike from otmPct. Snaps to tick size.
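The takeProfit lifecycle in the table (arm, trail the high-watermark, fire, cool down) can be sketched as a small state machine. This is an illustrative TypeScript model, not the component's source: it assumes a long position, reads "P&L ≥ triggerPct above avg entry" as price ≥ avgEntry × (1 + triggerPct/100), and treats prices as discrete ticks. TrailingTakeProfit and TpAction are hypothetical names.

```typescript
// Hypothetical sketch of the takeProfit component for a long position.
interface TakeProfitConfig {
  triggerPct: number;   // arm once price is this % above avg entry (assumed reading)
  trailOffset: number;  // fire on this % pullback from the high-watermark
  cooldownMins: number; // wait this long before re-arming after a fire
}

type TpAction = "none" | "armed" | "sell";

class TrailingTakeProfit {
  armed = false;
  hwm = 0;
  cooldownUntilMs = 0;
  readonly cfg: TakeProfitConfig;

  constructor(cfg: TakeProfitConfig) {
    this.cfg = cfg;
  }

  tick(price: number, avgEntry: number, nowMs: number): TpAction {
    if (nowMs < this.cooldownUntilMs) return "none"; // still cooling down
    if (!this.armed) {
      if (price >= avgEntry * (1 + this.cfg.triggerPct / 100)) {
        this.armed = true;
        this.hwm = price; // start trailing from the arming price
        return "armed";
      }
      return "none";
    }
    this.hwm = Math.max(this.hwm, price); // the high-watermark only ratchets up
    if (price <= this.hwm * (1 - this.cfg.trailOffset / 100)) {
      this.armed = false;
      this.cooldownUntilMs = nowMs + this.cfg.cooldownMins * 60_000;
      return "sell"; // the component would send a market sell here
    }
    return "none";
  }
}
```

With avgEntry 80000, triggerPct 2 and trailOffset 1, the sketch arms at 81600 or above and, after a peak of 82500, sells once price falls to about 81675.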

Risk & Kill Switch Config

All risk thresholds are set on the Config page → Risk Settings panel (or directly in config/local.json). Changes require a restart.

Setting | Key | What it does
Kill Loss (BTC) | riskLimits.killLossBtc | Fires BLACK + cancel-all + Telegram when equity drops this many BTC below the all-time HWM. Set to 0 to disable.
HWM Alert % | riskLimits.hwmAlertPct | Sends a Telegram alert (no kill) when drawdown from the persistent HWM exceeds this %. At most one alert per day.
CB Drop % | circuitBreaker.dropPct | Local circuit breaker; triggers BLACK if price drops this % within the rolling window. 0 = disabled.
CB Window (min) | circuitBreaker.windowMins | Rolling window for the circuit breaker check (default 15 min).
Funding Warn % | fundingGate.warnPct | Telegram warning when the 8h funding rate exceeds this % (default 0.03). At most one alert per 6h.
Funding Extreme % | fundingGate.pausePct | Telegram extreme alert when the 8h funding rate exceeds this % (default 0.10).
Kill switch baseline: uses the persistent all-time HWM stored in bot_state — survives PM2 restarts. To reset the baseline (e.g. after intentional drawdown), clear it: sqlite3 data/megadrive.db "DELETE FROM bot_state WHERE key='hwmEquity';" then restart.
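The kill-switch and HWM-alert checks in the table reduce to arithmetic against the persistent HWM. A minimal sketch, assuming equity is measured in BTC and the HWM only ratchets up; evaluateRisk is a hypothetical name, the once-per-day alert limit is left to the caller, and a null HWM models the state right after the bot_state row is deleted (the next equity tick becomes the new baseline).

```typescript
interface RiskLimits {
  killLossBtc: number; // 0 disables the kill switch
  hwmAlertPct: number; // drawdown % that triggers a Telegram alert
}

interface RiskCheck {
  hwm: number;        // updated high-watermark to persist
  drawdownBtc: number;
  drawdownPct: number;
  kill: boolean;      // BLACK + cancel-all + Telegram
  alert: boolean;     // Telegram alert only
}

// hwm === null: baseline was just cleared, so this tick sets it.
function evaluateRisk(equityBtc: number, hwm: number | null, limits: RiskLimits): RiskCheck {
  const newHwm = hwm === null ? equityBtc : Math.max(hwm, equityBtc);
  const drawdownBtc = newHwm - equityBtc;
  const drawdownPct = newHwm > 0 ? (drawdownBtc / newHwm) * 100 : 0;
  return {
    hwm: newHwm,
    drawdownBtc,
    drawdownPct,
    // "more than killLossBtc below the HWM"; a limit of 0 never fires
    kill: limits.killLossBtc > 0 && drawdownBtc > limits.killLossBtc,
    alert: drawdownPct > limits.hwmAlertPct,
  };
}
```

For example, with an HWM of 1.30 BTC and killLossBtc 0.1, equity of 1.25 BTC trips nothing, while 1.15 BTC (0.15 BTC and about 11.5% below) trips both the kill and the alert at hwmAlertPct 5.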

Regime Classifier

The local vol classifier polls Deribit candles on a configurable interval and automatically sets the regime based on realised volatility, funding rate, and drawdown — without requiring a TradingView alert.

Enable / disable

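In config/local.json (the // annotations are explanatory only; strip them before saving, because JSON does not allow comments and the require() check under Crash Recovery will reject them):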

"regimeClassifier": {
  "enabled": true,
  "symbol":              "BTC-PERPETUAL",
  "candleResolution":    "60",   // candle size in minutes
  "candleCount":         48,     // candles fetched per poll
  "overridePriority":    false,  // true = classifier wins over manual alerts
  "deescalateAfterPolls": 3,     // polls below threshold before downgrade
  "thresholds": {
    "calm":       { "maxVol": 0.60 },
    "transition": { "maxVol": 1.00 },
    "stress":     { "maxVol": 1.50 }
  },
  "blackVol":            1.50,   // vol AND drawdown together trigger BLACK
  "blackDrawdown":       0.15,
  "extremeFundingRate":  0.0004  // escalates to STRESS regardless of vol
}
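The classification rules are not spelled out in this runbook, so the following is only one plausible reading of the config fields above, written as a TypeScript sketch. Assumptions: vol is annualised realised vol as a fraction (0.60 = 60%), drawdown is a fraction, funding is compared directly with extremeFundingRate and only positive funding escalates, escalation is immediate, and a downgrade needs deescalateAfterPolls consecutive lower readings. classifyOnce and RegimeTracker are hypothetical; AFTERMATH and overridePriority are not modelled.

```typescript
type Regime = "CALM" | "TRANSITION" | "STRESS" | "BLACK";

// Ordered from least to most severe.
const SEVERITY: Regime[] = ["CALM", "TRANSITION", "STRESS", "BLACK"];

interface ClassifierConfig {
  thresholds: {
    calm: { maxVol: number };
    transition: { maxVol: number };
    stress: { maxVol: number };
  };
  blackVol: number;
  blackDrawdown: number;
  extremeFundingRate: number;
  deescalateAfterPolls: number;
}

// Regime implied by a single poll (assumed rules, not Megadrive's code).
function classifyOnce(vol: number, drawdown: number, funding: number, cfg: ClassifierConfig): Regime {
  // BLACK needs high vol AND a deep drawdown together.
  if (vol >= cfg.blackVol && drawdown >= cfg.blackDrawdown) return "BLACK";
  // Vol above stress.maxVol without the drawdown stays STRESS in this reading.
  let r: Regime =
    vol <= cfg.thresholds.calm.maxVol
      ? "CALM"
      : vol <= cfg.thresholds.transition.maxVol
        ? "TRANSITION"
        : "STRESS";
  // Extreme funding escalates to at least STRESS regardless of vol.
  if (funding >= cfg.extremeFundingRate && SEVERITY.indexOf(r) < SEVERITY.indexOf("STRESS")) {
    r = "STRESS";
  }
  return r;
}

// Escalates immediately; downgrades only after N consecutive lower polls.
class RegimeTracker {
  current: Regime = "CALM";
  lowerPolls = 0;
  readonly cfg: ClassifierConfig;

  constructor(cfg: ClassifierConfig) {
    this.cfg = cfg;
  }

  poll(vol: number, drawdown: number, funding: number): Regime {
    const target = classifyOnce(vol, drawdown, funding, this.cfg);
    if (SEVERITY.indexOf(target) >= SEVERITY.indexOf(this.current)) {
      this.current = target;
      this.lowerPolls = 0;
    } else {
      this.lowerPolls += 1;
      if (this.lowerPolls >= this.cfg.deescalateAfterPolls) {
        this.current = target;
        this.lowerPolls = 0;
      }
    }
    return this.current;
  }
}
```

The hysteresis in RegimeTracker is what keeps a single quiet candle from dropping the bot out of STRESS straight into CALM.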

How it classifies

Dashboard indicator (Risk card — Vol Classifier column)

Override behaviour

By default (overridePriority: false) the classifier won't downgrade a manually set regime — if you manually switch to BLACK, the classifier respects it until vol genuinely drops. Set overridePriority: true to let the classifier override all manual regime changes.

Regime signal quality log

Every regime change, manual or automatic, is recorded in the regime_changes table with vol24h, vol4h, funding, drawdown, price, and source. The Metrics page → Regime Log shows these with outcomePct (the price change from this regime change to the next). Use this to audit classifier performance over time.
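Reading outcomePct as the percentage move from one regime change's price to the next one's, the column can be reproduced from the logged prices, for example to cross-check an export. A minimal sketch, assuming rows sorted oldest first; withOutcome is a hypothetical helper, and the Metrics page may compute it differently.

```typescript
interface RegimeChange {
  ts: string;
  regime: string;
  price: number;
}

type WithOutcome = RegimeChange & { outcomePct: number | null };

// Rows must be sorted oldest first. The latest change has no successor yet,
// so its outcome is null.
function withOutcome(rows: RegimeChange[]): WithOutcome[] {
  return rows.map((row, i) => {
    const next = rows[i + 1];
    const outcomePct = next ? ((next.price - row.price) / row.price) * 100 : null;
    return { ts: row.ts, regime: row.regime, price: row.price, outcomePct };
  });
}
```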

sqlite3 data/megadrive.db \
  "SELECT ts, regime, source, vol24h, vol4h, funding, reason FROM regime_changes ORDER BY ts DESC LIMIT 20;"

Testnet Staging

Run a full instance against test.deribit.com on port 3001 with an isolated SQLite database. No real funds. The dashboard shows an amber TESTNET banner when connected to a staging instance.

First-time setup

  1. Copy the credential template:
    npm run setup:testnet
    # creates config/local-testnet.json from the example
  2. Edit config/local-testnet.json — set your Deribit testnet Client ID and Client Secret (generate at test.deribit.com → Account → API).
  3. Optionally configure a testnet bot in the same file (see example template for shape).

Run

# Direct (development / quick test):
npm run testnet

# PM2 (alongside production):
npm run testnet:pm2

# Logs:
pm2 logs megadrive-testnet

What differs from production

Setting | Production | Testnet
Port | 3000 | 3001
Database | megadrive.db | megadrive-testnet.db
Exchange | www.deribit.com | test.deribit.com
Notifications | SMS / Slack / Telegram | None (suppressed)
Heartbeat | Configured URL | Disabled
Risk limits | Production values | 10k / 30k USD · 5% drawdown
Dashboard banner | None | Amber TESTNET bar
Regime classifier | Per config | Disabled by default
config/local-testnet.json is gitignored. It loads after local.json so credentials and port in it always win. Never commit API keys.
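The "loads after local.json, always wins" behaviour can be pictured as a layered merge. A sketch assuming a recursive, key-by-key merge (nested objects merged, scalars and arrays replaced); the runbook does not say which config loader Megadrive uses, so mergeConfig is hypothetical and any key names used with it are illustrative only.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Later layers win. Nested objects merge key by key; scalars and arrays are replaced.
function mergeConfig(...layers: JsonObject[]): JsonObject {
  const out: JsonObject = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const prev = out[key];
      const next = layer[key];
      out[key] = isObject(prev) && isObject(next) ? mergeConfig(prev, next) : next;
    }
  }
  return out;
}
```

Under this model, mergeConfig(local, localTestnet) keeps any nested key the testnet file omits, which is why the testnet file only needs credentials, port and whatever else differs.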

Crash Recovery

PM2 auto-restarts on crash. On restart, Megadrive sends a Telegram alert listing the bots recorded in the DB and auto-cancels any orphaned orders.

If stuck in a restart loop (PM2 shows errored)

  1. Check recent error logs:
    pm2 logs megadrive --lines 50 --err
  2. Common causes:
    • DB locked — another process has the SQLite file open. Check with lsof data/megadrive.db
    • Config parse error — syntax error in config/local.json. Validate with node -e "require('./config/local.json')"
    • API key invalid — exchange rejected credentials at startup. Check Telegram for the error, verify keys in local.json.
    • Port in use — another process on port 3000. Kill with lsof -ti:3000 | xargs kill
  3. After fixing root cause, restart cleanly:
    pm2 delete megadrive && pm2 start ecosystem.config.js --env production
  4. Verify bots resumed or resend start webhook if needed.
On restart, orphaned orders (from the previous run) are auto-cancelled and removed from active_bots. Re-send start webhooks for each bot after recovery.

Kill Switch Triggered

The kill switch fires when equity drops more than killLossBtc BTC below the persistent all-time HWM (Config → Risk Settings). When triggered: all open orders cancelled, all algo loops stopped, regime forced to BLACK, Telegram alert sent. The state survives PM2 restarts — a reload is required to reset it.

Indicators

Recovery procedure

  1. Assess: review fills and equity curve on the dashboard to understand the loss event.
  2. If you are satisfied it is safe to resume, reload to clear the kill state:
    pm2 reload megadrive
  3. Send a regime update to exit BLACK before restarting bots:
    {
      "regime": "CALM",
      "name": "hatamoto",
      "exchange": "deribit",
      "symbol": "BTC-PERPETUAL"
    }
  4. Resend start webhooks for each bot.
Do not resend bot start webhooks while the regime is BLACK: both the buy and sell multipliers are 0.0, so grids and market makers will place no orders. Exit BLACK first.
Resetting the HWM baseline: the kill switch compares against the persistent HWM in bot_state. After a deliberate drawdown (e.g. scaling down intentionally), clear it so the threshold is measured from the new lower level:
sqlite3 data/megadrive.db "DELETE FROM bot_state WHERE key='hwmEquity';"
Then reload — the new HWM will be set from the first equity tick after restart.

BLACK Regime

BLACK is the full-stop state. Both buy and sell multipliers are 0.0 — no new orders are placed. Used during extreme volatility or after kill switch.

Enter BLACK (manual)

curl -X POST https://your-domain/api/webhook \
  -H "Content-Type: application/json" \
  -d '{"regime": "BLACK", "name": "hatamoto", "exchange": "deribit", "symbol": "BTC-PERPETUAL"}'

This updates only the stored regime: running algos pick up the new multipliers on their next iteration, but existing orders are not cancelled. To also cancel orders:

curl -X POST https://your-domain/api/cancel-bot-orders \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "hatamoto"}'

Exit BLACK

Send a regime-only payload with the desired state (typically CALM or TRANSITION), then resend bot start webhooks.

{
  "regime": "CALM",
  "name": "hatamoto",
  "exchange": "deribit",
  "symbol": "BTC-PERPETUAL"
}

Regime | Buy × | Sell × | Use when
CALM | 1.0 | 1.0 | Normal market
TRANSITION | 0.5 | 0.8 | Elevated vol, cautious re-entry
STRESS | 0.0 | 1.0 | Sell-only / risk-off
BLACK | 0.0 | 0.0 | Full stop
AFTERMATH | 0.3 | 0.8 | Post-crash cautious rebuild

Orphaned Orders

Orphaned orders exist on the exchange but have no active algo tracking them (typically after an unclean shutdown). Megadrive auto-detects and cancels them on startup.
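At its core, an orphan check is a set difference: open order IDs on the exchange minus the IDs some running algo is tracking. A minimal sketch under that assumption; findOrphans and OpenOrder are hypothetical, and the real consistency check may compare more than order IDs.

```typescript
interface OpenOrder {
  orderId: string;
  instrument: string;
}

// Orders open on the exchange that no running algo claims.
function findOrphans(exchangeOrders: OpenOrder[], trackedIds: string[]): OpenOrder[] {
  const tracked: { [id: string]: true } = {};
  for (const id of trackedIds) tracked[id] = true;
  // hasOwnProperty avoids false hits on inherited keys such as "constructor".
  return exchangeOrders.filter(
    (o) => !Object.prototype.hasOwnProperty.call(tracked, o.orderId),
  );
}
```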

Manual check

curl -s https://your-domain/api/consistency-check \
  -H "Authorization: Bearer YOUR_TOKEN" | jq .

Manual cancel

curl -X POST https://your-domain/api/cancel-bot-orders \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "hatamoto"}'

Grid Reanchor

The grid auto-reprices when price drifts beyond driftThresholdPct. To manually force a reanchor (e.g. after a large move), stop and restart the grid component with a new anchorPrice.

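  1. Cancel the bot's open orders (this endpoint cancels all of hatamoto's orders, not only the grid's, so restart any other components afterwards):
    POST /api/cancel-bot-orders  {"name": "hatamoto"}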
  2. Resend start with explicit anchor (or omit for live mid):
    {
      "signal": "grid",
      "name": "hatamoto",
      "exchange": "deribit",
      "symbol": "BTC-PERPETUAL",
      "anchorPrice": 85000
    }
anchorPrice 0 or omitted = self-centering at live mid at startup. Set an explicit price only when you want the grid centred away from current mid (e.g. anchoring to a key level).
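The drift trigger and the anchorPrice fallback fit in two small functions. A sketch assuming drift is the absolute percentage distance of live mid from the current anchor; resolveAnchor and needsReanchor are hypothetical names.

```typescript
// anchorPrice 0 or omitted => centre on live mid at startup.
function resolveAnchor(anchorPrice: number | undefined, liveMid: number): number {
  return anchorPrice !== undefined && anchorPrice > 0 ? anchorPrice : liveMid;
}

// True when mid has drifted more than driftThresholdPct away from the anchor.
function needsReanchor(anchor: number, liveMid: number, driftThresholdPct: number): boolean {
  return (Math.abs(liveMid - anchor) / anchor) * 100 > driftThresholdPct;
}
```

With an 85000 anchor and driftThresholdPct 1, a mid of 84500 (about 0.59% away) leaves the grid alone, while 84000 (about 1.18%) would trigger the automatic reprice.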

Missing Fills

On startup, Megadrive runs fill reconciliation — it fetches 24h of trade history from the exchange, diffs against the local fills table, and inserts anything missing with tag='reconciled'. A Telegram alert fires if gaps are found.
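Reconciliation is essentially a keyed diff over a 24h window. A minimal sketch, assuming trades are matched on the exchange trade ID (the runbook does not say which key is used); Trade, Fill and reconcileFills are hypothetical.

```typescript
interface Trade {
  tradeId: string;
  ts: number; // epoch ms
  side: "buy" | "sell";
  price: number;
  qty: number;
}

interface Fill extends Trade {
  tag: string;
}

// Exchange trades from the last 24h that are missing locally, tagged 'reconciled'.
function reconcileFills(exchangeTrades: Trade[], localFills: Fill[], nowMs: number): Fill[] {
  const windowStart = nowMs - 24 * 60 * 60 * 1000; // 24h lookback
  const known: { [tradeId: string]: true } = {};
  for (const f of localFills) known[f.tradeId] = true;
  return exchangeTrades
    .filter((t) => t.ts >= windowStart && !Object.prototype.hasOwnProperty.call(known, t.tradeId))
    .map((t) => ({ tradeId: t.tradeId, ts: t.ts, side: t.side, price: t.price, qty: t.qty, tag: "reconciled" }));
}
```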

Verify fills table

sqlite3 data/megadrive.db \
  "SELECT ts, side, price, qty, tag FROM fills ORDER BY ts DESC LIMIT 20;"

Check for reconciled fills

sqlite3 data/megadrive.db \
  "SELECT COUNT(*) FROM fills WHERE tag='reconciled';"

Manual reconciliation trigger

Restart the process — reconciliation runs automatically on each startup after exchange connections are established.

Funding Rate Alert

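The funding rate monitor polls BTC-PERPETUAL every 30 minutes (configurable). Deribit's current_funding is the 8-hour rate. Two thresholds:

  • Warn (fundingGate.warnPct, default 0.03): Telegram warning, at most one per 6h.
  • Extreme (fundingGate.pausePct, default 0.10): Telegram extreme alert.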

What to do when funding is elevated

  1. Check current rate on Deribit: BTC-PERPETUAL → Info → Funding Rate.
  2. If rate is above extreme threshold and position is large: consider switching to STRESS regime to stop new buys while letting existing sells fill.
  3. Optionally widen grid spreads temporarily: stop and restart the grid component with a larger minSpread.
  4. Funding resets every 8 hours — monitor for normalisation before resuming full activity.

Adjust thresholds

Config page → Risk Settings → Funding Warn % / Funding Extreme %. Or in config/local.json:

"fundingGate": {
  "symbol":            "BTC-PERPETUAL",
  "warnPct":           0.03,
  "pausePct":          0.10,
  "checkIntervalMins": 30
}
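The two thresholds and the 6h warning cooldown can be sketched as one pure function. Assumptions: the monitor receives the 8h rate as a fraction (0.0003 = 0.03%) and multiplies by 100 before comparing with warnPct and pausePct, only positive funding is checked, and the extreme alert has no cooldown because none is documented. fundingAlert is a hypothetical name.

```typescript
interface FundingGate {
  warnPct: number;  // e.g. 0.03 (%)
  pausePct: number; // e.g. 0.10 (%)
}

type FundingAlert = "none" | "warn" | "extreme";

const WARN_COOLDOWN_MS = 6 * 60 * 60 * 1000; // at most one warning per 6h

function fundingAlert(
  rate8h: number,            // assumed fraction, e.g. 0.0005 = 0.05%
  gate: FundingGate,
  lastWarnMs: number | null, // when the last warning was sent, if ever
  nowMs: number,
): FundingAlert {
  const pct = rate8h * 100;
  if (pct > gate.pausePct) return "extreme";
  if (pct > gate.warnPct) {
    const coolingDown = lastWarnMs !== null && nowMs - lastWarnMs < WARN_COOLDOWN_MS;
    return coolingDown ? "none" : "warn";
  }
  return "none";
}
```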

Options Management

Options positions are opened via the hedge signal (options_long_call and options_long_put also open positions). Minimum size is 0.1 BTC (snapped up for small accounts). Use options_trim to partially close, or close the position in the Deribit UI for a full exit.

Open a put (downside hedge)

{
  "signal": "hedge",
  "name": "hatamoto",
  "exchange": "deribit",
  "symbol": "BTC-PERPETUAL",
  "params": {
    "direction": "put",
    "strength": 2,
    "allocation": "5%"
  }
}

strength: 1 = ATM, 2 = slightly OTM, 3 = OTM. allocation: % of equity, in BTC.

Partial close — options_trim

Fires a limit sell at mark price for a specified number of contracts (or a % of the current position). Auto-selects the largest open position if instrument is omitted.

Close 50% of the largest open position:

{
  "signal": "options_trim",
  "exchange": "deribit",
  "symbol": "BTC-PERPETUAL",
  "name": "hatamoto",
  "params": {
    "pct": 50
  }
}

Close exactly 0.3 BTC of a named instrument (contracts takes priority over pct):

{
  "signal": "options_trim",
  "exchange": "deribit",
  "symbol": "BTC-PERPETUAL",
  "name": "hatamoto",
  "params": {
    "instrument": "BTC-25APR25-75000-P",
    "contracts": 0.3
  }
}

Param | Type | Notes
instrument | string | Full Deribit name (e.g. BTC-25APR25-75000-P). Omit to auto-select the largest position.
contracts | number | BTC contract size to close. Takes priority over pct.
pct | number | % of current position to close (e.g. 50 = 50%). Used if contracts is not set.
currency | string | BTC (default) or USDC. Only used when auto-selecting the instrument.

Close size is snapped to the 0.1 BTC Deribit minimum. The order is capped at the current position size.
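The sizing rules (contracts beats pct, snap to the 0.1 BTC minimum, cap at the position) can be sketched as follows. Assumptions: "snapped" means rounded to the nearest 0.1 step but never below one step, and the arithmetic is done in integer tenths to avoid floating-point drift. trimSize is hypothetical.

```typescript
const MIN_STEP_TENTHS = 1; // 0.1 BTC Deribit minimum, expressed in tenths

function trimSize(positionBtc: number, params: { contracts?: number; pct?: number }): number {
  const posTenths = Math.round(positionBtc * 10);
  let wanted: number;
  if (params.contracts !== undefined) wanted = params.contracts; // takes priority
  else if (params.pct !== undefined) wanted = (positionBtc * params.pct) / 100;
  else throw new Error("options_trim needs contracts or pct");
  // Snap to the 0.1 grid (nearest step, assumed), never below the minimum.
  const tenths = Math.max(MIN_STEP_TENTHS, Math.round(wanted * 10));
  // Cap at the current position size.
  return Math.min(tenths, posTenths) / 10;
}
```

So a 2% trim of a 1 BTC position is lifted to the 0.1 BTC minimum, and asking for 0.5 contracts on a 0.3 BTC position closes 0.3.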

Options type → bot aggressiveness

When an options position is open, a second layer of multipliers stacks on top of the regime multipliers to adjust MM aggressiveness. Configure in config/local.json (strip the // annotations before saving; JSON does not allow comments):

"optionsMultipliers": {
  "put":  { "buyMultiplier": 1.2, "sellMultiplier": 0.8 },  // puts open → buy more, sell less
  "call": { "buyMultiplier": 0.8, "sellMultiplier": 1.2 },  // calls open → sell more, buy less
  "none": { "buyMultiplier": 1.0, "sellMultiplier": 1.0 }
}

The multipliers are applied multiplicatively: if the TRANSITION regime has buyMultiplier: 0.5 and a put is open with buyMultiplier: 1.2, the effective buy multiplier is 0.5 × 1.2 = 0.60. A zero from the regime side (e.g. STRESS buy = 0.0) is never un-gated by the options multiplier.
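A quick way to sanity-check the stacking rule: REGIME_MULTIPLIERS copies the regime table in the BLACK Regime section, OPTIONS_MULTIPLIERS copies the example config above, and effectiveMultipliers is a hypothetical helper that multiplies them.

```typescript
type Side = { buy: number; sell: number };

// Copied from the regime table in this runbook.
const REGIME_MULTIPLIERS: { [regime: string]: Side } = {
  CALM: { buy: 1.0, sell: 1.0 },
  TRANSITION: { buy: 0.5, sell: 0.8 },
  STRESS: { buy: 0.0, sell: 1.0 },
  BLACK: { buy: 0.0, sell: 0.0 },
  AFTERMATH: { buy: 0.3, sell: 0.8 },
};

// Copied from the optionsMultipliers example config.
const OPTIONS_MULTIPLIERS: { [kind: string]: Side } = {
  put: { buy: 1.2, sell: 0.8 },
  call: { buy: 0.8, sell: 1.2 },
  none: { buy: 1.0, sell: 1.0 },
};

// Multiplicative stacking: a regime zero stays zero whatever the options layer says.
function effectiveMultipliers(regime: string, options: "put" | "call" | "none"): Side {
  const r = REGIME_MULTIPLIERS[regime];
  if (!r) throw new Error("unknown regime " + regime);
  const o = OPTIONS_MULTIPLIERS[options];
  return { buy: r.buy * o.buy, sell: r.sell * o.sell };
}
```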

Check open options positions

sqlite3 data/megadrive.db \
  "SELECT symbol, side, qty, avg_entry FROM positions WHERE symbol LIKE 'BTC-%-%';"
Expiry: Deribit options expire at 08:00 UTC. Unmanaged expiry results in cash settlement. Roll at least 24h before expiry if you want to maintain the hedge.
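For planning rolls, the hedge component's expiry rule (nearest Friday at least 7 days out, per the Bot Components table) can be combined with the 08:00 UTC expiry time. A sketch under stated assumptions: the 7 days are counted to the 08:00 UTC expiry itself, strikes snap to the nearest multiple of a hypothetical strikeStep, and resolveExpiry, resolveStrike and shouldRoll are illustrative names. Check the expiries Deribit actually lists before trading.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Nearest Friday 08:00 UTC that is at least minDays away.
function resolveExpiry(now: Date, minDays = 7): Date {
  const earliest = now.getTime() + minDays * DAY_MS;
  const e = new Date(earliest);
  const c = new Date(Date.UTC(e.getUTCFullYear(), e.getUTCMonth(), e.getUTCDate(), 8, 0, 0));
  c.setUTCDate(c.getUTCDate() + ((5 - c.getUTCDay() + 7) % 7)); // 5 = Friday
  if (c.getTime() < earliest) c.setUTCDate(c.getUTCDate() + 7); // too close: next week
  return c;
}

// Put strikes below spot, calls above, snapped to the nearest strikeStep.
function resolveStrike(spot: number, otmPct: number, direction: "put" | "call", strikeStep: number): number {
  const raw = direction === "put" ? spot * (1 - otmPct / 100) : spot * (1 + otmPct / 100);
  return Math.round(raw / strikeStep) * strikeStep;
}

// Runbook guidance: roll at least 24h before expiry.
function shouldRoll(expiry: Date, now: Date, leadHours = 24): boolean {
  return expiry.getTime() - now.getTime() <= leadHours * 60 * 60 * 1000;
}
```

For example, from Friday 2026-04-10 12:00 UTC the next Friday expiry (the 17th, 08:00) is four hours short of seven full days, so resolveExpiry moves on to 2026-04-24 08:00 UTC.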

Adverse Selection

Adverse selection measures whether the market consistently moves against you after your fills — i.e., you buy just before a drop or sell just before a rise. This indicates informed counterparties are picking off your quotes.

How to read it (Metrics page → Adverse Selection card)

  1. Click Load. The card fetches the selected fill window + 1h candles from Deribit REST (~1s).
  2. Read the Portfolio AS Score and % Adverse Fills in the aggregate row.
  3. Check per-component scores to identify which strategy is being picked off most.

Thresholds

AS Score | Colour | Meaning
< −0.02% | Green | Favourable: fills are slightly predictive of good direction (unusual for MMs)
−0.02% to +0.05% | Grey | Neutral: fills are random w.r.t. future price. Normal.
+0.05% to +0.20% | Amber | Mild adverse selection; worth monitoring
> +0.20% | Red | Significant; consider widening spread, reducing allocation, or pausing
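The card's exact formula is not given here. One common construction that matches the sign convention above (positive = adverse) is the mean signed mark-out: for a buy, the % by which price is lower one horizon later; for a sell, the % by which it is higher. A sketch under that assumption; asScore, asColour and the inclusive band edges are illustrative choices, not the dashboard's code.

```typescript
interface MarkedFill {
  side: "buy" | "sell";
  fillPrice: number;
  futurePrice: number; // price one horizon (e.g. 1h) after the fill
}

// Positive = the market moved against the fill.
function adverseMovePct(f: MarkedFill): number {
  const move = ((f.futurePrice - f.fillPrice) / f.fillPrice) * 100;
  return f.side === "buy" ? -move : move;
}

// Mean adverse move and the share of fills that went against us.
function asScore(fills: MarkedFill[]): { scorePct: number; adverseShare: number } {
  if (fills.length === 0) return { scorePct: 0, adverseShare: 0 };
  const moves = fills.map(adverseMovePct);
  const scorePct = moves.reduce((a, b) => a + b, 0) / moves.length;
  const adverseShare = (moves.filter((m) => m > 0).length / moves.length) * 100;
  return { scorePct, adverseShare };
}

// Bands from the thresholds table.
function asColour(scorePct: number): "Green" | "Grey" | "Amber" | "Red" {
  if (scorePct < -0.02) return "Green";
  if (scorePct <= 0.05) return "Grey";
  if (scorePct <= 0.2) return "Amber";
  return "Red";
}
```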

What to do when AS is elevated

  1. Check whether the adverse period corresponds to a regime (TRANSITION / STRESS) where trend-following dominated — this is normal in trending markets.
  2. Try a longer horizon (4h or 8h) to see if the signal reverses — short-term adverse selection can mean you're providing liquidity at good levels.
  3. If AS is persistently high on a single component across multiple windows, widen its spread to capture more edge and reduce fill rate.
  4. Check inventory skew config — if inventorySkewFactor is low, the MM may not be adjusting quotes to lean away from the imbalanced side.

Daily P&L Report

Fires automatically at 00:01 UTC via Telegram. Consolidates fills and equity snapshots for the previous calendar day into portfolio_daily and sends a digest.

Report format

📅 Daily P&L — 2026-04-10
PnL:      +0.0042 BTC
Fees:     -0.0003 BTC
Net:      +0.0039 BTC
Balance:  1.2540 BTC
Fills:    87
Funding:  -0.0001 BTC        ← only shown when non-zero
Unreal.:  +0.0012 BTC (open pos)  ← from latest WS account snapshot
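The sample digest implies Net = PnL + Fees (0.0042 − 0.0003 = 0.0039), with Funding printed only when non-zero and not folded into Net. A sketch that reproduces the body of that layout under those readings; DailyRow, reportDate and formatDigest are hypothetical, and the header and Unreal. lines are omitted.

```typescript
interface DailyRow {
  pnlBtc: number;
  feesBtc: number;    // negative = fees paid
  balanceBtc: number;
  fills: number;
  fundingBtc: number;
}

// A report sent at 00:01 UTC covers the previous UTC calendar day.
function reportDate(now: Date): string {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - 1));
  return d.toISOString().slice(0, 10);
}

function signed(x: number): string {
  return (x >= 0 ? "+" : "") + x.toFixed(4) + " BTC";
}

// Labels are padded to 10 characters, matching the sample.
function pad(label: string): string {
  let out = label;
  while (out.length < 10) out += " ";
  return out;
}

function formatDigest(r: DailyRow): string {
  const lines = [
    pad("PnL:") + signed(r.pnlBtc),
    pad("Fees:") + signed(r.feesBtc),
    pad("Net:") + signed(r.pnlBtc + r.feesBtc), // Net = PnL + Fees (inferred from the sample)
    pad("Balance:") + r.balanceBtc.toFixed(4) + " BTC",
    pad("Fills:") + String(r.fills),
  ];
  if (r.fundingBtc !== 0) lines.push(pad("Funding:") + signed(r.fundingBtc));
  return lines.join("\n");
}
```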

Fields

Query historical daily rows

sqlite3 data/megadrive.db \
  "SELECT trade_date, realized_pnl, fees, balance FROM portfolio_daily ORDER BY trade_date DESC LIMIT 30;"

DB Backup & Restore

Litestream continuously replicates data/megadrive.db to Cloudflare R2. WAL frames are streamed in near-real-time; hourly snapshots provide restore points.

Verify replication is running

pm2 status litestream        # should show 'online'
pm2 logs litestream --lines 20

List available restore points

litestream snapshots -config litestream.yml data/megadrive.db

Restore to latest

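pm2 stop megadrive litestream
mkdir -p data/pre-restore && mv data/megadrive.db* data/pre-restore/   # restore will not overwrite an existing DB
litestream restore -config litestream.yml data/megadrive.db
pm2 start ecosystem.config.js --env production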

Restore to a specific point in time

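pm2 stop megadrive litestream
mkdir -p data/pre-restore && mv data/megadrive.db* data/pre-restore/   # restore will not overwrite an existing DB
litestream restore -config litestream.yml \
  -timestamp "2026-04-01T10:00:00Z" \
  data/megadrive.db
pm2 start ecosystem.config.js --env production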
Required env vars: R2_ACCOUNT_ID, R2_BUCKET, LITESTREAM_ACCESS_KEY_ID, LITESTREAM_SECRET_ACCESS_KEY. Set in /etc/environment or .env.

PM2 Reference

Command | Description
pm2 status | List all processes with uptime and restart count
pm2 reload ecosystem.config.js --env production | Zero-downtime reload; use for all routine deploys
pm2 start ecosystem.config.js --env production | First start or after ecosystem.config.js changes
pm2 restart megadrive | Hard restart (brief downtime); use only for debugging
pm2 stop megadrive | Stop the process (keeps it in the process list)
pm2 delete megadrive | Remove from the process list entirely
pm2 save | Persist the current process list for auto-start on reboot
pm2 startup | Generate a systemd unit for PM2 itself
pm2 monit | Live CPU/memory dashboard in the terminal
pm2 logs megadrive --lines 100 | Stream recent logs
pm2 logs megadrive --err --lines 50 | Errors only
reload vs restart: reload spawns a new process before killing the old one (zero downtime, re-reads ecosystem config). restart kills then respawns (brief gap, does not re-read ecosystem config). Always prefer reload ecosystem.config.js.

Log Locations

File | Contents
logs/out.log | All stdout: trade events, fills, algo loops
logs/error.log | Stderr: crashes, unhandled rejections
logs/litestream.log | Litestream replication events
data/megadrive.db | SQLite: fills, equity curve, active bots, events

Useful SQLite queries

# Recent fills
sqlite3 data/megadrive.db "SELECT ts,side,price,qty,fee,tag FROM fills ORDER BY ts DESC LIMIT 20;"

# Active bots
sqlite3 data/megadrive.db "SELECT * FROM active_bots;"

# Equity last 24h
sqlite3 data/megadrive.db "SELECT ts,equity FROM equity_curve WHERE ts > datetime('now','-1 day') ORDER BY ts DESC;"

# Recent events / alerts
sqlite3 data/megadrive.db "SELECT ts,level,msg FROM events ORDER BY ts DESC LIMIT 30;"