Runbooks
Operational procedures for Megadrive. Each runbook covers the trigger, steps, and how to verify resolution.
Deploy & Reload
Standard procedure for pushing code changes to production. Zero-downtime — bots continue running during reload.
Routine deploy (code changes only)
- Pull latest code:
cd /root/megadrive && git pull
- Zero-downtime reload:
pm2 reload ecosystem.config.js --env production
- Verify:
pm2 status # megadrive + litestream both online
curl -s localhost:3000/api/health | jq .status
First deploy / after ecosystem.config.js changes
Use pm2 start instead — required any time a new process is added (e.g. litestream sidecar) or the ecosystem file itself changes.
pm2 start ecosystem.config.js --env production
pm2 save # persist process list across reboots
reload performs a zero-downtime rolling restart and re-reads the ecosystem file. restart kills and respawns without re-reading config and doesn't register new processes. Use reload ecosystem.config.js as the default for all deploys.
Start / Stop Bots
Bots are started and stopped via webhook — the process itself keeps running.
Start a bot
curl -X POST https://your-domain/api/webhook \
-H "Content-Type: application/json" \
-d '{
"regime": "CALM",
"signal": "grid,fast_mm",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"name": "hatamoto"
}'
Stop a bot (cancel all orders)
curl -X POST https://your-domain/api/cancel-bot-orders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{"name": "hatamoto"}'
Stop all algo loops without cancelling orders
Send a regime-only payload with no signal field. Components keep their orders but stop the algo loop on next webhook.
Cancel all orders from a TradingView alert
Use the cancel_all signal. Cancels every open order on all exchanges and stops all algo loops. Does not change the regime — send a regime update separately if you also want to enter BLACK.
{
"signal": "cancel_all",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"name": "hatamoto"
}
Clean restart — cancel then re-enter in one alert
Comma-separate signals to run them sequentially in a single payload. cancel_all awaits full exchange confirmation before start_bot fires, so new orders are never placed on top of old ones.
{
"regime": "CALM",
"anchorPrice": {{close}},
"signal": "cancel_all,start_bot",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"name": "hatamoto"
}
The regime and anchor are applied first (before any signals run), so the bot starts centred on the bar close. Works for partial restarts too — e.g. "cancel_all,grid,fast_mm" to reset only specific components.
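The ordering described above can be sketched as follows. This is a minimal illustration in JavaScript, not Megadrive's actual handler: `planWebhook`, `runPlan`, and the step types `set_regime` / `set_anchor` are hypothetical names. Only the payload fields (`regime`, `anchorPrice`, `signal`) come from this runbook.

```javascript
// Hypothetical helper: turns a webhook payload into an ordered list of steps.
function planWebhook(payload) {
  const steps = [];
  // Regime and anchor come first so that later signals see the new state.
  if (payload.regime !== undefined) steps.push({ type: "set_regime", value: payload.regime });
  if (payload.anchorPrice !== undefined) steps.push({ type: "set_anchor", value: payload.anchorPrice });
  // Signals keep their left-to-right order from the comma-separated string.
  const signals = String(payload.signal ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  for (const name of signals) steps.push({ type: "signal", value: name });
  return steps;
}

// Runs the steps strictly one after another; `handlers` maps a step type
// to an async function (assumed shape). Awaiting each step is what keeps
// start_bot from firing before cancel_all has finished.
async function runPlan(steps, handlers) {
  for (const step of steps) {
    await handlers[step.type](step.value);
  }
}
```

A regime-only payload yields a single `set_regime` step, which matches the "no signal field" case described earlier in this section.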
Webhook Signal Reference
All signals are sent as JSON to POST /trade. Multiple signals can be combined as a comma-separated list: "signal": "grid,fast_mm,dip_buyer".
| Signal | Routes to | Notes |
|---|---|---|
| start_bot | All regime-gated components | Starts everything enabled in bot config |
| grid | Grid maker component | Self-centres on live mid at startup |
| main_mm | Main market maker | Primary spread capture |
| fast_mm | Fast market maker | Tight spread, high frequency |
| slow_mm | Slow market maker | Wide spread, patient fills |
| dip_buyer | Dip buyer component | Scaled bids below anchor |
| min_order | Min order component | Position-floor trailing bid; only buys when position < minSize |
| hedge | Options hedge component | Opens put or call on Deribit options |
| ladder | Scaled order | One-shot buy ladder, params in payload |
| tp / sell_ladder | Ratcheting sell ladder | Standalone TP ladder, distinct from the bot takeProfit component |
| threshold_tp | Threshold TP | Market sells X% when position > threshold |
| options_long_call | Options open (call) | Auto-resolves expiry + strike from spot |
| options_long_put | Options open (put) | Auto-resolves expiry + strike from spot |
| options_trim | Options partial close | Params: instrument (opt), contracts or pct. Limit sell at mark price. |
| cancel_all | Cancel all + stop algos | No regime change. Use for emergency stop from TV alert. |
Bot Components Reference
Components are configured per bot on the Config page and started via start_bot. All are regime-gated — buy-side components respect the current buyMultiplier.
| Component | What it does |
|---|---|
| grid | Symmetric buy/sell grid. Auto-reprices when price drifts beyond driftThresholdPct. |
| main / fast / slow | Market makers at different spread widths and order counts. |
| dipBuyer | Scaled buy ladder at configurable % below anchor. Re-bids after fills. |
| minOrder | Position-floor trailing bid. Polls position every checkInterval seconds; only places a trailing limit when position is below minSize, buying the exact gap. |
| takeProfit | Trailing high-watermark TP. Arms when P&L ≥ triggerPct above avg entry; trails HWM; fires a market sell on trailOffset% pullback; re-arms after cooldownMins. |
| thresholdTp | Size-based trim. Market sells sellPct% when position exceeds threshold contracts. No price trigger. |
| hedge | Opens a Deribit put or call. Auto-resolves spot, expiry (nearest Friday ≥ 7 days), and strike from otmPct. Snaps to tick size. |
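The takeProfit row packs four behaviours into one sentence (arm, trail, fire, cool down). The sketch below shows one way such a state machine could look. It is an illustration under assumptions, not the component's real code: `createTakeProfit` and `onTick` are hypothetical names, and it assumes the high-watermark is tracked on price and that the pullback is measured as a percentage of that high-watermark.

```javascript
// Hypothetical trailing high-watermark take-profit.
// triggerPct, trailOffset (both in %) and cooldownMins mirror the config names in the table.
function createTakeProfit({ triggerPct, trailOffset, cooldownMins }) {
  let armed = false;
  let hwm = 0;           // highest price seen since arming
  let cooldownUntil = 0; // epoch ms; the component stays idle before this time

  // Feed one price tick. Returns "sell" when the trailing stop fires, otherwise "hold".
  return function onTick(price, avgEntry, nowMs) {
    if (nowMs < cooldownUntil) return "hold";
    if (!armed) {
      // Arm once price is at least triggerPct above the average entry.
      if (price >= avgEntry * (1 + triggerPct / 100)) {
        armed = true;
        hwm = price;
      }
      return "hold";
    }
    hwm = Math.max(hwm, price);
    // Fire when price has pulled back trailOffset% from the high-watermark.
    if (price <= hwm * (1 - trailOffset / 100)) {
      armed = false;
      cooldownUntil = nowMs + cooldownMins * 60000;
      return "sell";
    }
    return "hold";
  };
}
```

With `triggerPct: 2` and `trailOffset: 1`, an entry at 100 arms once price reaches 102 or higher; if price then peaks at 105, the sell fires at or below roughly 103.95.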
Risk & Kill Switch Config
All risk thresholds are set on the Config page → Risk Settings panel (or directly in config/local.json). Changes require a restart.
| Setting | Key | What it does |
|---|---|---|
| Kill Loss (BTC) | riskLimits.killLossBtc | Fires BLACK + cancel-all + Telegram when equity drops this many BTC below the all-time HWM. Set to 0 to disable. |
| HWM Alert % | riskLimits.hwmAlertPct | Sends a Telegram alert (no kill) when drawdown from persistent HWM exceeds this %. One alert per day max. |
| CB Drop % | circuitBreaker.dropPct | Local circuit breaker — triggers BLACK if price drops this % within the rolling window. 0 = disabled. |
| CB Window (min) | circuitBreaker.windowMins | Rolling window for the circuit breaker check (default 15 min). |
| Funding Warn % | fundingGate.warnPct | Telegram warning when 8h funding rate exceeds this % (default 0.03). One alert per 6h. |
| Funding Extreme % | fundingGate.pausePct | Telegram extreme alert when 8h funding rate exceeds this % (default 0.10). |
The HWM is persisted in bot_state and survives PM2 restarts. To reset the baseline (e.g. after an intentional drawdown), clear it: sqlite3 data/megadrive.db "DELETE FROM bot_state WHERE key='hwmEquity';" then restart.
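To make the circuit-breaker settings concrete, here is a small sketch of a rolling-window drop check. It is not the production implementation: `createCircuitBreaker` and `onPrice` are hypothetical names, and the sketch assumes the drop is measured from the highest price inside the window to the latest price.

```javascript
// Hypothetical rolling-window drop detector.
// dropPct and windowMins mirror circuitBreaker.dropPct / circuitBreaker.windowMins.
function createCircuitBreaker({ dropPct, windowMins }) {
  const samples = []; // { ts, price }, oldest first

  // Feed one price sample; returns true when the caller should switch to BLACK.
  return function onPrice(price, nowMs) {
    if (dropPct <= 0) return false; // 0 = disabled
    samples.push({ ts: nowMs, price });
    // Discard samples that have fallen out of the rolling window.
    const cutoff = nowMs - windowMins * 60000;
    while (samples.length > 0 && samples[0].ts < cutoff) samples.shift();
    const windowHigh = Math.max(...samples.map((s) => s.price));
    const drop = ((windowHigh - price) / windowHigh) * 100;
    return drop >= dropPct;
  };
}
```

A 6% fall inside a 15-minute window trips a 5% breaker, while the same fall spread over more than 15 minutes does not, because the earlier high has already left the window.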
Regime Classifier
The local vol classifier polls Deribit candles on a configurable interval and automatically sets the regime based on realised volatility, funding rate, and drawdown — without requiring a TradingView alert.
Enable / disable
In config/local.json:
"regimeClassifier": {
"enabled": true,
"symbol": "BTC-PERPETUAL",
"candleResolution": "60", // candle size in minutes
"candleCount": 48, // candles fetched per poll
"overridePriority": false, // true = classifier wins over manual alerts
"deescalateAfterPolls": 3, // polls below threshold before downgrade
"thresholds": {
"calm": { "maxVol": 0.60 },
"transition": { "maxVol": 1.00 },
"stress": { "maxVol": 1.50 }
},
"blackVol": 1.50, // vol AND drawdown together trigger BLACK
"blackDrawdown": 0.15,
"extremeFundingRate": 0.0004 // escalates to STRESS regardless of vol
}
How it classifies
- vol24h / vol4h: annualised realised vol over the last 24h and 4h windows respectively.
- Escalation is immediate — if vol spikes the regime upgrades right away.
- De-escalation is delayed: deescalateAfterPolls consecutive polls below the threshold are required before downgrading. Prevents false calm during brief vol retraces.
- BLACK requires both vol above blackVol AND drawdown above blackDrawdown simultaneously. Neither alone is sufficient.
- A funding rate above extremeFundingRate escalates directly to STRESS regardless of vol; this covers the contango carry-cost scenario.
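The rules above amount to a classifier with asymmetric hysteresis. The following sketch illustrates that idea with the config keys shown earlier. It rests on several assumptions that the runbook does not confirm: a single `vol` input stands in for the vol24h / vol4h pair, vol above the transition threshold without the drawdown condition maps to STRESS, a downgrade jumps straight to the target regime, and AFTERMATH is left out. `targetRegime` and `createClassifier` are hypothetical names.

```javascript
const ORDER = ["CALM", "TRANSITION", "STRESS", "BLACK"]; // least to most severe

// Hypothetical target regime for a single poll.
function targetRegime({ vol, drawdown, funding }, cfg) {
  // BLACK needs both conditions at once; neither alone is sufficient.
  if (vol > cfg.blackVol && drawdown > cfg.blackDrawdown) return "BLACK";
  // Extreme funding escalates to STRESS regardless of vol.
  if (funding > cfg.extremeFundingRate) return "STRESS";
  if (vol <= cfg.thresholds.calm.maxVol) return "CALM";
  if (vol <= cfg.thresholds.transition.maxVol) return "TRANSITION";
  return "STRESS"; // assumption: anything above the transition threshold is STRESS
}

function createClassifier(cfg, initial = "CALM") {
  let current = initial;
  let calmerPolls = 0; // consecutive polls whose target is below the current regime

  return function poll(metrics) {
    const target = targetRegime(metrics, cfg);
    const delta = ORDER.indexOf(target) - ORDER.indexOf(current);
    if (delta > 0) {
      current = target; // escalate immediately
      calmerPolls = 0;
    } else if (delta < 0) {
      calmerPolls += 1; // de-escalate only after enough calmer polls in a row
      if (calmerPolls >= cfg.deescalateAfterPolls) {
        current = target;
        calmerPolls = 0;
      }
    } else {
      calmerPolls = 0;
    }
    return current;
  };
}
```

With `deescalateAfterPolls: 3`, a spike to STRESS followed by quiet polls returns STRESS twice more and only drops to CALM on the third quiet poll, which corresponds to the HELD indicator described below.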
Dashboard indicator (Risk card — Vol Classifier column)
- ON — classifier running, regime set automatically.
- HELD — de-escalation hysteresis active; waiting for N more polls before downgrade.
- OFF — classifier disabled; regime set by manual webhook only.
Override behaviour
By default (overridePriority: false) the classifier won't downgrade a manually set regime — if you manually switch to BLACK, the classifier respects it until vol genuinely drops. Set overridePriority: true to let the classifier override all manual regime changes.
Regime signal quality log
Every regime change — manual or automatic — is recorded in the regime_changes table with vol24h, vol4h, funding, drawdown, price, and source. The Metrics page → Regime Log shows these with outcomePct (price change to the next regime change). Use this to audit classifier performance over time.
sqlite3 data/megadrive.db \
"SELECT ts, regime, source, vol24h, vol4h, funding, reason FROM regime_changes ORDER BY ts DESC LIMIT 20;"
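If you export these rows for offline analysis, outcomePct can be recomputed from consecutive rows. The sketch below assumes rows sorted by ts ascending and a numeric `price` field; `withOutcomePct` is a hypothetical helper, not part of Megadrive.

```javascript
// rows: regime_changes ordered by ts ascending, each with at least { ts, regime, price }.
// outcomePct is the price change from one regime change to the next;
// the most recent row has no successor yet, so it gets null.
function withOutcomePct(rows) {
  return rows.map((row, i) => {
    const next = rows[i + 1];
    const outcomePct = next ? ((next.price - row.price) / row.price) * 100 : null;
    return { ...row, outcomePct };
  });
}
```

A negative outcomePct after a switch to STRESS or BLACK suggests the change was well timed, since price fell while buying was gated.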
Testnet Staging
Run a full instance against test.deribit.com on port 3001 with an isolated SQLite database. No real funds. The dashboard shows an amber TESTNET banner when connected to a staging instance.
First-time setup
- Copy the credential template:
npm run setup:testnet # creates config/local-testnet.json from the example
- Edit config/local-testnet.json: set your Deribit testnet Client ID and Client Secret (generate at test.deribit.com → Account → API).
- Optionally configure a testnet bot in the same file (see example template for shape).
Run
# Direct (development / quick test):
npm run testnet
# PM2 (alongside production):
npm run testnet:pm2
# Logs:
pm2 logs megadrive-testnet
What differs from production
| Setting | Production | Testnet |
|---|---|---|
| Port | 3000 | 3001 |
| Database | megadrive.db | megadrive-testnet.db |
| Exchange | www.deribit.com | test.deribit.com |
| Notifications | SMS / Slack / Telegram | None (suppressed) |
| Heartbeat | Configured URL | Disabled |
| Risk limits | Production values | 10k / 30k USD · 5% drawdown |
| Dashboard banner | None | Amber TESTNET bar |
| Regime classifier | Per config | Disabled by default |
config/local-testnet.json is gitignored. It loads after local.json so credentials and port in it always win. Never commit API keys.
Crash Recovery
PM2 auto-restarts on crash. On restart, Megadrive fires a Telegram alert listing bots in DB and auto-cancels any orphaned orders.
If restart loop (PM2 shows errored)
- Check recent error logs:
pm2 logs megadrive --lines 50 --err
- Common causes:
  - DB locked: another process has the SQLite file open. Check with lsof data/megadrive.db
  - Config parse error: syntax error in config/local.json. Validate with node -e "require('./config/local.json')"
  - API key invalid: exchange rejected credentials at startup. Check Telegram for the error, verify keys in local.json.
  - Port in use: another process on port 3000. Kill with lsof -ti:3000 | xargs kill
- After fixing root cause, restart cleanly:
pm2 delete megadrive && pm2 start ecosystem.config.js --env production
- Verify bots resumed or resend start webhook if needed.
Kill Switch Triggered
The kill switch fires when equity drops more than killLossBtc BTC below the persistent all-time HWM (Config → Risk Settings). When triggered: all open orders cancelled, all algo loops stopped, regime forced to BLACK, Telegram alert sent. The state survives PM2 restarts — a reload is required to reset it.
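The trigger condition can be written out as a short check. This is a sketch under assumptions rather than the actual risk module: `checkKillSwitch` is a hypothetical name, and it assumes a new equity high raises the baseline before the drawdown is measured.

```javascript
// Hypothetical kill-switch check. All values are in BTC.
// hwmEquity is the persisted all-time high-water mark; killLossBtc mirrors riskLimits.killLossBtc.
function checkKillSwitch(equityBtc, hwmEquity, killLossBtc) {
  const hwm = Math.max(hwmEquity, equityBtc); // a new high raises the baseline
  const drawdownBtc = hwm - equityBtc;
  // A threshold of 0 disables the switch.
  const triggered = killLossBtc > 0 && drawdownBtc > killLossBtc;
  return { hwm, drawdownBtc, triggered };
}
```

Because the baseline is the all-time HWM rather than the session start, a drawdown that began before a restart still counts toward the threshold.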
Indicators
- Telegram: "🚨 KILL SWITCH: drawdown X.XXXX BTC from HWM…"
- /api/health returns "killTriggered": true
- Dashboard regime badge shows BLACK
Recovery procedure
- Assess: review fills and equity curve on the dashboard to understand the loss event.
- If satisfied to resume, reload to clear kill state:
pm2 reload megadrive
- Send a regime update to exit BLACK before restarting bots:
{ "regime": "CALM", "name": "hatamoto", "exchange": "deribit", "symbol": "BTC-PERPETUAL" }
- Resend start webhooks for each bot.
The HWM is persisted in bot_state. After a deliberate drawdown (e.g. scaling down intentionally), clear it so the threshold is measured from the new lower level:
sqlite3 data/megadrive.db "DELETE FROM bot_state WHERE key='hwmEquity';"
Then reload. The new HWM will be set from the first equity tick after restart.
BLACK Regime
BLACK is the full-stop state. Both buy and sell multipliers are 0.0 — no new orders are placed. Used during extreme volatility or after kill switch.
Enter BLACK (manual)
curl -X POST https://your-domain/api/webhook \
-H "Content-Type: application/json" \
-d '{"regime": "BLACK", "name": "hatamoto", "exchange": "deribit", "symbol": "BTC-PERPETUAL"}'
This updates the stored regime only — running algos will see the new multipliers on their next iteration but do not cancel existing orders. To also cancel orders:
curl -X POST https://your-domain/api/cancel-bot-orders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{"name": "hatamoto"}'
Exit BLACK
Send a regime-only payload with the desired state (typically CALM or TRANSITION), then resend bot start webhooks.
{
"regime": "CALM",
"name": "hatamoto",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL"
}
| Regime | Buy × | Sell × | Use when |
|---|---|---|---|
| CALM | 1.0 | 1.0 | Normal market |
| TRANSITION | 0.5 | 0.8 | Elevated vol, cautious re-entry |
| STRESS | 0.0 | 1.0 | Sell-only / risk-off |
| BLACK | 0.0 | 0.0 | Full stop |
| AFTERMATH | 0.3 | 0.8 | Post-crash cautious rebuild |
Orphaned Orders
Orphaned orders exist on the exchange but have no active algo tracking them (typically after an unclean shutdown). Megadrive auto-detects and cancels them on startup.
Manual check
curl -s https://your-domain/api/consistency-check \
-H "Authorization: Bearer YOUR_TOKEN" | jq .
Manual cancel
curl -X POST https://your-domain/api/cancel-bot-orders \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "hatamoto"}'
Grid Reanchor
The grid auto-reprices when price drifts beyond driftThresholdPct. To manually force a reanchor (e.g. after a large move), stop and restart the grid component with a new anchorPrice.
- Cancel existing grid orders:
POST /api/cancel-bot-orders {"name": "hatamoto"}
- Resend start with explicit anchor (or omit for live mid):
{
"signal": "grid",
"name": "hatamoto",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"anchorPrice": 85000 // omit to use live mid
}
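To decide whether a manual reanchor is worth doing, the drift rule can be checked by hand. The sketch below is illustrative only: `needsReanchor` and `reanchorPayload` are hypothetical helpers, and the drift is assumed to be the absolute distance between live mid and anchor as a percentage of the anchor.

```javascript
// True when mid has moved more than driftThresholdPct away from the anchor.
function needsReanchor(anchorPrice, midPrice, driftThresholdPct) {
  const driftPct = (Math.abs(midPrice - anchorPrice) / anchorPrice) * 100;
  return driftPct > driftThresholdPct;
}

// Builds the grid start payload from the step above.
// Leaving anchorPrice undefined omits the field, so the grid centres on live mid.
function reanchorPayload(name, anchorPrice) {
  const payload = { signal: "grid", name, exchange: "deribit", symbol: "BTC-PERPETUAL" };
  if (anchorPrice !== undefined) payload.anchorPrice = anchorPrice;
  return payload;
}
```

Serialising the result with `JSON.stringify` gives a body that can be sent as is, without the inline comment used in the example above.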
Missing Fills
On startup, Megadrive runs fill reconciliation — it fetches 24h of trade history from the exchange, diffs against the local fills table, and inserts anything missing with tag='reconciled'. A Telegram alert fires if gaps are found.
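The diff step of reconciliation can be pictured as a set difference on trade IDs. This is a minimal sketch, not the startup routine itself: `findMissingFills` is a hypothetical name, and `tradeId` is an assumed field name for whatever unique identifier the exchange and the fills table share.

```javascript
// exchangeTrades: trades fetched from the exchange for the last 24h.
// localFills: rows already present in the local fills table.
// Returns the trades that are missing locally, tagged for insertion.
function findMissingFills(exchangeTrades, localFills) {
  const known = new Set(localFills.map((f) => f.tradeId));
  return exchangeTrades
    .filter((t) => !known.has(t.tradeId))
    .map((t) => ({ ...t, tag: "reconciled" }));
}
```

An empty result means the local table already covers the window, which is the case where no Telegram alert would be expected.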
Verify fills table
sqlite3 data/megadrive.db \
"SELECT ts, side, price, qty, tag FROM fills ORDER BY ts DESC LIMIT 20;"
Check for reconciled fills
sqlite3 data/megadrive.db \
"SELECT COUNT(*) FROM fills WHERE tag='reconciled';"
Manual reconciliation trigger
Restart the process — reconciliation runs automatically on each startup after exchange connections are established.
Funding Rate Alert
The funding rate monitor polls BTC-PERPETUAL every 30 minutes (configurable). Deribit's current_funding is the 8-hour rate. Two thresholds:
- Warn (default 0.03% / 8h ≈ 33% annualised): Telegram warning sent, logged to events. One alert per 6h.
- Extreme (default 0.10% / 8h ≈ 110% annualised): escalated alert. Consider pausing the grid manually.
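For reference, an 8-hour rate converts to a simple (non-compounded) annual figure by multiplying by the number of 8-hour periods in a year, 3 × 365 = 1095. The sketch below shows that arithmetic together with the two-threshold check. `annualisedFundingPct` and `fundingLevel` are hypothetical helpers, and the sketch assumes the thresholds apply to the signed rate exactly as configured.

```javascript
const PERIODS_PER_YEAR = 3 * 365; // three 8-hour funding periods per day

// Simple annualisation of an 8h funding rate given in percent.
function annualisedFundingPct(rate8hPct) {
  return rate8hPct * PERIODS_PER_YEAR;
}

// Classifies a rate against fundingGate.warnPct / fundingGate.pausePct (both in percent).
function fundingLevel(rate8hPct, { warnPct, pausePct }) {
  if (rate8hPct > pausePct) return "extreme";
  if (rate8hPct > warnPct) return "warn";
  return "ok";
}
```

At the default thresholds, 0.03% per 8h works out to about 32.9% per year and 0.10% per 8h to about 109.5% per year.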
What to do when funding is elevated
- Check current rate on Deribit: BTC-PERPETUAL → Info → Funding Rate.
- If rate is above extreme threshold and position is large: consider switching to STRESS regime to stop new buys while letting existing sells fill.
- Optionally widen grid spreads temporarily: stop and restart the grid component with a larger minSpread.
- Funding resets every 8 hours; monitor for normalisation before resuming full activity.
Adjust thresholds
Config page → Risk Settings → Funding Warn % / Funding Extreme %. Or in config/local.json:
"fundingGate": {
"symbol": "BTC-PERPETUAL",
"warnPct": 0.03,
"pausePct": 0.10,
"checkIntervalMins": 30
}
Options Management
Options positions are opened via the hedge signal. Minimum size is 0.1 BTC (snapped up for small accounts). Use options_trim to partially close, or close the position in the exchange UI for a full exit.
Open a put (downside hedge)
{
"signal": "hedge",
"name": "hatamoto",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"params": {
"direction": "put",
"strength": 2, // 1=ATM, 2=slightly OTM, 3=OTM
"allocation": "5%" // % of equity in BTC
}
}
Partial close — options_trim
Fires a limit sell at mark price for a specified number of contracts (or a % of the current position). Auto-selects the largest open position if instrument is omitted.
{
"signal": "options_trim",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"name": "hatamoto",
"params": {
"pct": 50 // close 50% of the largest open position
}
}
{
"signal": "options_trim",
"exchange": "deribit",
"symbol": "BTC-PERPETUAL",
"name": "hatamoto",
"params": {
"instrument": "BTC-25APR25-75000-P",
"contracts": 0.3 // close exactly 0.3 BTC; takes priority over pct
}
}
| Param | Type | Notes |
|---|---|---|
| instrument | string | Full Deribit name (e.g. BTC-25APR25-75000-P). Omit to auto-select largest position. |
| contracts | number | BTC contract size to close. Takes priority over pct. |
| pct | number | % of current position to close (e.g. 50 = 50%). Used if contracts not set. |
| currency | string | BTC (default) or USDC. Only used when auto-selecting instrument. |
Close size is snapped to the 0.1 BTC Deribit minimum. The order is capped at the current position size.
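The sizing rules (contracts before pct, snap to 0.1 BTC, cap at position) can be combined as in the sketch below. The rounding direction is an assumption: this version rounds down to whole 0.1 BTC lots, closes at least one lot, and never exceeds the open position. `trimSize` is a hypothetical helper, not the component's code.

```javascript
const LOTS_PER_BTC = 10; // one lot = 0.1 BTC, the Deribit minimum mentioned above

// positionBtc: current open size in BTC. Returns the close size in BTC (0 if nothing can be closed).
function trimSize(positionBtc, { contracts, pct } = {}) {
  // contracts takes priority over pct.
  const requested = contracts !== undefined ? contracts : (positionBtc * (pct ?? 0)) / 100;
  const capped = Math.min(requested, positionBtc);
  // Work in whole lots so that floating-point noise cannot produce sizes like 0.30000000000000004.
  const lots = Math.max(1, Math.floor(capped * LOTS_PER_BTC + 1e-9));
  const maxLots = Math.floor(positionBtc * LOTS_PER_BTC + 1e-9);
  return Math.min(lots, maxLots) / LOTS_PER_BTC;
}
```

Under these assumptions, trimming 50% of a 0.7 BTC position closes 0.3 BTC, and a request for 0.3 BTC against a 0.2 BTC position closes 0.2 BTC.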
Options type → bot aggressiveness
When an options position is open, a second layer of multipliers stacks on top of the regime multipliers to adjust MM aggressiveness. Configure in config/local.json:
"optionsMultipliers": {
"put": { "buyMultiplier": 1.2, "sellMultiplier": 0.8 }, // puts open → buy more, sell less
"call": { "buyMultiplier": 0.8, "sellMultiplier": 1.2 }, // calls open → sell more, buy less
"none": { "buyMultiplier": 1.0, "sellMultiplier": 1.0 }
}
The multipliers are applied multiplicatively: if the TRANSITION regime has buyMultiplier: 0.5 and a put is open with buyMultiplier: 1.2, the effective buy multiplier is 0.5 × 1.2 = 0.60. A zero from the regime side (e.g. STRESS buy = 0.0) is never un-gated by the options multiplier.
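The stacking rule is plain multiplication, as the sketch below shows using the regime table from the BLACK Regime section and the optionsMultipliers block above. `effectiveMultipliers` is a hypothetical helper written for illustration; only the numbers come from this runbook.

```javascript
// Regime multipliers, copied from the regime table.
const REGIME = {
  CALM: { buy: 1.0, sell: 1.0 },
  TRANSITION: { buy: 0.5, sell: 0.8 },
  STRESS: { buy: 0.0, sell: 1.0 },
  BLACK: { buy: 0.0, sell: 0.0 },
  AFTERMATH: { buy: 0.3, sell: 0.8 },
};

// Options-layer multipliers, copied from the optionsMultipliers example.
const OPTIONS = {
  put: { buyMultiplier: 1.2, sellMultiplier: 0.8 },
  call: { buyMultiplier: 0.8, sellMultiplier: 1.2 },
  none: { buyMultiplier: 1.0, sellMultiplier: 1.0 },
};

// Effective multipliers = regime layer × options layer.
// A zero on the regime side stays zero whatever the options layer says.
function effectiveMultipliers(regime, optionsType = "none") {
  const r = REGIME[regime];
  const o = OPTIONS[optionsType];
  return { buy: r.buy * o.buyMultiplier, sell: r.sell * o.sellMultiplier };
}
```

For example, `effectiveMultipliers("TRANSITION", "put").buy` evaluates to 0.6, and `effectiveMultipliers("STRESS", "put").buy` stays at 0.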
Check open options positions
sqlite3 data/megadrive.db \
"SELECT symbol, side, qty, avg_entry FROM positions WHERE symbol LIKE 'BTC-%-%';"
Adverse Selection
Adverse selection measures whether the market consistently moves against you after your fills — i.e., you buy just before a drop or sell just before a rise. This indicates informed counterparties are picking off your quotes.
How to read it (Metrics page → Adverse Selection card)
- Click Load. The card fetches the selected fill window + 1h candles from Deribit REST (~1s).
- Read the Portfolio AS Score and % Adverse Fills in the aggregate row.
- Check per-component scores to identify which strategy is being picked off most.
Thresholds
| AS Score | Colour | Meaning |
|---|---|---|
| < −0.02% | Green | Favorable — fills are slightly predictive of good direction (unusual for MMs) |
| −0.02% – +0.05% | Grey | Neutral — fills are random w.r.t. future price. Normal. |
| +0.05% – +0.20% | Amber | Mild adverse selection — worth monitoring |
| > +0.20% | Red | Significant — consider widening spread, reducing allocation, or pausing |
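The exact formula behind the AS Score is not given in this runbook, so the sketch below uses one plausible definition purely for illustration: the signed percentage move from fill price to the price one horizon later, averaged over fills, with positive meaning the market moved against the fill. The colour bands follow the table above; `adverseMovePct`, `asBand`, and `asSummary` are hypothetical names, and the handling of values exactly on a band boundary is an assumption.

```javascript
// Positive result = price moved against the fill (down after a buy, up after a sell).
function adverseMovePct(fill, laterPrice) {
  const movePct = ((laterPrice - fill.price) / fill.price) * 100;
  return fill.side === "buy" ? -movePct : movePct;
}

// Maps a score in percent to the colour bands from the thresholds table.
function asBand(scorePct) {
  if (scorePct < -0.02) return "green";
  if (scorePct <= 0.05) return "grey";
  if (scorePct <= 0.2) return "amber";
  return "red";
}

// fills: [{ side: "buy" | "sell", price, laterPrice }], where laterPrice is the
// candle price one horizon after the fill.
function asSummary(fills) {
  if (fills.length === 0) return { score: 0, pctAdverse: 0, band: asBand(0) };
  const moves = fills.map((f) => adverseMovePct(f, f.laterPrice));
  const score = moves.reduce((a, b) => a + b, 0) / moves.length;
  const pctAdverse = (moves.filter((m) => m > 0).length / moves.length) * 100;
  return { score, pctAdverse, band: asBand(score) };
}
```

Running the same summary per component, rather than over the whole portfolio, gives the per-strategy view used in the steps below.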
What to do when AS is elevated
- Check whether the adverse period corresponds to a regime (TRANSITION / STRESS) where trend-following dominated — this is normal in trending markets.
- Try a longer horizon (4h or 8h) to see if the signal reverses — short-term adverse selection can mean you're providing liquidity at good levels.
- If AS is persistently high on a single component across multiple windows, widen its spread to capture more edge and reduce fill rate.
- Check inventory skew config: if inventorySkewFactor is low, the MM may not be adjusting quotes to lean away from the imbalanced side.
Daily P&L Report
Fires automatically at 00:01 UTC via Telegram. Consolidates fills and equity snapshots for the previous calendar day into portfolio_daily and sends a digest.
Report format
📅 Daily P&L — 2026-04-10
PnL: +0.0042 BTC
Fees: -0.0003 BTC
Net: +0.0039 BTC
Balance: 1.2540 BTC
Fills: 87
Funding: -0.0001 BTC ← only shown when non-zero
Unreal.: +0.0012 BTC (open pos) ← from latest WS account snapshot
Fields
- PnL: equity delta, i.e. end-of-day balance minus previous day's balance
- Fees: actual Deribit fees from WS fill events (falls back to 0.05% taker estimate if not recorded)
- Net: PnL minus fees
- Fills: total fill count for the day
- Funding: funding payments (when recorded; currently manual or via future automation)
- Unreal.: unrealised P&L from the latest WS user.portfolio snapshot
Query historical daily rows
sqlite3 data/megadrive.db \
"SELECT trade_date, realized_pnl, fees, balance FROM portfolio_daily ORDER BY trade_date DESC LIMIT 30;"
DB Backup & Restore
Litestream continuously replicates data/megadrive.db to Cloudflare R2. WAL frames are streamed in near-real-time; hourly snapshots provide restore points.
Verify replication is running
pm2 status litestream # should show 'online'
pm2 logs litestream --lines 20
List available restore points
litestream snapshots -config litestream.yml
Restore to latest
pm2 stop megadrive litestream
litestream restore -config litestream.yml data/megadrive.db
pm2 start ecosystem.config.js --env production
Restore to a specific point in time
pm2 stop megadrive litestream
litestream restore -config litestream.yml \
-timestamp "2026-04-01T10:00:00Z" \
data/megadrive.db
pm2 start ecosystem.config.js --env production
Required environment variables: R2_ACCOUNT_ID, R2_BUCKET, LITESTREAM_ACCESS_KEY_ID, LITESTREAM_SECRET_ACCESS_KEY. Set them in /etc/environment or .env.
PM2 Reference
| Command | Description |
|---|---|
| pm2 status | List all processes with uptime and restart count |
| pm2 reload ecosystem.config.js --env production | Zero-downtime reload; use for all routine deploys |
| pm2 start ecosystem.config.js --env production | First start or after ecosystem.config.js changes |
| pm2 restart megadrive | Hard restart (brief downtime); use only for debug |
| pm2 stop megadrive | Stop process (keeps in process list) |
| pm2 delete megadrive | Remove from process list entirely |
| pm2 save | Persist current process list for auto-start on reboot |
| pm2 startup | Generate systemd unit for PM2 itself |
| pm2 monit | Live CPU/memory dashboard in terminal |
| pm2 logs megadrive --lines 100 | Stream recent logs |
| pm2 logs megadrive --err --lines 50 | Errors only |
reload spawns a new process before killing the old one (zero downtime, re-reads ecosystem config). restart kills then respawns (brief gap, does not re-read ecosystem config). Always prefer reload ecosystem.config.js.
Log Locations
| File | Contents |
|---|---|
| logs/out.log | All stdout: trade events, fills, algo loops |
| logs/error.log | Stderr: crashes, unhandled rejections |
| logs/litestream.log | Litestream replication events |
| data/megadrive.db | SQLite: fills, equity curve, active bots, events |
Useful SQLite queries
# Recent fills
sqlite3 data/megadrive.db "SELECT ts,side,price,qty,fee,tag FROM fills ORDER BY ts DESC LIMIT 20;"
# Active bots
sqlite3 data/megadrive.db "SELECT * FROM active_bots;"
# Equity last 24h
sqlite3 data/megadrive.db "SELECT ts,equity FROM equity_curve WHERE ts > datetime('now','-1 day') ORDER BY ts DESC;"
# Recent events / alerts
sqlite3 data/megadrive.db "SELECT ts,level,msg FROM events ORDER BY ts DESC LIMIT 30;"