Emergency Procedures
This page describes how to react in a network-wide emergency (funds-at-risk).
This document outlines the procedures for Node Operators to respond to network-wide emergencies, such as funds-at-risk scenarios or critical network attacks, on THORChain’s Mainnet. THORChain is a decentralized, permissionless cross-chain liquidity protocol, and Node Operators play a critical role in maintaining network security and integrity. These procedures ensure rapid, coordinated, and secure responses while preserving the network’s impartiality and resistance to capture.
See the Additional Resources for more in-depth information and guides.
Rapid Response: Node Operators must act swiftly to secure the network when a funds-at-risk bug or attack is detected.
Pseudo-Anonymity: Operators should avoid revealing personal information, even during emergencies, to maintain network neutrality and security. Use tools like make relay to communicate anonymously via the THORChain Dev Discord.
Decentralized Governance: With the removal of admin Mimir, network decisions such as halts or parameter changes are now managed through node voting or Mimir overrides initiated by nodes.
Mainnet Context: THORChain operates on Mainnet, with robust mechanisms like node churn, Threshold Signature Scheme (TSS), and Bifrost for cross-chain security.
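The make relay helper mentioned above lets operators post to Discord without revealing personal identity. A minimal sketch of that flow, assuming a node deployed with the node-launcher repository; the exact prompts and behavior of make relay may differ between releases:

```bash
# Run from your node-launcher checkout on the node host.
# make relay forwards a message to the Dev Discord on behalf of your node,
# preserving operator pseudo-anonymity.
cd ~/node-launcher   # example path; use your own checkout
make relay           # follow the interactive prompts to enter the message
```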
If you discover a bug that poses a risk to funds or network stability, follow these steps:
Immediate Notification: Directly message the THORChain team or admins via the Dev Discord (tag @thorsec if needed) or other secure channels. Alternatively, submit the bug through the formal bug bounty program for evaluation.
Bug Report Details: Include the following in your report:
A clear description of the bug or vulnerability.
Steps to reproduce the issue, if applicable.
Potential impact (e.g., funds at risk, network disruption).
Any relevant logs or evidence from your THORNode.
Bounty Program: If the bug is verified as critical, you may be eligible for a bounty proportional to its severity. Do not disclose the bug publicly until it is resolved to prevent exploitation.
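When gathering logs and evidence for a report, a sketch like the following can help, assuming a standard node-launcher deployment into a Kubernetes namespace named thornode; adjust namespace and deployment names to match your cluster:

```bash
# Collect recent logs from the core services for a bug report.
# Namespace and deployment names assume a default node-launcher install;
# check yours with: kubectl get deploy -A
kubectl -n thornode logs deploy/thornode --tail=500 > thornode.log
kubectl -n thornode logs deploy/bifrost  --tail=500 > bifrost.log

# Capture basic pod health alongside the logs.
kubectl -n thornode get pods -o wide > pods.txt
```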
Emergencies are classified based on severity, with corresponding actions for Node Operators:
Description: A vulnerability or attack threatens the security of funds in liquidity pools or vaults.
Actions:
Node Voting: Propose and vote on emergency actions, such as Mimir overrides, to adjust network parameters. Coordination takes place in the Dev Discord #mainnet channel, and voting requires consensus among active nodes.
Use thornode tx thorchain mimir <key> <value> --from <node-address> to submit Mimir changes (see the sketch after this list).
Coordinate with Team: Work with the core team and other operators to verify the threat and deploy patches. Avoid public disclosure to prevent panic or exploitation.
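For illustration, a node-side Mimir vote and a quick consensus check might look like the sketch below; the key name is an example only, and the public API endpoint is an assumption (any trusted THORNode API, including your own node's, works):

```bash
# Submit a Mimir vote from your node. HALTSIGNINGBTC and the value are
# examples only; verify the exact key name before voting.
thornode tx thorchain mimir HALTSIGNINGBTC 1 --from <node-address>

# Review how other active nodes have voted on Mimir keys.
curl -s https://thornode.ninerealms.com/thorchain/mimir/nodes_all | jq .

# Check the Mimir values currently in effect.
curl -s https://thornode.ninerealms.com/thorchain/mimir | jq .
```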
Description: A bug or attack disrupts network operations (e.g., chain sync issues, high slash points) but does not directly threaten funds.
Actions:
Assess Node Status: Check your node’s health using make status to ensure it is synced and operational. Review slash points and chain sync status via Grafana or Prometheus dashboards.
Vote on Parameters: If required, propose Mimir changes to adjust network parameters (e.g., change ChurnInterval to stabilize the network).
Restart Services: If your node is affected, restart services using make restart, or reset the node with make reset for unrecoverable issues (note: this is destructive and resets chain data).
Monitor Logs: Access logs via Grafana’s Loki interface to diagnose issues. Select the relevant service (e.g., thornode/bifrost) in the Log browser.
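A command-line complement to the status checks above, assuming jq is installed and using a public THORNode API endpoint as an example (your own node's API also works):

```bash
# Replace with your node's thor... address.
NODE_ADDR=<node-address>

# The network's view of your node: status, version, slash points, and jail state.
curl -s "https://thornode.ninerealms.com/thorchain/node/${NODE_ADDR}" \
  | jq '{status, version, slash_points, jail}'

# Last observed block heights per connected chain, to spot sync lag.
curl -s https://thornode.ninerealms.com/thorchain/lastblock | jq .
```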
Description: Non-critical issues, such as minor performance degradation or isolated node failures.
Actions:
Diagnose Locally: Check your node’s metrics (CPU, memory, disk) using Prometheus or Kubernetes dashboards.
Apply Updates: Deploy patches or updates to your THORNode services using Helm charts from the node-launcher repository.
THORChain supports network halts to pause operations during critical emergencies:
Voting on Halts: Nodes can vote to approve or extend halts via the node voting mechanism. A supermajority is required for consensus.
Resuming Operations: Once the threat is mitigated, nodes vote to lift the halt via Mimir overrides and resume normal operations.
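A sketch of the halt-and-resume flow through node Mimir votes; the key name is an example, so confirm the current halt keys before voting:

```bash
# Vote to enable a global trading halt (example key; 1 = halted).
thornode tx thorchain mimir HALTTRADING 1 --from <node-address>

# After the threat is mitigated, vote to lift the halt (0 = resumed).
thornode tx thorchain mimir HALTTRADING 0 --from <node-address>
```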
Backups: Ensure your THORNode’s persistent volumes are backed up via your Kubernetes provider (e.g., AWS, Google Cloud or bare-metal). Regularly verify backups to protect against data loss.
Recovery: In case of node failure, restore from backups or resync your node. Avoid using make reset or hard-reset-thornodes unless absolutely necessary, as they delete chain data.
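On providers with CSI snapshot support, a persistent-volume backup can be sketched as below; the namespace, PVC, and snapshot class names are placeholders that depend on your cluster and node-launcher install:

```bash
# List the persistent volume claims backing your THORNode services.
kubectl -n thornode get pvc

# Snapshot one of them (placeholder names; requires a CSI VolumeSnapshotClass).
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: thornode-backup-$(date +%Y%m%d)
  namespace: thornode
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-thornode-0
EOF
```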
Network upgrades are critical for patching vulnerabilities or improving protocol performance during emergencies. All active nodes must run the updated version for the network to proceed with an upgrade. The process can be managed in three ways:
Natural Upgrade: Versions are proposed in the Dev Discord, nodes update their software, and the network naturally churns out nodes running older versions over several days via the regular churn process.
Assertive Upgrade: Once a supermajority of nodes has upgraded, demonstrating acceptance, operators can vote to ban nodes running outdated versions. Banned nodes are churned out, removed from the Threshold Signature Scheme (TSS), and ejected from the consensus set. These nodes must fully leave, destroy their setup, and redeploy a new node to rejoin.
Use node voting (Node Voting) to propose and approve banning outdated nodes.
Forced Upgrade (Hard Fork): In time-critical scenarios, a hard fork may be initiated to exclude old nodes. This carries significant risks, such as consensus failures or network instability, and should be a last resort.
Coordinate via the Dev Discord and use Alerting for THORNodes to monitor fork outcomes.
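Before moving to an assertive upgrade, operators can check how far the new version has spread across the active set; a sketch using a public THORNode API endpoint as an example (any trusted endpoint works) and jq:

```bash
# The version currently enforced by the network and the next candidate version.
curl -s https://thornode.ninerealms.com/thorchain/version | jq .

# Count active nodes per reported version to gauge upgrade adoption.
curl -s https://thornode.ninerealms.com/thorchain/nodes \
  | jq -r '.[] | select(.status=="Active") | .version' | sort | uniq -c
```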
Best Practices for Upgrades:
Deploy updates using Helm charts from the Node Launcher Repository.
Ensure backups are current before upgrading, as described in Restore Validator Backup and THORNode Snapshot Recovery and Storage Management.
Monitor node sync and health post-upgrade using Vǫrðr Monitoring and Prometheus dashboards.
Verify multi-validator cluster stability if applicable, as outlined in Multi-Validator Cluster Setup.
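A minimal upgrade outline, assuming a node-launcher based deployment; the exact Makefile targets and prompts can change between releases, so treat the repository README as authoritative:

```bash
# Pull the latest charts and manifests.
cd ~/node-launcher   # example path; use your own checkout
git pull

# Redeploy services with the updated charts; node-launcher prompts for
# network and node type where needed.
make install

# Confirm the node is healthy and reports the expected version.
make status
```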
Stay Synced: Ensure your node is fully synced with all connected chains (e.g., Bitcoin, Ethereum) before taking action. Unsynchronized nodes may accrue slash points.
Secure Keys: Protect your operator key and node mnemonic. Loss of the operator key results in loss of bond access, and loss of the validator key may brick your node.
Regular Updates: Keep your THORNode software and Helm charts up to date using the node-launcher repository.
Community Coordination: Engage with other operators and the core team via the Dev Discord or Community Telegram for real-time collaboration.
Initiate Network Halt: If a critical threat is confirmed, Node Operators can propose a network halt using the make halt command. This pauses the entire network to allow time for investigation and corrective actions, such as voting to pause specific chain operations via Mimir.
Monitor and Communicate: Use monitoring tools such as Vǫrðr to check chain health and sync status. Relay critical updates anonymously via make relay to the Dev Discord #mainnet channel.
Report: Inform the team via Discord or the Bounty Program for tracking and future improvements. Alternatively, raise an issue in the thornode or node-launcher repositories.
Initiating a Halt: Use Mimir halt keys to pause signing for a specific chain (e.g., BTC, ETH). This prevents outbound transactions until the issue is resolved. All halt controls are set via Mimir.
Monitoring Tools: Use Vǫrðr for real-time chain health and sync status monitoring. Deploy Prometheus and Grafana for detailed metrics on node performance and network status.
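A quick command-line complement to those dashboards for chain health and sync checks, using a public THORNode API endpoint as an example:

```bash
# Per-chain halt status as reported by the network.
curl -s https://thornode.ninerealms.com/thorchain/inbound_addresses \
  | jq '.[] | {chain, halted}'

# Heights your node has observed on each connected chain.
curl -s https://thornode.ninerealms.com/thorchain/node/<node-address> \
  | jq '.observe_chains'
```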
node-launcher repository (for THORNode deployment)
thornode repository (for THORNode software)