1. Introduction
When you check the metrics endpoint, you will notice that there are a lot of metrics and counters.
Command
curl 127.0.0.1:9184/metrics
I believe that most of these metrics are used only by the Sui core team for debugging and diagnosing minor issues, and it is impossible to monitor all of them carefully.
I highly recommend that node operators pay attention to only the key metrics presented below.
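As a rough sketch of how that filtering could be scripted, the Python below parses the text returned by the metrics endpoint and keeps only a watchlist of names. The watchlist is simply the set of metrics discussed in this article, the function names are my own, and the parser handles only the common `name{labels} value` line shape, so treat it as an illustration rather than a complete Prometheus parser.

```python
import re
from urllib.request import urlopen

# Watchlist: the metric names discussed in this article (adjust as needed).
KEY_METRICS = {
    "sui_network_peers",
    "total_transaction_certificates",
    "highest_known_checkpoint",
    "highest_synced_checkpoint",
    "rpc_requests_by_route",
}

LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse metrics text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, _, tail = rest.rpartition("}")
            labels = dict(LABEL_RE.findall(label_part))
        else:
            name, _, tail = line.partition(" ")
            labels = {}
        # The first token after the series is the value; a timestamp may follow.
        value = float(tail.split()[0])
        samples.append((name, labels, value))
    return samples


def select_key_metrics(samples, wanted=KEY_METRICS):
    """Keep only the samples whose metric name is on the watchlist."""
    return [s for s in samples if s[0] in wanted]


def fetch_key_metrics(url="http://127.0.0.1:9184/metrics"):
    """Scrape the local endpoint (same address as the curl command above)."""
    with urlopen(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    return select_key_metrics(parse_metrics(text))
```

Calling `fetch_key_metrics()` on the node itself should return only the handful of samples worth watching, provided the endpoint is reachable at that address.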
Let's discuss which metrics are important for your node.
2. Network
sui_network_peers
This metric counts the number of peers your node is connected to.
The number of connections should be sui_network_peers > 0.
If sui_network_peers = 0, it means that your node is unable to synchronize.
Based on my experience on devnet and testnet, the metric is the total count of inbound and outbound connections, to both validators and Full nodes.
If you don't advertise your node, every counted connection is to a validator node.
On devnet there are 4 validators, so this metric stays at 4. (I hope the metric will be split into validator connections and Full node connections.)
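The rule above can be turned into a small check. This is only a sketch: the function name is mine, and the optional `expected_validators` warning is my own assumption built on the devnet observation (4 validators, 4 peers), not something Sui defines.

```python
def peer_status(peer_count, expected_validators=None):
    """Classify a sui_network_peers reading.

    expected_validators is a hypothetical, optional hint: for a node that is
    not advertised, the peer count should roughly match the validator count.
    """
    if peer_count <= 0:
        return "critical: no peers, the node cannot synchronize"
    if expected_validators is not None and peer_count < expected_validators:
        return "warning: connected to fewer peers than known validators"
    return "ok"
```

For example, `peer_status(4, expected_validators=4)` reports `"ok"` for the devnet case described above, while `peer_status(0)` reports the critical state.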
3. Sync
total_transaction_certificates
This metric displays the current synced version of the node.
If this metric stops increasing, it means the node is not syncing.
However, a more important metric is sync lag, but I can't find a metric for the highest known version.
So I recommend that you monitor checkpoints too.
- highest_known_checkpoint
This metric is the highest checkpoint advertised to your node by other nodes.
- highest_synced_checkpoint
This metric is the highest checkpoint your node has synced.
- highest_known_checkpoint - highest_synced_checkpoint
You can check your node's sync lag by taking the difference between highest_known_checkpoint and highest_synced_checkpoint.
If this difference is large or growing, your node is having trouble syncing.
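The lag calculation can be sketched as follows, given a few consecutive readings of the two checkpoint metrics. The function name, the shape of the result, and the `max_lag=100` threshold are all my own placeholders; what counts as "large" depends on your network and scrape interval.

```python
def assess_sync(readings, max_lag=100):
    """Assess sync health from consecutive readings, oldest first.

    Each reading is a (highest_known_checkpoint, highest_synced_checkpoint)
    pair. max_lag is a hypothetical threshold, not a Sui recommendation.
    """
    lags = [max(0, known - synced) for known, synced in readings]
    latest = lags[-1]
    return {
        "lag": latest,
        "large": latest > max_lag,
        # Lag is larger now than at the start of the window.
        "growing": len(lags) >= 2 and latest > lags[0],
        # The synced checkpoint has not moved across the window.
        "stalled": len(readings) >= 2 and readings[-1][1] == readings[0][1],
    }
```

With the readings `[(100, 95), (110, 96), (120, 97)]` the lag goes 5, 14, 23, so the result flags the lag as growing even though it is still below the placeholder threshold.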
4. API
Each Full node stores the blockchain state and history and serves queries against them.
Therefore, this metric is important for anyone who uses your Full node as an RPC endpoint.
rpc_requests_by_route
You can monitor and track the amount of REST API traffic on your node. You can also use the route in the metric to monitor the types of operations.
At present, there is no metric that counts the number of successful requests.
I'd like to know whether my RPC endpoint is serving requests healthily, so I hope a metric is added to track the success and failure rate of the REST API.
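To illustrate tracking traffic per operation type, here is a minimal sketch that sums rpc_requests_by_route samples by route and compares two scrapes. It assumes the route appears as a label named `route` on each sample line; the function names and any route values you feed it are my own illustration, so check the real output of your node first.

```python
import re
from collections import defaultdict

# Assumes lines shaped like: rpc_requests_by_route{route="..."} <value>
SAMPLE_RE = re.compile(r'^rpc_requests_by_route\{(?P<labels>.*)\}\s+(?P<value>\S+)')
ROUTE_RE = re.compile(r'(?:^|,)\s*route="((?:[^"\\]|\\.)*)"')


def requests_by_route(metrics_text):
    """Sum rpc_requests_by_route values per route from raw metrics text."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        sample = SAMPLE_RE.match(line.strip())
        if not sample:
            continue  # comments and other metrics are ignored
        label = ROUTE_RE.search(sample.group("labels"))
        route = label.group(1) if label else "unknown"
        totals[route] += float(sample.group("value"))
    return dict(totals)


def route_deltas(before, after):
    """Requests handled per route between two scrapes."""
    return {route: count - before.get(route, 0.0) for route, count in after.items()}
```

Scraping the endpoint twice and passing both results through `requests_by_route` and then `route_deltas` shows which operations received traffic in between. It still cannot tell you whether those requests succeeded.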
5. Metrics for Validator
I have no experience operating a Sui validator.
I'll add more here if I get the opportunity to operate one.
I believe the following metrics may be important.
epoch_total_gas_reward
consensus_handler_processed_bytes
current_requests_in_flight
current_voting_right
skipped_consensus_txns
tallying_rule_scores
6. Summary
As Sui continues to grow and develop, many metrics will be added and improved.
These key metrics are the important ones as of now (Feb-2023).
We should discuss and improve them together.