When you check the metrics endpoint, you will notice that there are a lot of metrics and counters.
I believe most of these metrics are used by the Sui core team only for debugging and diagnosing minor issues, and it is impossible to monitor all of them carefully.
I highly recommend that node operators pay attention to only the key metrics presented below.
Let's discuss which metrics are important for your node.
This metric counts the number of peers your node is connected to.
The number of connections should be greater than zero:
sui_network_peers > 0.
If sui_network_peers = 0, your node is unable to synchronize.
Based on my experience on devnet and testnet, this metric is the total count of inbound and outbound connections, to both validators and fullnodes.
If you have not advertised your node, every counted connection is to a validator node.
On devnet there are 4 validators, so this metric stays at 4.
(I hope the metric will be split into validator connections and fullnode connections.)
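The peer check above can be sketched in a few lines of Python. This is a minimal sketch, assuming the metrics endpoint returns the usual Prometheus text format and that sui_network_peers is exposed without labels. The helper names read_gauge and peers_ok, and the sample text, are made up for illustration.

```python
def read_gauge(metrics_text, name):
    """Return the value of an unlabelled metric from Prometheus-style text, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def peers_ok(metrics_text):
    """True when sui_network_peers is present and greater than zero."""
    peers = read_gauge(metrics_text, "sui_network_peers")
    return peers is not None and peers > 0


# Hand-written sample, not real node output.
sample = """# TYPE sui_network_peers gauge
sui_network_peers 4
"""
print(peers_ok(sample))  # True
```

In practice you would pass in the text fetched from your own node's metrics endpoint and alert when peers_ok returns False.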
This metric displays the current synced version of the node.
If this metric stops increasing, it means the node is not syncing.
However, a more important metric is sync lag, but I can't find a metric for the highest known version.
So I recommend that you monitor checkpoints too.
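One way to detect "stops increasing" is to keep the last few readings of the metric and check whether the newest one is any higher than the oldest. The sketch below assumes you already collect the readings at a fixed interval. The function name is_stalled and the window size of 3 are my own choices, not anything defined by Sui.

```python
def is_stalled(samples, min_samples=3):
    """Return True if the last `min_samples` readings show no increase.

    `samples` is a list of readings of the synced version, oldest first.
    With fewer than `min_samples` readings we cannot tell, so return False.
    """
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    # No progress if the newest reading is not above the oldest in the window.
    return recent[-1] <= recent[0]


# Illustrative numbers only.
print(is_stalled([100, 100, 100]))  # True
print(is_stalled([100, 105, 110]))  # False
```

A larger window makes the check less sensitive to short pauses, at the cost of noticing a real stall later.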
This metric is the highest checkpoint advertised to your node by other nodes.
This metric is the highest checkpoint synced by your node.
You can check your node's sync lag by taking the difference between the two.
If this difference is large or growing, your node is having trouble syncing.
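The lag calculation is simple arithmetic, sketched below. It assumes you have already read the two checkpoint values from the metrics endpoint. The function names and the sample numbers are hypothetical.

```python
def sync_lag(highest_known, highest_synced):
    """Checkpoints the node still has to catch up on (never negative)."""
    return max(0, highest_known - highest_synced)


def lag_is_growing(lags):
    """True if every lag reading is strictly larger than the one before it."""
    return len(lags) >= 2 and all(b > a for a, b in zip(lags, lags[1:]))


# Illustrative numbers only.
print(sync_lag(120, 100))            # 20
print(lag_is_growing([5, 10, 20]))   # True
print(lag_is_growing([20, 10, 5]))   # False
```

What counts as a "large" lag depends on your node and network, so I leave the alert threshold for you to choose.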
Each Full node stores the blockchain state and history and services queries for them.
Therefore, this metric is important for the users of your Fullnode (RPC endpoint).
You can monitor and track the amount of REST API traffic on your node. You can also use the route in the metric to monitor the types of operations.
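To see traffic per type of operation, you can sum the metric by its route label. The sketch below assumes the Prometheus text format with a route label on each sample line. The metric name example_requests_by_route and the route values are placeholders, so substitute the name you see on your own endpoint.

```python
import re
from collections import defaultdict

# metric_name{label="value",...} number
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def requests_by_route(metrics_text, metric_name):
    """Sum the values of `metric_name`, grouped by its `route` label."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != metric_name:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        if "route" in labels:
            totals[labels["route"]] += float(match.group(3))
    return dict(totals)


# Hand-written sample with placeholder names, not real node output.
sample = """example_requests_by_route{route="method_a"} 12
example_requests_by_route{route="method_b"} 3
"""
print(requests_by_route(sample, "example_requests_by_route"))
```

Comparing two snapshots taken some time apart gives you the request rate per route over that interval.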
For now, there is no metric that counts the number of requests that succeeded.
I'd like to know whether my RPC endpoint is serving requests healthily, so I hope a metric is added to track the success and failure rate of the REST API.
5. Metrics for Validator
I have no experience operating a Sui validator.
I'll add what I learn if I get the opportunity to operate one.
I believe these metrics may be important.
As Sui continues to grow and develop, many metrics will be added and improved.
These key metrics are the important ones as of now (Feb-2023).
Let's discuss and improve this together.