Key Metrics for Fullnode

1. Introduction

When you check the metrics endpoint, you will notice that there are a lot of metrics and counters.

Command

curl 127.0.0.1:9184/metrics

I believe that most of these metrics are used only by the Sui Core Team for debugging and diagnosing issues, and it is impossible to monitor all of them carefully.
I highly recommend that node operators pay attention only to the key metrics presented below.
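
If you would rather not read the full output by eye, here is a minimal Python sketch that pulls only a few metrics out of the endpoint output. It assumes the endpoint returns the usual Prometheus text format; the helper names (parse_metrics, fetch_metrics) and the KEY_METRICS set are my own illustration, not part of Sui.

```python
from urllib.request import urlopen

# Endpoint taken from the curl command above.
METRICS_URL = "http://127.0.0.1:9184/metrics"

# Metric names discussed in this post (an illustrative selection).
KEY_METRICS = {
    "sui_network_peers",
    "highest_known_checkpoint",
    "highest_synced_checkpoint",
}


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the raw metrics text from the node."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text, wanted=KEY_METRICS):
    """Return {name: value} for samples whose metric name is in `wanted`.

    Assumes Prometheus text format: `name value` or `name{labels} value`.
    If a name appears several times (different labels), the last one wins.
    """
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and HELP/TYPE lines
            continue
        if "{" in line:
            name, _, rest = line.partition("{")
            rest = rest.rpartition("}")[2]  # text after the label block
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if name in wanted and fields:
            try:
                found[name] = float(fields[0])
            except ValueError:
                continue  # ignore samples that are not numeric
    return found


# Typical use on a running node:
#   print(parse_metrics(fetch_metrics()))
```

Parsing is kept separate from fetching so the filter can be tried on saved output without a running node.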

Let’s discuss which metrics are important for your node :blush: :+1:


2. Network

  • sui_network_peers
    This metric counts the number of peers your node is connected to.
    It should always satisfy sui_network_peers > 0.
    If sui_network_peers = 0, your node is unable to synchronize.
    Based on my experience on devnet and testnet, this metric is the total number of inbound and outbound connections, covering both validators and fullnodes.

If you don’t advertise your node, all of these connections are to validator nodes.
On devnet there are 4 validators, so this metric stays at 4.
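
The peer rule above could be checked with a small sketch like this. It assumes sui_network_peers appears as a plain Prometheus gauge line in the metrics text; peer_count and peer_status are hypothetical helper names, and the status strings are only my suggestion.

```python
import re

# Matches a sample line such as "sui_network_peers 4" (labels are optional).
_PEERS_RE = re.compile(
    r"^sui_network_peers(?:\{[^}]*\})?\s+(\S+)", re.MULTILINE
)


def peer_count(metrics_text):
    """Return the peer count found in the metrics text, or None if absent."""
    match = _PEERS_RE.search(metrics_text)
    if match is None:
        return None
    return int(float(match.group(1)))


def peer_status(count):
    """Turn a peer count into a short status string."""
    if count is None:
        return "unknown: sui_network_peers not found"
    if count == 0:
        return "critical: no peers, node cannot synchronize"
    return f"ok: {count} peers"
```

For example, feeding it the text returned by the curl command and alerting on anything that starts with "critical" would cover the sui_network_peers = 0 case.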

(I hope this metric will be split into validator connections and fullnode connections.)

3. Sync

An even more important thing to watch is sync lag, but I can’t find a metric for the highest known version.
So I recommend monitoring the checkpoint metrics as well.

  • highest_known_checkpoint
    This metric is the highest checkpoint advertised to your node by other nodes.

  • highest_synced_checkpoint
    This metric is the highest checkpoint your node has synced.

  • highest_known_checkpoint - highest_synced_checkpoint
    You can check your node’s sync lag by taking the difference between highest_known_checkpoint and highest_synced_checkpoint.
    If this difference is large or growing, your node is having trouble syncing.
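
The lag calculation can be sketched in Python as follows. The function names and the max_lag threshold of 100 checkpoints are my own assumptions for illustration; pick a threshold that fits your node and network.

```python
def sync_lag(highest_known, highest_synced):
    """Checkpoints the node still has to sync (known minus synced)."""
    return highest_known - highest_synced


def is_lag_growing(lags):
    """True if every lag sample is strictly larger than the previous one."""
    return len(lags) >= 2 and all(b > a for a, b in zip(lags, lags[1:]))


def assess(lags, max_lag=100):
    """Classify a series of lag samples, oldest first.

    max_lag is an assumed threshold, not an official Sui value.
    """
    if not lags:
        return "unknown"
    if lags[-1] > max_lag:
        return "lagging"          # large lag right now
    if is_lag_growing(lags):
        return "falling behind"   # small but steadily growing lag
    return "ok"
```

Sampling both metrics every minute or so and passing the last few lag values to assess covers both the "large" and the "growing" case.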

4. API

Each Full node stores and serves queries for the blockchain state and history.
Therefore, API metrics are important for the users of your Fullnode (RPC endpoint).

At the moment, there is no metric that counts the number of successful requests.
I’d like to know whether my RPC endpoint is healthy, so I hope a metric will be added to track the success and failure rate of the REST API.

5. Metrics for Validator

I have no experience operating a Sui validator.
I’ll add what I learn if I get the opportunity to operate one.

I believe these metrics may be important.

  • epoch_total_gas_reward
  • consensus_handler_processed_bytes
  • current_requests_in_flight
  • current_voting_right
  • skipped_consensus_txns
  • tallying_rule_scores

6. Summary

As Sui continues to grow and develop, many metrics will be added and improved.
These key metrics are the important ones as of now (Feb-2023).
We should discuss and improve together. :muscle:


interesting information :heartpulse: :blush: :+1: :clap:
