Clustering

Zetian.Clustering provides high availability, load balancing, and horizontal scaling across multiple SMTP server nodes.

Leader Election

Raft consensus algorithm for automatic leader election and cluster coordination.

Auto-Failover

Automatic session migration and failover when nodes fail or become unavailable.

Load Balancing

Multiple strategies including round-robin, least connections, and weighted distribution.

State Replication

Distributed state with configurable replication factor and consistency levels.

Secure Communication

TLS encryption and authentication between cluster nodes.

Multi-Region

Support for geo-distributed deployments with region-aware routing and failover.

Installation

dotnet add package Zetian
dotnet add package Zetian.Clustering

Quick Start

Basic Cluster Setup

using Zetian.Server;
using Zetian.Clustering;

// Create clustered SMTP server
var server = new SmtpServerBuilder()
    .Port(25)
    .ServerName("Node-1")
    .Build();

// Enable clustering
var cluster = await server.EnableClusteringAsync(options =>
{
    options.NodeId = "node-1";
    options.ClusterPort = 7946;
    options.DiscoveryMethod = DiscoveryMethod.Multicast;
});

await server.StartAsync();

Multi-Node Configuration

// Node 1 (runs on its own host; Seeds lists the other nodes)
var node1 = new SmtpServerBuilder()
    .Port(25)
    .ServerName("Node-1")
    .Build();

await node1.EnableClusteringAsync(options =>
{
    options.NodeId = "node-1";
    options.ClusterPort = 7946;
    options.Seeds = new[] { "node-2:7946", "node-3:7946" };
});

// Node 2 (runs on its own host; Seeds lists the other nodes)
var node2 = new SmtpServerBuilder()
    .Port(25)
    .ServerName("Node-2")
    .Build();

await node2.EnableClusteringAsync(options =>
{
    options.NodeId = "node-2";
    options.ClusterPort = 7946;
    options.Seeds = new[] { "node-1:7946", "node-3:7946" };
});

Advanced Features

Leader Election

// Configure leader election with Raft consensus
cluster.ConfigureLeaderElection(options =>
{
    options.ElectionTimeout = TimeSpan.FromSeconds(5);
    options.HeartbeatInterval = TimeSpan.FromSeconds(1);
    options.MinNodes = 3; // Minimum nodes for quorum
});

// Check if current node is leader
if (cluster.IsLeader)
{
    // Perform leader-only operations
    await cluster.DistributeConfigurationAsync(config);
}

// Subscribe to leader changes
cluster.LeaderChanged += (sender, e) =>
{
    Console.WriteLine($"New leader: {e.NewLeaderNodeId}");
};
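
The `MinNodes = 3` setting reflects how majority quorums work in Raft-style consensus: a cluster of n voting nodes needs floor(n/2) + 1 of them to agree, so it can lose the remainder and keep operating. The sketch below is plain arithmetic under that assumption, not part of the Zetian.Clustering API; `Majority` and `ToleratedFailures` are hypothetical helper names.

```csharp
using System;

// Hypothetical helpers (not Zetian APIs): majority-quorum arithmetic.
int Majority(int nodes) => nodes / 2 + 1;
int ToleratedFailures(int nodes) => nodes - Majority(nodes);

foreach (var size in new[] { 3, 4, 5 })
{
    Console.WriteLine(
        $"{size} nodes: quorum {Majority(size)}, tolerates {ToleratedFailures(size)} failure(s)");
}
// 3 nodes: quorum 2, tolerates 1 failure(s)
// 4 nodes: quorum 3, tolerates 1 failure(s)
// 5 nodes: quorum 3, tolerates 2 failure(s)
```

A fourth node raises the quorum size without adding fault tolerance, which is why odd cluster sizes are usually preferred.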

Load Balancing Strategies

// Round-robin (default)
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.RoundRobin);

// Least connections
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.LeastConnections);

// Weighted round-robin
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.WeightedRoundRobin, 
    new LoadBalancingOptions
    {
        NodeWeights = new Dictionary<string, int>
        {
            { "node-1", 3 },  // Gets 3x traffic
            { "node-2", 1 },  // Gets 1x traffic
            { "node-3", 2 }   // Gets 2x traffic
        }
    });

// IP Hash (sticky sessions)
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.IpHash);
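
To make the effect of `NodeWeights` concrete, here is a minimal sketch of one common way to implement weighted round-robin (the "smooth" variant, which interleaves picks instead of sending bursts to the heaviest node). It is an illustration only: how Zetian.Clustering actually schedules weighted traffic is not documented here, and `Next` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Illustrative only: smooth weighted round-robin, not Zetian's implementation.
var nodes = new (string Id, int Weight)[] { ("node-1", 3), ("node-2", 1), ("node-3", 2) };
var current = new int[nodes.Length];            // running score per node
int totalWeight = nodes.Sum(n => n.Weight);

string Next()
{
    int best = 0;
    for (int i = 0; i < nodes.Length; i++)
    {
        current[i] += nodes[i].Weight;          // every node gains its own weight
        if (current[i] > current[best]) best = i;
    }
    current[best] -= totalWeight;               // the chosen node pays the total weight
    return nodes[best].Id;
}

var picks = Enumerable.Range(0, 6).Select(_ => Next()).ToList();
Console.WriteLine(string.Join(", ", picks));
// node-1, node-3, node-1, node-2, node-3, node-1
```

Over each cycle of six picks (the sum of the weights), node-1 receives three connections, node-3 two, and node-2 one.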

Session Affinity

// Configure session affinity (sticky sessions)
cluster.ConfigureAffinity(options =>
{
    options.Method = AffinityMethod.SourceIp;
    options.FailoverMode = FailoverMode.Automatic;
    options.SessionTimeout = TimeSpan.FromMinutes(30);
});

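// Custom affinity resolver: route by a hash of the client IP
cluster.SetAffinityResolver((session) =>
{
    // Mask off the sign bit instead of calling Math.Abs, which throws
    // OverflowException when GetHashCode() returns int.MinValue
    var index = (session.ClientIp.GetHashCode() & int.MaxValue) % cluster.NodeCount;
    return cluster.Nodes.ElementAt(index).Id;
});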
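
A hash-based resolver only gives sticky routing if every node computes the same hash for the same client. On modern .NET (.NET Core and later), `string.GetHashCode()` is randomized per process, so if the client address is held as a string, two nodes can disagree. The sketch below uses FNV-1a as a process-independent alternative. It assumes the client IP is available as a string (the type of `session.ClientIp` is not stated here), and `Fnv1a` and `NodeIndexFor` are hypothetical helpers, not Zetian APIs.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a Zetian API): 32-bit FNV-1a, stable across processes.
uint Fnv1a(string text)
{
    uint hash = 2166136261u;                    // FNV offset basis
    foreach (byte b in Encoding.UTF8.GetBytes(text))
    {
        unchecked
        {
            hash ^= b;
            hash *= 16777619u;                  // FNV prime; overflow wraps by design
        }
    }
    return hash;
}

// Map a client IP to a node index in the range [0, nodeCount).
int NodeIndexFor(string clientIp, int nodeCount) =>
    (int)(Fnv1a(clientIp) % (uint)nodeCount);

Console.WriteLine(NodeIndexFor("203.0.113.7", 3));
```

Note that a plain modulo remaps most clients whenever `nodeCount` changes; consistent hashing reduces that churn if it matters for your workload.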

State Replication

// Configure state replication
cluster.ConfigureReplication(options =>
{
    options.ReplicationFactor = 3;
    options.ConsistencyLevel = ConsistencyLevel.Quorum;
    options.SyncMode = SyncMode.Asynchronous;
});

// Replicate custom data
await cluster.ReplicateStateAsync("key", data, options =>
{
    options.Ttl = TimeSpan.FromMinutes(5);
    options.Priority = ReplicationPriority.High;
});
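
The interaction between `ReplicationFactor` and `ConsistencyLevel` can be reasoned about with simple arithmetic. The sketch below assumes the conventional meaning of the level names listed in the Configuration Reference (One, Two, Three, Quorum, All); how Zetian.Clustering interprets them internally is an assumption, and `RequiredAcks` and `ReadsSeeLatestWrite` are hypothetical helpers.

```csharp
using System;

// Assumed interpretation of the level names; not taken from Zetian's source.
int RequiredAcks(string level, int replicationFactor) => level switch
{
    "One" => 1,
    "Two" => 2,
    "Three" => 3,
    "Quorum" => replicationFactor / 2 + 1,
    "All" => replicationFactor,
    _ => throw new ArgumentException($"Unknown level: {level}")
};

// A read is guaranteed to reach at least one replica holding the latest
// acknowledged write when the read and write sets must overlap: R + W > N.
bool ReadsSeeLatestWrite(int readAcks, int writeAcks, int replicationFactor) =>
    readAcks + writeAcks > replicationFactor;

int replicas = 3;
int quorum = RequiredAcks("Quorum", replicas);
Console.WriteLine($"Quorum of {replicas} = {quorum}");              // Quorum of 3 = 2
Console.WriteLine(ReadsSeeLatestWrite(quorum, quorum, replicas));   // True
Console.WriteLine(ReadsSeeLatestWrite(1, 1, replicas));             // False
```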

Distributed Rate Limiting

// Enable distributed rate limiting
cluster.EnableDistributedRateLimiting(options =>
{
    options.SyncInterval = TimeSpan.FromSeconds(1);
    options.Algorithm = RateLimitAlgorithm.TokenBucket;
    options.GlobalLimit = 10000; // Cluster-wide limit
});

// Check rate limit across cluster
bool allowed = await cluster.CheckRateLimitAsync(
    clientIp,
    requestsPerHour: 100
);
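
For readers unfamiliar with `RateLimitAlgorithm.TokenBucket`, the following is a minimal single-process sketch of the algorithm: each client has a bucket that starts full, every request removes one token, and tokens refill in proportion to elapsed time. It does not model the cluster-wide synchronization, and `TryConsume` and `buckets` are hypothetical names rather than Zetian APIs.

```csharp
using System;
using System.Collections.Generic;

// Illustrative single-process token bucket; cluster synchronization is not modeled.
var buckets = new Dictionary<string, (double Tokens, DateTime Last)>();

bool TryConsume(string clientIp, int requestsPerHour, DateTime now)
{
    if (!buckets.TryGetValue(clientIp, out var bucket))
        bucket = (requestsPerHour, now);        // a new client starts with a full bucket

    // Refill in proportion to elapsed time, capped at the bucket capacity.
    double refill = (now - bucket.Last).TotalSeconds * requestsPerHour / 3600.0;
    double tokens = Math.Min(requestsPerHour, bucket.Tokens + refill);

    bool allowed = tokens >= 1;
    buckets[clientIp] = (allowed ? tokens - 1 : tokens, now);
    return allowed;
}

var start = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
int accepted = 0;
for (int i = 0; i < 150; i++)
{
    if (TryConsume("203.0.113.7", 100, start)) accepted++;
}

Console.WriteLine(accepted);                                              // 100
Console.WriteLine(TryConsume("203.0.113.7", 100, start.AddMinutes(1)));   // True
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a clock.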

Health Monitoring

// Configure health checks
cluster.ConfigureHealthChecks(options =>
{
    options.CheckInterval = TimeSpan.FromSeconds(10);
    options.FailureThreshold = 3;  // Mark unhealthy after 3 failures
    options.SuccessThreshold = 2;  // Mark healthy after 2 successes
});

// Get cluster health
var health = await cluster.GetHealthAsync();
Console.WriteLine($"Cluster Status: {health.Status}");
Console.WriteLine($"Healthy Nodes: {health.HealthyNodes}/{health.TotalNodes}");

// Monitor individual nodes
foreach (var node in cluster.Nodes)
{
    Console.WriteLine($"{node.Id}: {node.State} - Load: {node.CurrentLoad}");
}
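
The two thresholds form a small state machine: a node flips to unhealthy only after `FailureThreshold` consecutive failed checks, and back to healthy only after `SuccessThreshold` consecutive passes, so a single dropped probe does not cause flapping. The sketch below illustrates that logic under the assumption that the counts are consecutive; it is not Zetian's implementation, and `Record` is a hypothetical name.

```csharp
using System;

// Illustrative state machine for the two thresholds; not Zetian's implementation.
const int FailureThreshold = 3;
const int SuccessThreshold = 2;
bool healthy = true;
int failures = 0, successes = 0;

bool Record(bool checkPassed)
{
    // Track consecutive results: any opposite result resets the other counter.
    if (checkPassed) { successes++; failures = 0; }
    else { failures++; successes = 0; }

    if (healthy && failures >= FailureThreshold) healthy = false;
    else if (!healthy && successes >= SuccessThreshold) healthy = true;
    return healthy;
}

foreach (var passed in new[] { false, false, true, false, false, false, true, true })
{
    string state = Record(passed) ? "Healthy" : "Unhealthy";
    Console.WriteLine($"check passed={passed} -> {state}");
}
```

In this run the node stays healthy through the first five checks, becomes unhealthy on the sixth (the third consecutive failure), and recovers on the eighth (the second consecutive success).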

Maintenance Mode

// Put node in maintenance mode
await cluster.EnterMaintenanceModeAsync(new MaintenanceOptions
{
    DrainTimeout = TimeSpan.FromMinutes(5),
    GracefulShutdown = true,
    MigrateSessions = true
});

// Check maintenance status
if (cluster.IsInMaintenance)
{
    Console.WriteLine("Node is in maintenance mode");
}

// Exit maintenance mode
await cluster.ExitMaintenanceModeAsync();

Multi-Region Deployment

// Configure for multi-region
cluster.ConfigureRegions(options =>
{
    options.CurrentRegion = "us-east";
    options.Regions = new()
    {
        new RegionConfig
        {
            Name = "us-east",
            Endpoints = new() { "node1.us-east:7946", "node2.us-east:7946" }
        },
        new RegionConfig
        {
            Name = "eu-west",
            Endpoints = new() { "node1.eu-west:7946", "node2.eu-west:7946" }
        }
    };
    options.PreferLocalRegion = true;
    options.CrossRegionTimeout = TimeSpan.FromSeconds(10);
});
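
One way to picture `PreferLocalRegion = true` is as an ordering over candidate endpoints: try the current region first and fall back to other regions only when no local endpoint is available. The sketch below shows that ordering with the endpoints from the configuration above. It is an assumption about the behavior, not Zetian's routing code, and `CandidateOrder` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: order candidate endpoints so the local region is tried first.
var endpointsByRegion = new Dictionary<string, string[]>
{
    ["us-east"] = new[] { "node1.us-east:7946", "node2.us-east:7946" },
    ["eu-west"] = new[] { "node1.eu-west:7946", "node2.eu-west:7946" }
};

List<string> CandidateOrder(string currentRegion, ISet<string> unavailable) =>
    endpointsByRegion
        .OrderBy(region => region.Key == currentRegion ? 0 : 1)   // local region first
        .SelectMany(region => region.Value)
        .Where(endpoint => !unavailable.Contains(endpoint))       // skip failed nodes
        .ToList();

var down = new HashSet<string> { "node1.us-east:7946", "node2.us-east:7946" };
Console.WriteLine(CandidateOrder("us-east", new HashSet<string>())[0]);   // node1.us-east:7946
Console.WriteLine(CandidateOrder("us-east", down)[0]);                    // node1.eu-west:7946
```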

Configuration Reference

| Property | Type | Default | Description |
|---|---|---|---|
| NodeId | string | Required | Unique identifier for this node |
| ClusterPort | int | 7946 | Port for cluster communication |
| DiscoveryMethod | enum | Multicast | Node discovery method (Static, DNS, Multicast, Kubernetes, Consul) |
| Seeds | string[] | Empty | Seed nodes for cluster join |
| ReplicationFactor | int | 3 | Number of replicas for state |
| ConsistencyLevel | enum | Quorum | Read/write consistency (One, Two, Three, Quorum, All) |
| EnableEncryption | bool | true | Enable TLS encryption between nodes |
| HeartbeatInterval | TimeSpan | 1 second | Interval between heartbeat messages |
| ElectionTimeout | TimeSpan | 5 seconds | Timeout for leader election |
| FailureDetectionTimeout | TimeSpan | 10 seconds | Time to detect node failure |

Node and Cluster States

Node States

Initializing

Node is starting up and discovering cluster

Joining

Node is joining the cluster

Active

Node is healthy and serving traffic

Maintenance

Node is in maintenance mode

Draining

Node is draining sessions

Failed

Node has failed and is offline

Cluster States

Forming

Cluster is being formed

Healthy

All nodes are healthy and synchronized

Degraded

Some nodes are unhealthy but cluster is operational

Rebalancing

Cluster is redistributing sessions

Split Brain

Network partition detected

Failed

Cluster has lost quorum

Performance Optimization

Storage

  • Use SSD storage for state persistence
  • Enable compression for large state objects
  • Configure appropriate snapshot intervals
  • Use memory caching for hot data

Network

  • Use a dedicated network for cluster traffic
  • Enable compression for cross-region deployments
  • Tune batch sizes based on latency
  • Configure appropriate timeouts

Compute

  • Scale horizontally for CPU-intensive workloads
  • Use appropriate replication factors
  • Enable async replication for non-critical data
  • Monitor and adjust thread pool sizes

Configuration

  • Use an odd number of nodes for quorum
  • Configure health check intervals appropriately
  • Set realistic failure detection timeouts
  • Use weighted load balancing for heterogeneous nodes

Security Best Practices

Network Security

  • Restrict the cluster port (7946 by default) to member nodes only
  • Deploy cluster nodes in a private network/VLAN
  • Use network segmentation between clusters
  • Enable firewall rules for cluster communication

Encryption & Authentication

  • Always enable TLS between nodes in production
  • Use shared secrets or certificates for node authentication
  • Rotate cluster secrets regularly
  • Store secrets in secure key management systems

Access Control

  • Implement role-based access control (RBAC)
  • Audit all cluster operations
  • Monitor for unauthorized access attempts
  • Use separate credentials for each node

Need help with clustering setup? Check out our examples and community discussions.