Clustering
Clustering support for high availability, load balancing, and horizontal scaling across multiple SMTP server nodes.
Leader Election
Raft consensus algorithm for automatic leader election and cluster coordination.
Auto-Failover
Automatic session migration and failover when nodes fail or become unavailable.
Load Balancing
Multiple strategies including round-robin, least connections, and weighted distribution.
State Replication
Distributed state with configurable replication factor and consistency levels.
Secure Communication
TLS encryption and node authentication for all traffic between cluster nodes.
Multi-Region
Support for geo-distributed deployments with region-aware routing and failover.
Installation
```shell
dotnet add package Zetian
dotnet add package Zetian.Clustering
```
Quick Start
Basic Cluster Setup
```csharp
using Zetian.Server;
using Zetian.Clustering;

// Create clustered SMTP server
var server = new SmtpServerBuilder()
    .Port(25)
    .ServerName("Node-1")
    .Build();

// Enable clustering
var cluster = await server.EnableClusteringAsync(options =>
{
    options.NodeId = "node-1";
    options.ClusterPort = 7946;
    options.DiscoveryMethod = DiscoveryMethod.Multicast;
});

await server.StartAsync();
```
Multi-Node Configuration
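```csharp
using Zetian.Server;
using Zetian.Clustering;

// Each node runs on its own host and lists the other members as seeds.

// Node 1 - Primary seed node
var node1 = new SmtpServerBuilder()
    .Port(25)
    .ServerName("Node-1")
    .Build();

await node1.EnableClusteringAsync(options =>
{
    options.NodeId = "node-1";
    options.ClusterPort = 7946;
    options.Seeds = new[] { "node-2:7946", "node-3:7946" };
});
await node1.StartAsync();

// Node 2 - Secondary node (on a second host)
var node2 = new SmtpServerBuilder()
    .Port(25)
    .ServerName("Node-2")
    .Build();

await node2.EnableClusteringAsync(options =>
{
    options.NodeId = "node-2";
    options.ClusterPort = 7946;
    options.Seeds = new[] { "node-1:7946", "node-3:7946" };
});
await node2.StartAsync();

// Node 3 is configured the same way, with NodeId "node-3"
// and seeds "node-1:7946" and "node-2:7946".
```
Advanced Features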
Leader Election
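Raft elections rest on two rules that can be shown in isolation. A follower waits a randomized election timeout before it stands as a candidate, and a candidate wins only with votes from a strict majority of the cluster. The sketch below is not Zetian's implementation, and the helper names are hypothetical. Drawing the timeout from the range [ElectionTimeout, twice ElectionTimeout) is also an assumption, modeled on the randomized 150 to 300 ms range used in the Raft paper.

```csharp
using System;

// Hypothetical helpers illustrating two Raft rules; not Zetian's internals.

// Randomized election timeout in [baseTimeout, 2 * baseTimeout): followers
// that time out at different moments rarely split the vote.
TimeSpan RandomElectionTimeout(TimeSpan baseTimeout, Random rng) =>
    baseTimeout + TimeSpan.FromTicks((long)(rng.NextDouble() * baseTimeout.Ticks));

// A candidate wins only with votes (its own included) from a strict majority
// of the whole cluster. Because each node votes at most once per term and any
// two majorities overlap, two candidates cannot both win the same term.
bool WinsElection(int votesReceived, int clusterSize) =>
    votesReceived > clusterSize / 2;

var timeout = RandomElectionTimeout(TimeSpan.FromSeconds(5), new Random());
Console.WriteLine($"Next election timeout: {timeout.TotalSeconds:F2} s");
Console.WriteLine(WinsElection(2, 3)); // True
Console.WriteLine(WinsElection(2, 4)); // False: a 4-node cluster needs 3 votes
```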
```csharp
// Configure leader election with Raft consensus
cluster.ConfigureLeaderElection(options =>
{
    options.ElectionTimeout = TimeSpan.FromSeconds(5);
    options.HeartbeatInterval = TimeSpan.FromSeconds(1);
    options.MinNodes = 3; // Minimum nodes for quorum
});

// Check if current node is leader
if (cluster.IsLeader)
{
    // Perform leader-only operations
    await cluster.DistributeConfigurationAsync(config);
}

// Subscribe to leader changes
cluster.LeaderChanged += (sender, e) =>
{
    Console.WriteLine($"New leader: {e.NewLeaderNodeId}");
};
```
Load Balancing Strategies
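The selection rules behind these strategies are small enough to sketch independently of Zetian's API. The helper names below are hypothetical. The weighted variant uses the "smooth weighted round-robin" scheme known from nginx, which may differ from what `LoadBalancingStrategy.WeightedRoundRobin` does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical selection helpers for each strategy; not Zetian's internals.
var nodes = new[] { "node-1", "node-2", "node-3" };

// Round-robin: hand out nodes in a fixed rotation.
int rrNext = 0;
string NextRoundRobin()
{
    string node = nodes[rrNext];
    rrNext = (rrNext + 1) % nodes.Length;
    return node;
}

// Least connections: pick the node with the fewest active sessions.
string LeastConnections(IReadOnlyDictionary<string, int> activeSessions) =>
    activeSessions.OrderBy(kv => kv.Value).First().Key;

// Smooth weighted round-robin: on each pick, add every node's weight to its
// running score, choose the highest score, then subtract the total weight
// from the winner. Heavier nodes win more often without long bursts.
var weights = new Dictionary<string, int> { ["node-1"] = 3, ["node-2"] = 1, ["node-3"] = 2 };
var scores = weights.Keys.ToDictionary(k => k, _ => 0);
int totalWeight = weights.Values.Sum();
string NextWeighted()
{
    string? best = null;
    foreach (var (node, weight) in weights)
    {
        scores[node] += weight;
        if (best is null || scores[node] > scores[best]) best = node;
    }
    scores[best!] -= totalWeight;
    return best!;
}

Console.WriteLine($"{NextRoundRobin()} {NextRoundRobin()} {NextRoundRobin()}"); // node-1 node-2 node-3
Console.WriteLine(LeastConnections(new Dictionary<string, int>
    { ["node-1"] = 12, ["node-2"] = 4, ["node-3"] = 9 }));                          // node-2
Console.WriteLine(string.Join(", ", Enumerable.Range(0, 6).Select(_ => NextWeighted())));
// Every 6 consecutive picks contain node-1 three times, node-3 twice, node-2 once
```

Compared with naive weighted round-robin, the smooth variant interleaves heavier nodes with lighter ones. It does not send runs of consecutive connections to node-1.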
```csharp
// Round-robin (default)
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.RoundRobin);

// Least connections
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.LeastConnections);

// Weighted round-robin
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.WeightedRoundRobin,
    new LoadBalancingOptions
    {
        NodeWeights = new Dictionary<string, int>
        {
            { "node-1", 3 }, // Gets 3x traffic
            { "node-2", 1 }, // Gets 1x traffic
            { "node-3", 2 }  // Gets 2x traffic
        }
    });

// IP Hash (sticky sessions)
cluster.SetLoadBalancingStrategy(LoadBalancingStrategy.IpHash);
```
Session Affinity
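Source-IP affinity needs a client-to-node mapping that meets two conditions: every node computes it identically, and it moves as few clients as possible when membership changes. The page does not say what `AffinityMethod.SourceIp` uses internally. For illustration, this sketch swaps in one well-known technique, rendezvous (highest-random-weight) hashing. It scores every node against the client key and picks the highest score, so removing a node remaps only the clients that were on it. `PickNode` and the FNV-1a plus fmix64 hashing are assumptions of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rendezvous-hashing sketch; not Zetian's affinity resolver.

// 64-bit FNV-1a over the UTF-16 chars, followed by the MurmurHash3 fmix64
// finalizer to spread bits. Unlike string.GetHashCode(), this is stable
// across processes, so every node computes the same scores.
static ulong StableHash(string text)
{
    ulong h = 14695981039346656037UL;
    foreach (char c in text)
        h = unchecked((h ^ c) * 1099511628211UL);
    h ^= h >> 33;
    h = unchecked(h * 0xff51afd7ed558ccdUL);
    h ^= h >> 33;
    h = unchecked(h * 0xc4ceb9fe1a85ec53UL);
    h ^= h >> 33;
    return h;
}

// Highest score wins; ties fall back to ordinal node-id order so the
// result never depends on the order the nodes are listed in.
static string PickNode(string clientIp, IEnumerable<string> nodeIds) =>
    nodeIds.OrderByDescending(id => StableHash(id + "|" + clientIp))
           .ThenBy(id => id, StringComparer.Ordinal)
           .First();

var members = new List<string> { "node-1", "node-2", "node-3" };
var clients = Enumerable.Range(1, 1000).Select(i => $"10.0.{i / 256}.{i % 256}").ToList();
var before = clients.ToDictionary(ip => ip, ip => PickNode(ip, members));

members.Remove("node-2"); // simulate a node failure
var after = clients.ToDictionary(ip => ip, ip => PickNode(ip, members));

int moved = clients.Count(ip => before[ip] != after[ip]);
int onNode2 = clients.Count(ip => before[ip] == "node-2");
Console.WriteLine($"{moved} clients moved; {onNode2} were on node-2"); // the two numbers are equal
```

With plain `hash % nodeCount` routing, shrinking the cluster from three nodes to two would reassign most clients, not just node-2's share.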
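```csharp
// Configure session affinity (sticky sessions)
cluster.ConfigureAffinity(options =>
{
    options.Method = AffinityMethod.SourceIp;
    options.FailoverMode = FailoverMode.Automatic;
    options.SessionTimeout = TimeSpan.FromMinutes(30);
});

// Custom affinity resolver: map each client IP to a node deterministically.
// GetHashCode() is not stable across processes (.NET randomizes string hashes),
// so hash the IP text with 32-bit FNV-1a instead. This assumes cluster.Nodes
// lists members in the same order on every node.
cluster.SetAffinityResolver((session) =>
{
    uint hash = 2166136261;
    foreach (char c in session.ClientIp.ToString())
        hash = unchecked((hash ^ c) * 16777619);
    // Unsigned modulo avoids the overflow of Math.Abs(int.MinValue)
    int index = (int)(hash % (uint)cluster.NodeCount);
    return cluster.Nodes.ElementAt(index).Id; // ElementAt requires System.Linq
});
```
State Replication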
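ReplicationFactor and ConsistencyLevel interact through simple arithmetic. This sketch assumes Dynamo/Cassandra-style semantics, which the page does not confirm. Under that assumption, a level names how many of the ReplicationFactor replicas must acknowledge an operation, and Quorum means a strict majority. A read is then guaranteed to overlap the latest acknowledged write when R + W > RF. The function names are hypothetical.

```csharp
using System;

// Hypothetical mapping from the ConsistencyLevel names in the configuration
// reference to replica acknowledgements; the semantics are an assumption,
// not documented Zetian behavior.
static int RequiredAcks(string level, int replicationFactor)
{
    int acks = level switch
    {
        "One" => 1,
        "Two" => 2,
        "Three" => 3,
        "Quorum" => replicationFactor / 2 + 1, // strict majority
        "All" => replicationFactor,
        _ => throw new ArgumentException($"Unknown level '{level}'", nameof(level)),
    };
    if (acks > replicationFactor)
        throw new ArgumentOutOfRangeException(nameof(level), $"{level} needs more than {replicationFactor} replicas");
    return acks;
}

// Any read set and write set must share at least one replica when R + W > RF.
static bool ReplicaSetsOverlap(int readAcks, int writeAcks, int replicationFactor) =>
    readAcks + writeAcks > replicationFactor;

const int rf = 3;
int quorum = RequiredAcks("Quorum", rf);
Console.WriteLine(quorum);                                 // 2
Console.WriteLine(ReplicaSetsOverlap(quorum, quorum, rf)); // True
Console.WriteLine(ReplicaSetsOverlap(1, 1, rf));           // False
```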
```csharp
// Configure state replication
cluster.ConfigureReplication(options =>
{
    options.ReplicationFactor = 3;
    options.ConsistencyLevel = ConsistencyLevel.Quorum;
    options.SyncMode = SyncMode.Asynchronous;
});

// Replicate custom data
await cluster.ReplicateStateAsync("key", data, options =>
{
    options.Ttl = TimeSpan.FromMinutes(5);
    options.Priority = ReplicationPriority.High;
});
```
Distributed Rate Limiting
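The token bucket behind `RateLimitAlgorithm.TokenBucket` works as follows on a single node. A bucket holds up to `capacity` tokens and refills at a steady rate, and each request spends one token. A distributed version would also share bucket state across nodes at each SyncInterval, which this sketch leaves out. `CreateTokenBucket` is a hypothetical name. Time is passed in as seconds so the behavior is deterministic.

```csharp
using System;

// Hypothetical single-node token bucket. The caller passes the current time
// in seconds, which keeps the logic deterministic and easy to test.
Func<double, bool> CreateTokenBucket(double capacity, double refillPerSecond)
{
    double tokens = capacity; // start with a full bucket
    double lastRefill = 0;
    return nowSeconds =>
    {
        // Add tokens for the elapsed time, capped at the bucket's capacity.
        tokens = Math.Min(capacity, tokens + (nowSeconds - lastRefill) * refillPerSecond);
        lastRefill = nowSeconds;
        if (tokens < 1)
            return false; // no token available: reject
        tokens -= 1;
        return true;
    };
}

// 100 requests per hour: bursts of up to 100, refilled at 100/3600 tokens per second.
var tryAcquire = CreateTokenBucket(capacity: 100, refillPerSecond: 100.0 / 3600);
int granted = 0;
for (int i = 0; i < 150; i++)
    if (tryAcquire(0)) granted++;
Console.WriteLine(granted);        // 100
Console.WriteLine(tryAcquire(72)); // True: after 72 s about two tokens have been refilled
```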
```csharp
// Enable distributed rate limiting
cluster.EnableDistributedRateLimiting(options =>
{
    options.SyncInterval = TimeSpan.FromSeconds(1);
    options.Algorithm = RateLimitAlgorithm.TokenBucket;
    options.GlobalLimit = 10000; // Cluster-wide limit
});

// Check rate limit across cluster
bool allowed = await cluster.CheckRateLimitAsync(
    clientIp,
    requestsPerHour: 100
);
```
Health Monitoring
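FailureThreshold and SuccessThreshold describe hysteresis. A node is marked unhealthy only after several failed checks and healthy again only after several successes, so a single dropped probe does not make its state flap. The sketch below assumes the thresholds count consecutive results, which the page does not state explicitly. `CreateHealthTracker` is hypothetical, and the usage mirrors the ConfigureHealthChecks values (3 failures, 2 successes).

```csharp
using System;

// Hypothetical tracker; assumes the thresholds count *consecutive* results.
Func<bool, bool> CreateHealthTracker(int failureThreshold, int successThreshold)
{
    bool healthy = true;
    int failures = 0, successes = 0;
    return probeSucceeded =>
    {
        if (probeSucceeded)
        {
            failures = 0;
            if (!healthy && ++successes >= successThreshold) healthy = true;
        }
        else
        {
            successes = 0;
            if (healthy && ++failures >= failureThreshold) healthy = false;
        }
        return healthy;
    };
}

var track = CreateHealthTracker(failureThreshold: 3, successThreshold: 2);
foreach (bool ok in new[] { false, false, true, false, false, false, true, true })
    Console.Write(track(ok) ? "H " : "U ");
Console.WriteLine(); // prints: H H H H H U U H
```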
```csharp
// Configure health checks
cluster.ConfigureHealthChecks(options =>
{
    options.CheckInterval = TimeSpan.FromSeconds(10);
    options.FailureThreshold = 3; // Mark unhealthy after 3 failures
    options.SuccessThreshold = 2; // Mark healthy after 2 successes
});

// Get cluster health
var health = await cluster.GetHealthAsync();
Console.WriteLine($"Cluster Status: {health.Status}");
Console.WriteLine($"Healthy Nodes: {health.HealthyNodes}/{health.TotalNodes}");

// Monitor individual nodes
foreach (var node in cluster.Nodes)
{
    Console.WriteLine($"{node.Id}: {node.State} - Load: {node.CurrentLoad}");
}
```
Maintenance Mode
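Draining, as bounded by DrainTimeout, amounts to waiting until a node's active sessions reach zero or the timeout expires. Any sessions that remain are then migrated or closed. This sketch shows only the wait loop. `DrainAsync` and its `activeSessions` callback are hypothetical, and a real node would also stop accepting new connections before it starts waiting.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical drain loop: wait until no sessions remain or the drain
// timeout expires. Returns true if the node drained cleanly; on false,
// the caller would migrate or close the remaining sessions.
async Task<bool> DrainAsync(Func<int> activeSessions, TimeSpan drainTimeout, TimeSpan pollInterval)
{
    var clock = Stopwatch.StartNew();
    while (activeSessions() > 0)
    {
        if (clock.Elapsed >= drainTimeout)
            return false;
        await Task.Delay(pollInterval);
    }
    return true;
}

// Simulated node: one session finishes each time the count is polled.
int active = 3;
bool drained = await DrainAsync(() => active > 0 ? active-- : 0,
    drainTimeout: TimeSpan.FromMinutes(5), pollInterval: TimeSpan.FromMilliseconds(10));
Console.WriteLine(drained); // True

// A session that never ends hits the (shortened) timeout instead.
bool stuck = await DrainAsync(() => 1,
    drainTimeout: TimeSpan.FromMilliseconds(50), pollInterval: TimeSpan.FromMilliseconds(10));
Console.WriteLine(stuck); // False
```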
```csharp
// Put node in maintenance mode
await cluster.EnterMaintenanceModeAsync(new MaintenanceOptions
{
    DrainTimeout = TimeSpan.FromMinutes(5),
    GracefulShutdown = true,
    MigrateSessions = true
});

// Check maintenance status
if (cluster.IsInMaintenance)
{
    Console.WriteLine("Node is in maintenance mode");
}

// Exit maintenance mode
await cluster.ExitMaintenanceModeAsync();
```
Multi-Region Deployment
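With PreferLocalRegion enabled, a node should try endpoints in its own region before it crosses regions. The helper below shows one way to produce that ordering from a region-to-endpoint map like the one configured here. `OrderEndpoints` is a hypothetical name, not part of Zetian.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ordering helper; endpoint strings mirror the ConfigureRegions example.
static List<string> OrderEndpoints(
    IReadOnlyDictionary<string, string[]> regionEndpoints, string currentRegion, bool preferLocal)
{
    IEnumerable<KeyValuePair<string, string[]>> ordered = preferLocal
        ? regionEndpoints.OrderBy(r => r.Key == currentRegion ? 0 : 1) // stable sort: local region first
        : regionEndpoints.AsEnumerable();
    return ordered.SelectMany(r => r.Value).ToList();
}

var regions = new Dictionary<string, string[]>
{
    ["eu-west"] = new[] { "node1.eu-west:7946", "node2.eu-west:7946" },
    ["us-east"] = new[] { "node1.us-east:7946", "node2.us-east:7946" },
};
var order = OrderEndpoints(regions, currentRegion: "us-east", preferLocal: true);
Console.WriteLine(string.Join(", ", order));
// node1.us-east:7946, node2.us-east:7946, node1.eu-west:7946, node2.eu-west:7946
```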
```csharp
// Configure for multi-region
cluster.ConfigureRegions(options =>
{
    options.CurrentRegion = "us-east";
    options.Regions = new()
    {
        new RegionConfig
        {
            Name = "us-east",
            Endpoints = new() { "node1.us-east:7946", "node2.us-east:7946" }
        },
        new RegionConfig
        {
            Name = "eu-west",
            Endpoints = new() { "node1.eu-west:7946", "node2.eu-west:7946" }
        }
    };
    options.PreferLocalRegion = true;
    options.CrossRegionTimeout = TimeSpan.FromSeconds(10);
});
```
Configuration Reference
| Property | Type | Default | Description |
|---|---|---|---|
| NodeId | string | Required | Unique identifier for this node |
| ClusterPort | int | 7946 | Port for cluster communication |
| DiscoveryMethod | enum | Multicast | Node discovery method (Static, DNS, Multicast, Kubernetes, Consul) |
| Seeds | string[] | Empty | Seed nodes for cluster join |
| ReplicationFactor | int | 3 | Number of replicas for state |
| ConsistencyLevel | enum | Quorum | Read/write consistency (One, Two, Three, Quorum, All) |
| EnableEncryption | bool | true | Enable TLS encryption between nodes |
| HeartbeatInterval | TimeSpan | 1 second | Interval between heartbeat messages |
| ElectionTimeout | TimeSpan | 5 seconds | Timeout for leader election |
| FailureDetectionTimeout | TimeSpan | 10 seconds | Time to detect node failure |
Node and Cluster States
Node States
Node is starting up and discovering cluster
Node is joining the cluster
Node is healthy and serving traffic
Node is in maintenance mode
Node is draining sessions
Node has failed and is offline
Cluster States
Cluster is being formed
All nodes are healthy and synchronized
Some nodes are unhealthy but cluster is operational
Cluster is redistributing sessions
Network partition detected
Cluster has lost quorum
Performance Optimization
Storage
- Use SSD storage for state persistence
- Enable compression for large state objects
- Tune snapshot intervals: frequent snapshots shorten recovery but cost more disk I/O
- Cache frequently read (hot) state in memory
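Compressing state objects only pays off above a certain size, because GZip adds its own header and trailer. The sketch below assumes a hypothetical one-byte flag that marks whether the payload is compressed, plus an arbitrary 1 KiB threshold. It is not Zetian's storage format.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical encoding: byte 0 is a flag (0 = raw, 1 = GZip), the rest is
// the payload. Small objects skip compression because GZip's own header and
// trailer would make them larger.
const int CompressionThreshold = 1024; // bytes; an arbitrary starting point

static byte[] EncodeState(byte[] state, int threshold)
{
    using var output = new MemoryStream();
    if (state.Length < threshold)
    {
        output.WriteByte(0);
        output.Write(state, 0, state.Length);
        return output.ToArray();
    }
    output.WriteByte(1);
    using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
        gzip.Write(state, 0, state.Length); // disposing gzip flushes the compressed data
    return output.ToArray();
}

static byte[] DecodeState(byte[] payload)
{
    if (payload[0] == 0)
        return payload[1..];
    using var input = new MemoryStream(payload, 1, payload.Length - 1);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using var result = new MemoryStream();
    gzip.CopyTo(result);
    return result.ToArray();
}

byte[] small = Encoding.UTF8.GetBytes("session:42");
byte[] large = Encoding.UTF8.GetBytes(new string('x', 10_000));
Console.WriteLine($"{small.Length} -> {EncodeState(small, CompressionThreshold).Length} bytes"); // 10 -> 11 bytes
Console.WriteLine($"{large.Length} -> {EncodeState(large, CompressionThreshold).Length} bytes"); // far fewer than 10000
```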
Network
- Use a dedicated network for cluster traffic
- Enable compression for cross-region deployments
- Tune replication batch sizes to network latency; larger batches amortize long round trips
- Set timeouts that match real network latency, especially across regions
Compute
- Scale horizontally (add nodes) for CPU-intensive workloads
- Choose a replication factor that matches your durability needs; each extra replica adds write traffic
- Use asynchronous replication for non-critical data
- Monitor and adjust thread pool sizes
Configuration
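- Use an odd number of nodes for quorum
- Balance health check intervals: shorter intervals detect failures sooner but add probe traffic
- Set failure detection timeouts well above normal latency spikes to avoid false failovers
- Use weighted load balancing for nodes with different hardware capacity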
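The advice to run an odd number of nodes follows from majority-quorum arithmetic. A cluster of n nodes needs floor(n/2) + 1 votes, so adding a fourth node to a three-node cluster raises the quorum without letting the cluster survive any additional failure. This is general arithmetic, not Zetian code; the function names are illustrative.

```csharp
using System;

// Majority quorum for an n-node cluster and how many node failures it survives.
static int QuorumSize(int nodes) => nodes / 2 + 1;
static int FailuresTolerated(int nodes) => nodes - QuorumSize(nodes);

for (int n = 1; n <= 6; n++)
    Console.WriteLine($"{n} nodes: quorum {QuorumSize(n)}, tolerates {FailuresTolerated(n)} failure(s)");
// 1 nodes: quorum 1, tolerates 0 failure(s)
// 2 nodes: quorum 2, tolerates 0 failure(s)
// 3 nodes: quorum 2, tolerates 1 failure(s)
// 4 nodes: quorum 3, tolerates 1 failure(s)
// 5 nodes: quorum 3, tolerates 2 failure(s)
// 6 nodes: quorum 4, tolerates 2 failure(s)
```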
Security Best Practices
Network Security
- Restrict the cluster port (7946 by default) to member nodes only
- Deploy cluster nodes in a private network or VLAN
- Use network segmentation between clusters
- Enforce firewall rules for cluster communication
Encryption & Authentication
- Always enable TLS between nodes in production
- Use shared secrets or certificates for node authentication
- Rotate cluster secrets regularly
- Store secrets in a secure key management system
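Shared-secret authentication between nodes can be done with a challenge-response over HMAC-SHA256. The verifying node sends a random nonce, and the joining node answers with an HMAC of that nonce keyed by the cluster secret, so the secret itself never crosses the wire. This is an illustrative sketch, not Zetian's handshake; `Sign` and `Verify` are hypothetical names. The code needs .NET 6 or later for the static `HMACSHA256.HashData` and `RandomNumberGenerator.GetBytes` methods.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical challenge-response check; not Zetian's wire protocol.
static byte[] Sign(byte[] key, byte[] message) => HMACSHA256.HashData(key, message);

// FixedTimeEquals compares in constant time, so response timing does not
// reveal how many leading bytes of a forged MAC were correct.
static bool Verify(byte[] key, byte[] message, byte[] mac) =>
    CryptographicOperations.FixedTimeEquals(Sign(key, message), mac);

byte[] clusterSecret = RandomNumberGenerator.GetBytes(32); // in production, load from a key vault
byte[] nonce = RandomNumberGenerator.GetBytes(16);         // fresh challenge per handshake

byte[] response = Sign(clusterSecret, nonce);              // computed by the joining node
Console.WriteLine(Verify(clusterSecret, nonce, response)); // True

byte[] wrongSecret = Encoding.UTF8.GetBytes("not-the-cluster-secret");
Console.WriteLine(Verify(clusterSecret, nonce, Sign(wrongSecret, nonce))); // False
```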
Access Control
- Implement role-based access control (RBAC)
- Audit all cluster operations
- Monitor for unauthorized access attempts
- Use separate credentials for each node