
Distributed Systems Course Project

Distributed Systems · Go · Raft · Consensus · Backend

Overview

As part of my Software Engineering curriculum at CODE University of Applied Sciences, I designed and implemented a distributed key-value store that demonstrates core concepts of distributed systems, including consensus, replication, and fault tolerance.

System Design

Architecture

The system consists of:

  • Multiple server nodes forming a cluster
  • Raft consensus algorithm for leader election and log replication
  • Client library for read/write operations
  • Monitoring and debugging tools

Key Features

  • Fault Tolerance: System remains operational even when a minority of nodes fail
  • Consistency: Strong consistency guarantees through Raft consensus
  • Scalability: Horizontal scaling by adding more nodes
  • Partition Tolerance: Handles network partitions gracefully

Implementation Details

Raft Consensus Algorithm

Implemented the complete Raft protocol, including:

  • Leader election
  • Log replication
  • Safety guarantees
  • Log compaction

The core per-node state follows the Raft paper's description:

type RaftNode struct {
    id          int        // unique identifier of this node within the cluster
    state       NodeState  // Follower, Candidate, or Leader
    currentTerm int        // latest term this node has seen
    votedFor    *int       // candidate voted for in currentTerm; nil if none
    log         []LogEntry // replicated log of state-machine commands
    commitIndex int        // highest log index known to be committed
    lastApplied int        // highest log index applied to the state machine
}
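
To give a sense of how leader election is driven, here is a minimal sketch of the follower's election timer (using the standard time and math/rand packages). The heartbeat channel and becomeCandidate callback are illustrative assumptions, not part of the struct above:

func electionTimeout() time.Duration {
    // Randomized 150-300 ms timeout, as recommended by the Raft paper, so
    // that followers rarely time out simultaneously and split the vote.
    return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

func runElectionTimer(heartbeat <-chan struct{}, becomeCandidate func()) {
    timer := time.NewTimer(electionTimeout())
    defer timer.Stop()
    for {
        select {
        case <-heartbeat:
            // Leader contact observed: push the election deadline back.
            if !timer.Stop() {
                <-timer.C
            }
            timer.Reset(electionTimeout())
        case <-timer.C:
            // No leader heard from: become a candidate and request votes.
            becomeCandidate()
            timer.Reset(electionTimeout())
        }
    }
}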

Network Communication

  • gRPC for inter-node communication (message shapes and transport interface sketched after this list)
  • Protocol buffers for message serialization
  • Efficient batching for log replication
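
As a rough illustration of what travels between nodes, the AppendEntries exchange carries batched log entries, and the consensus core talks to gRPC through a small transport interface so it can be exercised against an in-memory mock. The Go structs below only mirror the Raft paper's RPC fields; the project's actual messages are generated from a .proto file and may differ (context is the standard context package):

type LogEntry struct {
    Term    int
    Command []byte
}

// Mirrors the AppendEntries RPC from the Raft paper; in the project these
// messages are defined in Protocol Buffers and generated into Go.
type AppendEntriesRequest struct {
    Term         int
    LeaderID     int
    PrevLogIndex int
    PrevLogTerm  int
    Entries      []LogEntry // many entries batched into a single RPC
    LeaderCommit int
}

type AppendEntriesResponse struct {
    Term    int
    Success bool
}

// Transport hides the gRPC client behind an interface so the consensus
// logic can be tested without a real network (see the testing section).
type Transport interface {
    AppendEntries(ctx context.Context, peer int, req *AppendEntriesRequest) (*AppendEntriesResponse, error)
}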

Storage Layer

  • In-memory data store with periodic snapshots
  • Write-ahead logging for durability (sketched below)
  • Configurable persistence options
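
A minimal sketch of the write-ahead log idea, reusing the LogEntry type sketched above and the standard os and encoding/json packages (not the project's actual storage code): every entry is serialized and fsynced before it is acknowledged, so committed writes survive a crash and the in-memory store can be rebuilt on restart.

type WAL struct {
    f *os.File
}

func OpenWAL(path string) (*WAL, error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
    if err != nil {
        return nil, err
    }
    return &WAL{f: f}, nil
}

// Append durably records a log entry before it is applied to the
// in-memory key-value store.
func (w *WAL) Append(e LogEntry) error {
    b, err := json.Marshal(e)
    if err != nil {
        return err
    }
    if _, err := w.f.Write(append(b, '\n')); err != nil {
        return err
    }
    return w.f.Sync() // flush to disk so the entry survives a crash
}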

Testing Strategy

Unit Tests

  • Individual component testing
  • Mock network layer for isolation
  • Edge case coverage (example test below)
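
To illustrate the style of these tests, here is a table-driven example against a simplified voting rule, using Go's standard testing package. The helper is a stand-in for the real RequestVote handler and omits the log up-to-dateness check:

// shouldGrantVote is a simplified stand-in for the RequestVote handler:
// grant only if the candidate's term is not stale and we have not already
// voted for a different candidate in this term.
func shouldGrantVote(currentTerm, candidateTerm int, votedFor *int, candidateID int) bool {
    if candidateTerm < currentTerm {
        return false
    }
    return votedFor == nil || *votedFor == candidateID
}

func TestShouldGrantVote(t *testing.T) {
    other := 2
    cases := []struct {
        name          string
        currentTerm   int
        candidateTerm int
        votedFor      *int
        want          bool
    }{
        {"stale term rejected", 5, 4, nil, false},
        {"first vote in term granted", 5, 5, nil, true},
        {"already voted for another node", 5, 5, &other, false},
    }
    for _, c := range cases {
        if got := shouldGrantVote(c.currentTerm, c.candidateTerm, c.votedFor, 1); got != c.want {
            t.Errorf("%s: got %v, want %v", c.name, got, c.want)
        }
    }
}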

Integration Tests

  • Multi-node cluster testing
  • Network partition simulation
  • Leader failure scenarios

Chaos Engineering

  • Random node failures
  • Network latency injection
  • Message dropping simulation (sketched below)
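
One way to inject such faults is to wrap the transport in a decorator that randomly drops RPCs. This is a hypothetical sketch building on the Transport interface above, using the standard errors and math/rand packages; the project's chaos harness may differ:

// flakyTransport wraps a real Transport and loses a configurable fraction
// of messages, simulating an unreliable network.
type flakyTransport struct {
    inner    Transport
    dropRate float64 // probability in [0, 1] that an RPC is dropped
}

var errDropped = errors.New("chaos: message dropped")

func (f *flakyTransport) AppendEntries(ctx context.Context, peer int, req *AppendEntriesRequest) (*AppendEntriesResponse, error) {
    if rand.Float64() < f.dropRate {
        return nil, errDropped // the caller sees this as a lost packet
    }
    return f.inner.AppendEntries(ctx, peer, req)
}

Latency injection works the same way, sleeping for a random duration before delegating to the inner transport.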

Performance Metrics

  • Throughput: 10,000+ operations/second
  • Latency: <10ms for committed writes
  • Recovery Time: <5 seconds after leader failure
  • Consistency: no consistency violations observed in any tested scenario

Technologies Used

  • Go for high-performance concurrent programming
  • gRPC for RPC communication
  • Protocol Buffers for serialization
  • Docker for deployment and testing
  • Prometheus for metrics

Challenges Overcome

Timing Issues

Timing-dependent behavior makes distributed systems notoriously difficult to debug:

  • Implemented comprehensive logging
  • Built visualization tools for state machine
  • Created a deterministic test framework (clock abstraction sketched below)
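
The core trick behind deterministic tests is abstracting time so that timeouts are fired explicitly instead of waiting on real sleeps. A hypothetical sketch of that approach (names are illustrative, not the project's actual API):

// Clock abstracts time so tests can trigger timeouts deterministically.
type Clock interface {
    Now() time.Time
    After(d time.Duration) <-chan time.Time
}

// realClock is used in production.
type realClock struct{}

func (realClock) Now() time.Time                         { return time.Now() }
func (realClock) After(d time.Duration) <-chan time.Time { return time.After(d) }

// fakeClock lets a test fire a pending timer on demand by sending on the
// channel it hands out, e.g. to force an election timeout at a known point.
type fakeClock struct {
    now   time.Time
    fired chan time.Time
}

func (f *fakeClock) Now() time.Time                         { return f.now }
func (f *fakeClock) After(d time.Duration) <-chan time.Time { return f.fired }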

Network Partitions

Handling split-brain scenarios:

  • Implemented proper majority-quorum logic (sketched below)
  • Added fencing mechanisms
  • Thorough testing of edge cases
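
The quorum rule itself is small, but it is what rules out split-brain: two disjoint partitions can never both contain a majority. A minimal sketch:

// quorum returns the number of nodes that must agree before an entry is
// committed or a leader is elected: a strict majority of the cluster.
// For a 5-node cluster this is 3, so the system tolerates 2 failures.
func quorum(clusterSize int) int {
    return clusterSize/2 + 1
}

// hasQuorum reports whether the acknowledgements received so far (the
// leader counts itself) are enough to commit.
func hasQuorum(acks, clusterSize int) bool {
    return acks >= quorum(clusterSize)
}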

Performance Optimization

Balancing consistency and performance:

  • Batching of log entries (sketched after this list)
  • Pipelining of network operations
  • Efficient state machine implementation
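
As a sketch of the batching optimization (channel and helper names are illustrative, not the project's actual API, and LogEntry is the type sketched earlier): instead of one RPC per log entry, the leader drains whatever has accumulated and ships it in a single AppendEntries call, with a ticker bounding the extra latency.

func replicateLoop(newEntries <-chan LogEntry, send func([]LogEntry), maxBatch int, flushEvery time.Duration) {
    ticker := time.NewTicker(flushEvery)
    defer ticker.Stop()

    var batch []LogEntry
    flush := func() {
        if len(batch) > 0 {
            send(batch) // one AppendEntries RPC carrying the whole batch
            batch = nil
        }
    }

    for {
        select {
        case e, ok := <-newEntries:
            if !ok {
                flush()
                return
            }
            batch = append(batch, e)
            if len(batch) >= maxBatch {
                flush()
            }
        case <-ticker.C:
            flush() // bound the latency added by batching
        }
    }
}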

Learning Outcomes

This project provided deep insights into:

  • Consensus algorithms and their practical challenges
  • Trade-offs in distributed systems (CAP theorem)
  • Importance of testing in distributed environments
  • Performance optimization in concurrent systems

Future Enhancements

  • Multi-Raft for sharding
  • Read-only replicas for scaling reads
  • Dynamic membership changes
  • Compression and more efficient snapshot mechanisms

Code Repository

The project is open-source and available on GitHub with comprehensive documentation and examples.