Unique ID Generation (Snowflake)

Introduction

Generating unique, sortable IDs at scale is a common requirement for distributed systems. Twitter's Snowflake is a well-known solution that enables decentralized, high-throughput ID generation.

Problem Statement

How can we generate unique, time-ordered IDs across multiple servers without central coordination, while ensuring high availability and performance?

System Requirements

  • IDs must be unique and sortable by time.
  • The system should support high throughput and low latency.
  • No single point of failure.
  • Should work across multiple data centers.

High-Level Design

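Snowflake IDs are 64-bit integers composed of, from most to least significant bit:

  • Sign bit (1 bit): always 0, so IDs stay positive in signed 64-bit integer types
  • Timestamp (41 bits): milliseconds since a custom epoch, enough for about 69 years
  • Data center ID (5 bits): up to 32 data centers
  • Machine ID (5 bits): up to 32 machines per data center
  • Sequence number (12 bits): up to 4096 IDs per machine per millisecond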
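The layout above can be illustrated with plain bit shifts. This is a minimal Python sketch, not Twitter's implementation: the four listed fields total 63 bits, and the sketch assumes the remaining top bit is left at 0 so IDs stay positive in a signed 64-bit integer. The timestamp is assumed to be already expressed in milliseconds since a custom epoch, and `compose_id` / `decompose_id` are illustrative names.

```python
# Field widths: 41 + 5 + 5 + 12 = 63 bits; the top (sign) bit is left at 0.
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

# Shift amounts: the sequence occupies the lowest bits, the timestamp the highest.
MACHINE_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS        # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22


def max_value(bits: int) -> int:
    """Largest unsigned value that fits in `bits` bits."""
    return (1 << bits) - 1


def compose_id(timestamp_ms: int, datacenter_id: int, machine_id: int, sequence: int) -> int:
    """Pack the four fields into one integer (timestamp is relative to a custom epoch)."""
    for value, bits, name in (
        (timestamp_ms, TIMESTAMP_BITS, "timestamp_ms"),
        (datacenter_id, DATACENTER_BITS, "datacenter_id"),
        (machine_id, MACHINE_BITS, "machine_id"),
        (sequence, SEQUENCE_BITS, "sequence"),
    ):
        if not 0 <= value <= max_value(bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )


def decompose_id(snowflake_id: int) -> dict:
    """Recover the four fields from a packed ID by shifting and masking."""
    return {
        "timestamp_ms": snowflake_id >> TIMESTAMP_SHIFT,
        "datacenter_id": (snowflake_id >> DATACENTER_SHIFT) & max_value(DATACENTER_BITS),
        "machine_id": (snowflake_id >> MACHINE_SHIFT) & max_value(MACHINE_BITS),
        "sequence": snowflake_id & max_value(SEQUENCE_BITS),
    }
```

For example, `compose_id(1, 2, 3, 4)` yields `4468740`, and `decompose_id(4468740)` returns the original four fields. With every field at its maximum, the result is `2**63 - 1`, the largest positive signed 64-bit value.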

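Each node generates IDs independently using its own data center and machine IDs. Within a single millisecond it increments the sequence number for each request, and it resets the sequence to 0 when the millisecond changes. Because the timestamp occupies the most significant bits, sorting IDs numerically orders them by creation time, to millisecond precision and subject to clock differences between nodes.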
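A single node's generation loop might look like the following minimal sketch. The class name `SnowflakeGenerator`, the injectable `clock` parameter, and the 2020-01-01 epoch are assumptions made for illustration. On sequence overflow it busy-waits for the next millisecond, and if the clock moves backward it raises an error rather than risk a duplicate; other policies are possible.

```python
import threading
import time
from typing import Callable

# Bit widths and shifts for the 1 + 41 + 5 + 5 + 12 layout.
SEQUENCE_BITS = 12
MACHINE_BITS = 5
DATACENTER_BITS = 5
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1               # 4095
MACHINE_SHIFT = SEQUENCE_BITS                         # 12
DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS       # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def wall_clock_ms() -> int:
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Per-node ID generator (illustrative sketch)."""

    def __init__(self, datacenter_id: int, machine_id: int,
                 clock: Callable[[], int] = wall_clock_ms) -> None:
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id must fit in 5 bits")
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 5 bits")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()  # next_id may be called from many threads
        self._last_ts = -1             # last timestamp used (ms since CUSTOM_EPOCH_MS)
        self._sequence = 0

    def _now(self) -> int:
        return self._clock() - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self._lock:
            ts = self._now()
            if ts < self._last_ts:
                # Clock moved backward: refuse instead of risking a duplicate ID.
                raise RuntimeError(f"clock moved backward by {self._last_ts - ts} ms")
            if ts == self._last_ts:
                # Same millisecond: bump the sequence, wrapping to 0 after 4095.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: busy-wait until the next millisecond.
                    while ts <= self._last_ts:
                        ts = self._now()
            else:
                # New millisecond: restart the sequence.
                self._sequence = 0
            self._last_ts = ts
            return ((ts << TIMESTAMP_SHIFT)
                    | (self._datacenter_id << DATACENTER_SHIFT)
                    | (self._machine_id << MACHINE_SHIFT)
                    | self._sequence)
```

Injecting the clock keeps the logic testable: a fake clock that holds one millisecond for 4097 reads and then advances shows the sequence running from 0 to 4095 before the next ID moves to the following millisecond with its sequence reset to 0.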

Key Components

  • ID Generator Service: Runs on each node, responsible for generating IDs.
  • Clock Synchronization: Keeps each node's clock accurate (typically via NTP) so timestamps are consistent across nodes; a clock that steps backward can cause duplicate IDs.
  • Configuration Management: Assigns unique machine/data center IDs.
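Configuration management can be checked mechanically before rollout. The sketch below assumes a hypothetical static map from host name to `(datacenter_id, machine_id)` and reports any pair that falls outside the 5-bit ranges or is assigned to more than one host; how the map is stored and distributed to nodes is left open.

```python
from typing import Dict, List, Tuple

DATACENTER_BITS = 5
MACHINE_BITS = 5


def find_assignment_errors(assignment: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return problems in a hypothetical host -> (datacenter_id, machine_id) map."""
    errors: List[str] = []
    owner_of: Dict[Tuple[int, int], str] = {}  # pair -> first host that claimed it
    for host, (dc_id, machine_id) in sorted(assignment.items()):
        if not 0 <= dc_id < (1 << DATACENTER_BITS):
            errors.append(f"{host}: datacenter_id {dc_id} outside 0..31")
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            errors.append(f"{host}: machine_id {machine_id} outside 0..31")
        pair = (dc_id, machine_id)
        if pair in owner_of:
            # Two nodes with the same pair could emit identical IDs.
            errors.append(f"{host}: ({dc_id}, {machine_id}) already used by {owner_of[pair]}")
        else:
            owner_of[pair] = host
    return errors
```

An empty result means every host has a distinct, in-range pair; a deployment pipeline could refuse to start generators when the list is non-empty.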

Challenges

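  • Clock drift and rollback: If a node's system clock moves backward (for example, after an NTP correction), the node can reuse a timestamp and sequence pair it has already issued, producing duplicate IDs. Generators typically detect this by comparing against the last timestamp they used, then wait or refuse to issue IDs until the clock catches up.
  • Sequence overflow: The 12-bit sequence allows 4096 IDs per node per millisecond (about 4 million per second). If more are requested within the same millisecond, the generator must wait for the next millisecond.
  • Deployment: Every node must receive a distinct (data center ID, machine ID) pair; two nodes sharing a pair can generate identical IDs.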
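One possible clock-rollback policy, sketched below with assumed names and thresholds, tolerates small backward steps by sleeping until the clock reaches the last issued timestamp, and refuses to issue IDs when the clock is further behind. The `max_backward_ms` default of 5 ms is an arbitrary illustrative value, not part of the original design.

```python
import time
from typing import Callable


def wait_for_clock(last_ts_ms: int, clock_ms: Callable[[], int],
                   max_backward_ms: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Return a timestamp >= last_ts_ms, tolerating small backward clock steps.

    If the clock is behind last_ts_ms by at most max_backward_ms, sleep until it
    catches up; if it is further behind, raise so the caller stops issuing IDs.
    """
    now = clock_ms()
    lag = last_ts_ms - now
    if lag <= 0:
        return now  # clock is at or past the last timestamp: nothing to do
    if lag > max_backward_ms:
        raise RuntimeError(f"clock moved backward by {lag} ms; refusing to issue IDs")
    while now < last_ts_ms:
        sleep((last_ts_ms - now) / 1000)  # time.sleep takes seconds
        now = clock_ms()
    return now
```

A generator would call this in place of reading the clock directly. The returned timestamp is never earlier than the last one it used, so the usual same-millisecond sequence logic still applies.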

Conclusion

Snowflake-style ID generation is a robust solution for distributed, high-scale systems. Careful attention to clock management and configuration is essential for reliability.