Engineering · August 18, 2024 · 10 min read

How We Built a Real-Time Auction Engine for Cloud Compute

Building a system that processes over 100,000 bids per minute while matching buyers with available capacity in under 50 milliseconds required creative engineering. Here's our journey.

Marcus Rodriguez
Co-founder & CTO

When we started KubeBid, we had a simple premise: cloud compute prices should be determined by supply and demand, not arbitrary pricing tables. What we didn't fully appreciate was just how hard it would be to build a system that could handle real-time price discovery for distributed infrastructure across 40 global regions.

Today, our auction engine processes over 100,000 bids per minute, matches buyers with available capacity in under 50 milliseconds, and maintains 99.99% availability. This post is a deep dive into how we built it.

The Problem Space

Traditional cloud pricing is simple: you pick an instance type, and you pay a fixed hourly rate. AWS, GCP, and Azure have made this the default model, but it's fundamentally inefficient. Compute capacity is a perishable good—an idle server generates no revenue but still costs money to operate.

Spot instances were an attempt to address this, but they're a half-measure. Prices still aren't truly market-driven, termination policies are unpredictable, and there's no real price discovery mechanism.

We wanted to build something different: a true auction system where prices reflect real-time supply and demand, where users can set maximum bids and get matched automatically, and where the entire process happens in milliseconds.

Architecture Overview

Our auction engine is built on three core components:

┌─────────────────────────────────────────────────────────────┐
│                      Load Balancer                           │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
       ┌───────────┐   ┌───────────┐   ┌───────────┐
       │  API GW   │   │  API GW   │   │  API GW   │
       │  (us-w-2) │   │  (us-e-1) │   │  (eu-w-1) │
       └───────────┘   └───────────┘   └───────────┘
              │               │               │
              └───────────────┼───────────────┘
                              ▼
                    ┌─────────────────┐
                    │  Order Router   │
                    │   (Kafka)       │
                    └─────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
  │  Matching   │      │  Matching   │      │  Matching   │
  │  Engine     │◄────►│  Engine     │◄────►│  Engine     │
  │  (Region 1) │      │  (Region 2) │      │  (Region N) │
  └─────────────┘      └─────────────┘      └─────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
                    ┌─────────────────┐
                    │   Settlement    │
                    │     Layer       │
                    └─────────────────┘

The Order Book

The order book is the heart of the system. It needs to support three operations with very different characteristics:

  1. Insert: Add a new bid to the book (must be fast, ~1ms)
  2. Match: Find the best available capacity for a given bid (must be very fast, <10ms)
  3. Update: Modify capacity as resources are provisioned or released (must be consistent)

We evaluated several data structures before settling on a modified red-black tree with price-time priority. Bids are sorted first by price (descending for buy orders), then by timestamp (earlier bids get priority at the same price level).

type OrderBook struct {
    mu       sync.RWMutex
    bids     *redblack.Tree  // price-time priority
    capacity *redblack.Tree  // available resources
    index    map[string]*Order
}

func (ob *OrderBook) InsertBid(bid *Order) error {
    ob.mu.Lock()
    defer ob.mu.Unlock()

    key := PriceTimeKey{
        Price:     bid.MaxPrice,
        Timestamp: bid.CreatedAt,
        ID:        bid.ID,
    }

    ob.bids.Put(key, bid)
    ob.index[bid.ID] = bid

    // Trigger async matching
    go ob.tryMatch(bid)

    return nil
}
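
For context, the price-time ordering boils down to a comparator like the one below. This is a simplified sketch rather than our exact code; the comparator signature in particular depends on the tree library you use.

type PriceTimeKey struct {
    Price     float64
    Timestamp time.Time
    ID        string
}

// comparePriceTime orders buy-side keys: higher prices first, then earlier
// timestamps within a price level, with the order ID as a deterministic
// tie-breaker.
func comparePriceTime(a, b PriceTimeKey) int {
    switch {
    case a.Price != b.Price:
        if a.Price > b.Price {
            return -1 // higher bids sort first
        }
        return 1
    case !a.Timestamp.Equal(b.Timestamp):
        if a.Timestamp.Before(b.Timestamp) {
            return -1 // earlier bids win at the same price level
        }
        return 1
    default:
        return strings.Compare(a.ID, b.ID)
    }
}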

The tricky part is keeping the order book consistent across regions. We use a combination of Raft consensus for critical state and eventual consistency for read replicas.
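
To make that concrete, here's roughly how a match commit flows through the consensus layer before capacity is marked as allocated. The ConsensusLog interface and the field names are illustrative, not our actual internals:

type ConsensusLog interface {
    // Propose blocks until the entry is committed by a quorum (or fails).
    Propose(ctx context.Context, entry []byte) error
}

func (me *MatchingEngine) commitMatch(ctx context.Context, m *Match) error {
    payload, err := json.Marshal(m)
    if err != nil {
        return err
    }

    // Capacity is only marked allocated after the quorum commit, so two
    // regions can never both claim the same capacity.
    if err := me.consensus.Propose(ctx, payload); err != nil {
        return err
    }
    return me.orderBook.ApplyMatch(m)
}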

The Matching Algorithm

Our matching engine uses a continuous double auction model, similar to what you'd find in financial exchanges. When a new bid comes in, we immediately try to match it against available capacity.

The algorithm is straightforward in principle:

  1. Find all capacity that meets the bid's requirements (region, instance type, etc.)
  2. Filter to capacity with an ask price at or below the bid's max price
  3. Select the lowest-priced capacity (price improvement for the buyer)
  4. Execute the match at the ask price

func (me *MatchingEngine) Match(bid *Order) (*Match, error) {
    // Find eligible capacity
    candidates := me.orderBook.FindCapacity(
        bid.Region,
        bid.InstanceType,
        bid.Quantity,
    )

    // Filter by price
    var eligible []*Capacity
    for _, c := range candidates {
        if c.AskPrice <= bid.MaxPrice {
            eligible = append(eligible, c)
        }
    }

    if len(eligible) == 0 {
        // No match - bid goes on the book
        return nil, ErrNoMatch
    }

    // Sort by price (lowest first)
    sort.Slice(eligible, func(i, j int) bool {
        return eligible[i].AskPrice < eligible[j].AskPrice
    })

    // Execute at ask price (price improvement)
    best := eligible[0]
    return &Match{
        BidID:      bid.ID,
        CapacityID: best.ID,
        Price:      best.AskPrice,  // Buyer pays ask, not max bid
        Quantity:   bid.Quantity,
    }, nil
}

One key design decision: buyers always pay the ask price, not their maximum bid. If you bid up to $0.50/hour and the best ask is $0.35/hour, you pay $0.35. This gives buyers automatic price improvement and encourages them to bid their true maximum willingness to pay.

Handling Scale

Processing 100,000 bids per minute means handling about 1,700 bids per second at peak. This isn't particularly high by exchange standards, but the complexity comes from the distributed nature of our capacity.

We partition the order book by region. Each region has its own matching engine instance, and cross-region matches are handled through a global coordinator. A trimmed-down version of a regional matching engine deployment looks like this:

# Regional matching engine deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: matching-engine
  labels:
    region: us-west-2
spec:
  replicas: 3  # HA within region
  selector:
    matchLabels:
      app: matching-engine
  template:
    metadata:
      labels:
        app: matching-engine
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: matching-engine
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: engine
        image: kubebid/matching-engine:v2.4.1
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"

Price Discovery

Real-time price discovery is what makes our auction system valuable. We calculate and publish prices every second based on the current state of the order book.

The "market price" you see on our dashboard is actually a weighted average of recent trades, with more recent trades weighted more heavily. This smooths out noise while still responding quickly to supply/demand changes.

func (pd *PriceDiscovery) CalculatePrice(instanceType string) float64 {
    trades := pd.recentTrades.Get(instanceType, time.Hour)

    if len(trades) == 0 {
        return pd.basePrice[instanceType]
    }

    // Exponential weighted moving average
    var weightedSum, weightSum float64
    now := time.Now()

    for _, trade := range trades {
        age := now.Sub(trade.Timestamp).Minutes()
        weight := math.Exp(-age / 30)  // ~30-minute decay constant (weight falls to 1/e after 30 min)

        weightedSum += trade.Price * weight
        weightSum += weight
    }

    return weightedSum / weightSum
}
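
Publishing is just a loop over the book. Here's a minimal sketch, assuming a publisher that pushes prices to the dashboard feed (the publisher field and the instance-type list are placeholders, not our exact code):

func (pd *PriceDiscovery) Run(ctx context.Context, instanceTypes []string) {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            // Recompute and publish the market price for every instance type
            // once per second, based on the current order book state.
            for _, it := range instanceTypes {
                pd.publisher.Publish(it, pd.CalculatePrice(it))
            }
        }
    }
}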

Lessons Learned

Building this system taught us several hard lessons:

1. Consistency vs. Availability Tradeoffs

In a distributed auction system, you can't have perfect consistency and perfect availability. We chose to prioritize availability for reads (showing prices) and consistency for writes (executing matches). This means prices might be slightly stale during network partitions, but we never double-allocate capacity.

2. The Long Tail Matters

Our p50 latency is 12ms, but our p99 was initially over 500ms. We spent months optimizing the long tail—it turned out to be GC pauses, lock contention, and occasional slow network calls. We now use Go's GOGC tuning, more granular locking, and aggressive timeouts.
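
To give a flavor of what that looks like in code, here are the kinds of knobs involved. The specific values and the provisioner call are illustrative, not our production settings:

func init() {
    // Equivalent to GOGC=400: let the heap grow more between collections,
    // trading memory for fewer (and shorter) GC pauses on the hot path.
    debug.SetGCPercent(400)
}

func (me *MatchingEngine) reserveCapacity(ctx context.Context, c *Capacity) error {
    // Aggressive timeout on the downstream call: better to fail fast and
    // retry against other capacity than to let one slow call blow out p99.
    ctx, cancel := context.WithTimeout(ctx, 25*time.Millisecond)
    defer cancel()
    return me.provisioner.Reserve(ctx, c)
}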

3. Testing Distributed Systems is Hard

Unit tests weren't enough. We built an extensive chaos engineering suite that randomly kills nodes, partitions networks, and introduces latency. We run this continuously in a staging environment that mirrors production.
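
As one small example of the latency-injection side, a staging-only wrapper around a downstream client can randomly slow down or fail calls. The Provisioner interface and the probabilities here are illustrative, not our actual chaos tooling:

type Provisioner interface {
    Reserve(ctx context.Context, c *Capacity) error
}

type flakyProvisioner struct {
    inner Provisioner
}

func (f *flakyProvisioner) Reserve(ctx context.Context, c *Capacity) error {
    // Roughly 0.1% of calls fail outright; another ~1% get 200ms of extra latency.
    switch r := rand.Float64(); {
    case r < 0.001:
        return errors.New("chaos: injected provisioner failure")
    case r < 0.01:
        time.Sleep(200 * time.Millisecond)
    }
    return f.inner.Reserve(ctx, c)
}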

What's Next

We're working on several improvements to the engine.

If you're interested in working on problems like this, we're hiring. And if you want to try the auction system yourself, sign up for free and get $50 in compute credits.

Questions or feedback? Reach out to me on Twitter @marcusrodriguez or email marcus@kubebid.io.
