
Performance Deep-Dive

THE RUST OPTIMIZATION STORY // CARBYNE.EXE

The Headline Number

25,714x
Carbyne Compositor: JavaScript → Rust

30 minutes became 0.07 seconds. Same output. Same quality. Different language.
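
The headline figure is just the ratio of the two timings. A minimal sketch of that arithmetic follows; `speedup` and `headline_speedup` are illustrative names, not functions from the compositor.

```rust
/// Ratio of the old runtime to the new runtime, both in seconds.
/// Illustrative helper; not part of the Carbyne codebase.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// 30 minutes, expressed in seconds, compared against 0.07 seconds.
fn headline_speedup() -> f64 {
    speedup(30.0 * 60.0, 0.07)
}
```

Rounded down to a whole number, `headline_speedup()` gives the 25,714x shown above.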

Why Performance Matters

This isn't about bragging rights. Performance determines what's possible.

A 30-minute image process means you run it once and wait. A 0.07-second process means you iterate—tweak parameters, re-run, compare, adjust. The feedback loop collapses from "go get coffee" to "instant."

"The difference between a slow tool and a fast tool isn't speed. It's whether you use it at all."
— Operational reality

Case Study: Carbyne Compositor

The Task

Apply a hex mesh overlay, a carbyne tint, and a glowing eye effect to a 1920×1080 PNG. That is ~2 million pixels to process.

Version 1: JavaScript (HTML5 Canvas)

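// Pixel-by-pixel processing (ctx is a CanvasRenderingContext2D)
for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
        // Get pixel, calculate, put pixel.
        // The exact Canvas calls are assumed; applyEffects is a hypothetical stand-in.
        const pixel = ctx.getImageData(x, y, 1, 1);
        applyEffects(pixel.data, x, y);
        ctx.putImageData(pixel, x, y);
        // ~4 operations per pixel × 2M pixels = 8M operations, single-threaded
    }
}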

Result: ~30 minutes
        

JavaScript runs this loop on a single thread, and per-pixel Canvas access is slow. With no parallelization, every pixel waits for the one before it.

Version 2: Rust (with Rayon)

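use rayon::prelude::*;

// Parallel pixel processing
let output: Vec<_> = pixels
    .par_iter()
    .map(|(x, y, pixel)| {
        // Same calculation as the JavaScript version,
        // but spread across all CPU cores by Rayon
        shade(*x, *y, *pixel) // hypothetical per-pixel function
    })
    .collect();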

Result: 0.07 seconds
        
CPU Cores Used: 8
Lines of Rust: ~300
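
For readers who want to see what "across all CPU cores" means mechanically, here is a std-only sketch that splits an RGBA buffer into bands of rows and processes each band on its own scoped thread. This is not the compositor's code: the linear blend in `tint_pixel` is a stand-in for the real tint, and all names are hypothetical. With Rayon, `par_chunks_mut` would replace the manual band splitting.

```rust
use std::thread;

/// Blend one RGBA pixel toward `tint` by `strength` (0 = unchanged, 255 = full tint).
/// The linear blend is an assumption; the real carbyne tint formula is not shown here.
fn tint_pixel(px: &mut [u8], tint: [u8; 3], strength: u8) {
    let s = strength as u32;
    for c in 0..3 {
        let old = px[c] as u32;
        let new = tint[c] as u32;
        px[c] = ((old * (255 - s) + new * s) / 255) as u8;
    }
    // px[3] (alpha) is left untouched.
}

/// Tint a tightly packed RGBA buffer, one band of whole rows per thread.
fn tint_parallel(buf: &mut [u8], width: usize, tint: [u8; 3], strength: u8, threads: usize) {
    let row_bytes = width * 4;
    if row_bytes == 0 || buf.is_empty() {
        return;
    }
    let threads = threads.max(1);
    let rows = buf.len() / row_bytes;
    // Ceiling division so every row lands in some band.
    let rows_per_band = ((rows + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        for band in buf.chunks_mut(rows_per_band * row_bytes) {
            // Bands are disjoint slices, so no locking is needed.
            scope.spawn(move || {
                for px in band.chunks_exact_mut(4) {
                    tint_pixel(px, tint, strength);
                }
            });
        }
    });
}
```

Each pixel depends only on its own bytes, which is what makes the split safe: the borrow checker accepts the disjoint `chunks_mut` bands without any `unsafe` code.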

What Made the Difference

Case Study: Everything MCP

The Problem

File search on Windows. Explorer search takes 30-60 seconds to scan directories. Unacceptable for iterative workflows.

.NET Version

Metric | Value
Binary Size | 100MB+ (with runtime)
Cold Startup | 2-3 seconds (JIT compilation)
Memory Baseline | ~80MB
Search Time | ~100ms (Everything SDK does the work)

Rust Version

Metric | Value | Improvement
Binary Size | 819KB | 99% smaller
Cold Startup | ~50ms | 40-60x faster
Memory Baseline | ~8MB | 10x smaller
Search Time | ~80ms | About the same (SDK-bound)

The search itself runs at about the same speed, because that part is limited by Everything's SDK. Everything around it improved: startup time, memory use, and deployment size.

Case Study: Git MCP

.NET Cold Start: 2-3s
Rust Cold Start: 641ms

Git operations themselves are fast—Git is already native code. The improvement comes entirely from eliminating .NET's JIT compilation overhead.

20 tools, all available in under a second. Compare that with waiting 2-3 seconds for the runtime to initialize on every command.
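
Cold-start numbers like these are easy to reproduce. A rough sketch, assuming the binary under test exits on its own when given some flag (which flag, and the binary's path, are up to you; nothing here is taken from the MCP servers themselves):

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Time one full launch of `program`: spawn it, wait for exit, return wall-clock time.
/// Hypothetical helper; output is discarded so terminal I/O does not skew the result.
fn time_cold_start(program: &str, args: &[&str]) -> io::Result<Duration> {
    let start = Instant::now();
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()?; // exit status is ignored; only elapsed time matters here
    Ok(start.elapsed())
}
```

Run it several times and keep the first measurement separate from the rest: the first launch after a reboot includes disk reads that later launches get from the OS cache.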

The Rust Toolkit

Parallelization: Rayon

use rayon::prelude::*;

// Turn a sequential iterator over a collection parallel with one small change
items.iter()      // sequential
items.par_iter()  // parallel

// Rayon handles:
// - Thread pool management
// - Work stealing
// - Load balancing
// - Safe concurrency
        

Image Processing: image crate

use image::{GenericImageView, Rgba};

let img = image::open("input.png")?;
let (width, height) = img.dimensions(); // needs GenericImageView in scope
let mut rgba = img.to_rgba8();

// Direct pixel access; x must be < width and y < height, or these calls panic
let pixel: Rgba<u8> = *rgba.get_pixel(x, y);
rgba.put_pixel(x, y, new_pixel);
        

CLI: clap

use clap::Parser;

#[derive(Parser)]
struct Args {
    #[arg(long, default_value = "30")]
    tint_strength: u8,
    
    #[arg(long)]
    eye_x: Option<u32>, // optional x coordinate in pixels (u32 is assumed)
}

// Automatic argument parsing, help generation, validation
        

Progress: indicatif

use indicatif::{ProgressBar, ProgressStyle};

let pb = ProgressBar::new(100);
pb.set_style(ProgressStyle::default_bar()
    .template("{bar:40} {pos}% {msg}")?); // {msg} is needed for the finish message to show

pb.set_position(50);
pb.finish_with_message("Done!");
        

Optimization Methodology

1. Profile First

Never optimize based on intuition. Measure. Find the actual bottleneck. Often it's not where you think.
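
Before reaching for a full profiler, a stopwatch around each stage is often enough to find the bottleneck. A minimal sketch with hypothetical helper names:

```rust
use std::time::{Duration, Instant};

/// Run `f` once and return its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Name of the stage with the largest measured duration, if any were recorded.
fn slowest<'a>(timings: &[(&'a str, Duration)]) -> Option<&'a str> {
    timings
        .iter()
        .max_by_key(|(_, elapsed)| *elapsed)
        .map(|(name, _)| *name)
}
```

Wrap each stage (decode, pixel loop, encode) in `timed`, collect the results, and let `slowest` point at the one worth optimizing.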

2. Identify the Hot Path

In the compositor, 99% of the runtime was spent in the pixel loop. Optimizing anything else was wasted effort.

3. Parallelize the Hot Path

If the hot path is embarrassingly parallel (each iteration is independent of the others), use Rayon. The speedup scales roughly with core count, until memory bandwidth or the remaining serial work becomes the limit.
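
How much a parallel hot path can buy is given by Amdahl's law. The sketch below plugs in this article's own figures (99% of time in the pixel loop, 8 cores); the function name is illustrative.

```rust
/// Amdahl's law: overall speedup when a fraction `parallel` of the runtime
/// (0.0..=1.0) is spread across `cores` workers and the rest stays serial.
fn amdahl_speedup(parallel: f64, cores: u32) -> f64 {
    let serial = 1.0 - parallel;
    1.0 / (serial + parallel / cores as f64)
}
```

`amdahl_speedup(0.99, 8)` comes out at roughly 7.5, so core count alone caps the parallel share of the gain well below 8x on an 8-core machine.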

4. Eliminate Allocations

Heap allocation inside a hot loop is slow. Reuse buffers. Avoid intermediate copies. Pre-allocate known sizes.
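
Buffer reuse usually means passing the output buffer in instead of returning a fresh one. A small sketch; `invert_into` is an illustrative stage, not one of the compositor's effects.

```rust
/// Invert the RGB channels of `src` into `out`, reusing `out`'s existing allocation.
fn invert_into(src: &[u8], out: &mut Vec<u8>) {
    out.clear(); // drops the contents but keeps the capacity
    out.reserve(src.len()); // allocates only if the buffer is too small
    for px in src.chunks_exact(4) {
        // Alpha (px[3]) is copied through unchanged.
        out.extend_from_slice(&[255 - px[0], 255 - px[1], 255 - px[2], px[3]]);
    }
}
```

Called once per frame with the same `out`, this allocates on the first call only; every later call writes into memory it already owns.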

5. Let the Compiler Help

Rust's optimizer is aggressive. Write clear code, use iterators, let LLVM do its job. Build with --release: for tight loops like this one, release builds are often 10-100x faster than debug builds.
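
Benchmarking a debug build by accident is a common way to get misleading numbers. One possible guard, relying on the fact that `debug_assertions` is enabled by default in debug builds and disabled by `--release` (the function names are hypothetical):

```rust
/// Report which kind of build this is, based on the `debug_assertions` cfg flag.
fn build_mode() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Refuse to produce benchmark numbers from an unoptimized build.
fn ensure_release_build() -> Result<(), String> {
    match build_mode() {
        "release" => Ok(()),
        other => Err(format!(
            "benchmarks need `cargo run --release`, but this is a {other} build"
        )),
    }
}
```

A custom profile can change `debug_assertions`, so treat this as a sanity check, not proof of optimization level.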

The Full Portfolio

Tool | .NET | Rust | Size Reduction
Everything MCP | 100MB+ | 819KB | 99.2%
Git MCP | ~80MB | ~3MB | 96%
Multiplexor | ~60MB | ~2MB | 97%
Screen MCP | ~50MB | ~1.5MB | 97%
SigCon6 | ~70MB | ~2MB | 97%

Total MCP server footprint: ~9.3MB for 6 servers, 119 tools.

Previous .NET footprint: ~460MB for the same functionality.

Total Size Reduction: 98%
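
The reduction percentages can be checked from the sizes quoted above. A one-function sketch (illustrative name; both arguments must use the same unit):

```rust
/// Percentage by which `after` is smaller than `before`.
fn reduction_percent(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}
```

For example, `reduction_percent(460.0, 9.3)` rounds to 98, and `reduction_percent(100.0, 0.819)` rounds to 99.2 when 819KB is taken as 0.819MB.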

Key Takeaways

"The best optimization is the one you don't have to think about."
— Why Rayon is magical