30 minutes became 0.07 seconds. Same output. Same quality. Different language.
This isn't about bragging rights. Performance determines what's possible.
A 30-minute image process means you run it once and wait. A 0.07-second process means you iterate—tweak parameters, re-run, compare, adjust. The feedback loop collapses from "go get coffee" to "instant."
The task: apply a hex mesh overlay, carbyne tint, and glowing eye effect to a 1920×1080 PNG. That's roughly 2 million pixels to process.
```javascript
// Pixel-by-pixel processing
for (let y = 0; y < height; y++) {
  for (let x = 0; x < width; x++) {
    // Get pixel, calculate, put pixel
    // ~4 operations per pixel
    // 2M pixels × 4 ops = 8M operations
    // Single-threaded
  }
}
```

Result: ~30 minutes
JavaScript is single-threaded. Canvas pixel access is slow. No parallelization. Each pixel waits for the previous one.
```rust
// Parallel pixel processing
pixels.par_iter()
    .map(|(x, y, pixel)| {
        // Same calculation,
        // but across all CPU cores simultaneously
    })
    .collect();
```

Result: 0.07 seconds
`.par_iter()` distributes work across all CPU cores automatically.

File search on Windows: Explorer search takes 30-60 seconds to scan directories. Unacceptable for iterative workflows.
| Metric | .NET Version |
|---|---|
| Binary Size | 100MB+ (with runtime) |
| Cold Startup | 2-3 seconds (JIT compilation) |
| Memory Baseline | ~80MB |
| Search Time | ~100ms (Everything SDK does the work) |
| Metric | Rust Version | Improvement |
|---|---|---|
| Binary Size | 819KB | 99% smaller |
| Cold Startup | ~50ms | 40-60x faster |
| Memory Baseline | ~8MB | 10x smaller |
| Search Time | ~80ms | Same (SDK-bound) |
The actual search is the same speed, since that's limited by Everything's SDK. But everything else got faster: startup, memory, deployment.
Git operations themselves are fast—Git is already native code. The improvement comes entirely from eliminating .NET's JIT compilation overhead.
20 tools, all available in under a second. Compare that to waiting 2-3 seconds for the runtime to initialize on every command.
```rust
use rayon::prelude::*;

// Turn any iterator parallel with one character change:
items.iter()      // sequential
items.par_iter()  // parallel

// Rayon handles:
// - Thread pool management
// - Work stealing
// - Load balancing
// - Safe concurrency
```
```rust
use image::GenericImageView;

let img = image::open("input.png")?;
let (width, height) = img.dimensions();
let mut rgba = img.to_rgba8();

// Direct pixel access at (x, y)
let pixel = rgba.get_pixel(x, y);
rgba.put_pixel(x, y, new_pixel);
```
```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    #[arg(long, default_value = "30")]
    tint_strength: u8,
    #[arg(long)]
    eye_x: Option<u32>, // optional flag: absent means None
}
// Automatic argument parsing, help generation, validation
```
```rust
use indicatif::{ProgressBar, ProgressStyle};

let pb = ProgressBar::new(100);
pb.set_style(ProgressStyle::default_bar()
    .template("{bar:40} {pos}%")?);
pb.set_position(50);
pb.finish_with_message("Done!");
```
Never optimize based on intuition. Measure. Find the actual bottleneck. Often it's not where you think.
In the compositor, 99% of time was spent in the pixel loop. Optimizing anything else was wasted effort.
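A minimal std-only sketch of measuring first, using `std::time::Instant`; the workload here is a stand-in for the real pixel loop:

```rust
use std::time::Instant;

fn main() {
    // Stand-in workload; substitute the suspected hot path.
    let data: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let sum: u64 = data.iter().sum();
    let elapsed = start.elapsed();

    println!("sum = {sum}, took {elapsed:?}");
}
```

Time a few candidate spots this way before touching any of them; the biggest number wins your attention.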
If the hot path is embarrassingly parallel (each iteration independent), use Rayon. Free speedup proportional to core count.
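Rayon is the easy route; for intuition, here is a std-only sketch of what it automates, splitting independent chunks across scoped threads (the chunk size is arbitrary for illustration):

```rust
use std::thread;

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    // Each chunk is independent, so threads never contend.
    let chunks: Vec<&[u64]> = data.chunks(250_000).collect();

    let total: u64 = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join each worker and combine partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });

    assert_eq!(total, data.iter().sum::<u64>());
    println!("total = {total}");
}
```

Rayon replaces all of this bookkeeping (spawning, joining, load balancing) with `.par_iter()`.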
Memory allocation is slow. Reuse buffers. Avoid intermediate copies. Pre-allocate known sizes.
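A small sketch of the reuse pattern, assuming a per-frame transform (the names and the transform are illustrative):

```rust
fn main() {
    let frames = vec![vec![1u8; 1024]; 3];

    // Allocate once with the known size...
    let mut scratch: Vec<u8> = Vec::with_capacity(1024);
    for frame in &frames {
        // ...then clear() keeps the allocation: no realloc per frame.
        scratch.clear();
        scratch.extend(frame.iter().map(|b| b.wrapping_add(1)));
        assert_eq!(scratch.len(), 1024);
        assert!(scratch.capacity() >= 1024);
    }
}
```

Allocating `scratch` inside the loop would pay the allocator on every iteration; hoisting it out pays once.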
Rust's optimizer is aggressive. Write clear code, use iterators, let LLVM do its job. --release mode is 10-100x faster than debug.
| Tool | .NET | Rust | Size Reduction |
|---|---|---|---|
| Everything MCP | 100MB+ | 819KB | 99.2% |
| Git MCP | ~80MB | ~3MB | 96% |
| Multiplexor | ~60MB | ~2MB | 97% |
| Screen MCP | ~50MB | ~1.5MB | 97% |
| SigCon6 | ~70MB | ~2MB | 97% |
Total MCP server footprint: ~9.3MB for 6 servers, 119 tools.
Previous .NET footprint: ~460MB for the same functionality.