30 minutes became 0.07 seconds. Same output. Same quality. Different language.
This isn't about bragging rights. Performance determines what's possible.
A 30-minute image process means you run it once and wait. A 0.07-second process means you iterate—tweak parameters, re-run, compare, adjust. The feedback loop collapses from "go get coffee" to "instant."
The task: apply a hex mesh overlay, carbyne tint, and glowing eye effect to a 1920×1080 PNG. That's roughly 2 million pixels to process.
```javascript
// Pixel-by-pixel processing
for (let y = 0; y < height; y++) {
  for (let x = 0; x < width; x++) {
    // Get pixel, calculate, put pixel
    // ~4 operations per pixel
    // 2M pixels × 4 ops = 8M operations
    // Single-threaded
  }
}
```

Result: ~30 minutes
JavaScript is single-threaded. Canvas pixel access is slow. No parallelization. Each pixel waits for the previous one.
```rust
// Parallel pixel processing
pixels.par_iter()
    .map(|(x, y, pixel)| {
        // Same calculation,
        // but across all CPU cores simultaneously
    })
    .collect();
```

Result: 0.07 seconds
`.par_iter()` distributes work across all CPU cores automatically.

File search on Windows: Explorer search takes 30-60 seconds to scan directories. Unacceptable for iterative workflows.
| Metric | .NET Version |
|---|---|
| Binary Size | 100MB+ (with runtime) |
| Cold Startup | 2-3 seconds (JIT compilation) |
| Memory Baseline | ~80MB |
| Search Time | ~100ms (Everything SDK does the work) |
| Metric | Rust Version | Improvement |
|---|---|---|
| Binary Size | 819KB | 99% smaller |
| Cold Startup | ~50ms | 40-60x faster |
| Memory Baseline | ~8MB | 10x smaller |
| Search Time | ~80ms | Same (SDK-bound) |
The actual search is the same speed, since that's limited by Everything's SDK. But everything else got faster: startup, memory, deployment.
Git operations themselves are fast—Git is already native code. The improvement comes entirely from eliminating .NET's JIT compilation overhead.
20 tools, all available in under a second. Compare that to waiting 2-3 seconds for the runtime to initialize on every command.
```rust
use rayon::prelude::*;

// Turn any iterator parallel with one character change:
items.iter()      // sequential
items.par_iter()  // parallel

// Rayon handles:
// - Thread pool management
// - Work stealing
// - Load balancing
// - Safe concurrency
```
```rust
use image::GenericImageView;

let img = image::open("input.png")?;
let (width, height) = img.dimensions();
let mut rgba = img.to_rgba8();

// Direct pixel access at (x, y)
let pixel = rgba.get_pixel(x, y);
rgba.put_pixel(x, y, new_pixel);
```
```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    #[arg(long, default_value = "30")]
    tint_strength: u8,
    #[arg(long)]
    eye_x: Option<u32>, // optional flag: absent means None
}
// Automatic argument parsing, help generation, validation
```
```rust
use indicatif::{ProgressBar, ProgressStyle};

let pb = ProgressBar::new(100);
pb.set_style(ProgressStyle::default_bar()
    .template("{bar:40} {pos}%")?);
pb.set_position(50);
pb.finish_with_message("Done!");
```
Never optimize based on intuition. Measure. Find the actual bottleneck. Often it's not where you think.
In the compositor, 99% of time was spent in the pixel loop. Optimizing anything else was wasted effort.
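A minimal std-only sketch of measuring first, using `std::time::Instant`; the workload here is a stand-in for the real pixel loop:

```rust
use std::time::Instant;

fn main() {
    // Stand-in workload; substitute the suspected hot path.
    let data: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let sum: u64 = data.iter().sum();
    let elapsed = start.elapsed();

    println!("sum = {sum}, took {elapsed:?}");
}
```

Time a few candidate spots this way before touching any of them; the biggest number wins your attention.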
If the hot path is embarrassingly parallel (each iteration independent), use Rayon. Free speedup proportional to core count.
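Rayon is the easy route; for intuition, here is a std-only sketch of what it automates, splitting independent chunks across scoped threads (the chunk size is arbitrary for illustration):

```rust
use std::thread;

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    // Each chunk is independent, so threads never contend.
    let chunks: Vec<&[u64]> = data.chunks(250_000).collect();

    let total: u64 = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join each worker and combine partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });

    assert_eq!(total, data.iter().sum::<u64>());
    println!("total = {total}");
}
```

Rayon replaces all of this bookkeeping (spawning, joining, load balancing) with `.par_iter()`.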
Memory allocation is slow. Reuse buffers. Avoid intermediate copies. Pre-allocate known sizes.
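A small sketch of the reuse pattern, assuming a per-frame transform (the names and the transform are illustrative):

```rust
fn main() {
    let frames = vec![vec![1u8; 1024]; 3];

    // Allocate once with the known size...
    let mut scratch: Vec<u8> = Vec::with_capacity(1024);
    for frame in &frames {
        // ...then clear() keeps the allocation: no realloc per frame.
        scratch.clear();
        scratch.extend(frame.iter().map(|b| b.wrapping_add(1)));
        assert_eq!(scratch.len(), 1024);
        assert!(scratch.capacity() >= 1024);
    }
}
```

Allocating `scratch` inside the loop would pay the allocator on every iteration; hoisting it out pays once.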
Rust's optimizer is aggressive. Write clear code, use iterators, let LLVM do its job. --release mode is 10-100x faster than debug.
| Tool | .NET | Rust | Size Reduction |
|---|---|---|---|
| Everything MCP | 100MB+ | 819KB | 99.2% |
| Git MCP | ~80MB | ~3MB | 96% |
| Multiplexor | ~60MB | ~2MB | 97% |
| Screen MCP | ~50MB | ~1.5MB | 97% |
| SigCon6 | ~70MB | ~2MB | 97% |
Total MCP server footprint: ~9.3MB for 6 servers, 119 tools.
Previous .NET footprint: ~460MB for the same functionality.