Success stories are everywhere. Failure post-mortems are rare. But failures teach more than victories—they show the edges of what's possible and the assumptions that break under pressure.
This is a candid account of approaches that didn't work, debugging rabbit holes, and hard lessons learned the expensive way.
Inter-agent communication via file watchers. DC writes to a file, KALIC's watcher triggers, reads the message, responds. Simple pub/sub pattern.
Weeks spent debugging phantom missed events. The watcher would work for hours, then silently stop firing. No errors. No warnings. Just... nothing.
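The author abandoned the watcher entirely. If a file-based channel ever has to stay, one common mitigation is to treat watcher notifications as hints rather than the source of truth. This is not what the author did. The reader rescans the mailbox directory on every event and on a periodic timer. The sender writes under a temporary name and then renames. A minimal sketch follows. The names write_message and scan_new_messages, the mailbox layout, and the ".tmp" convention are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical sender side: write under a temporary name, then rename, so the
// reader never picks up a half-written message (assumes a same-volume rename).
void write_message(const fs::path& mailbox, const std::string& name, const std::string& body) {
    const fs::path tmp = mailbox / (name + ".tmp");
    {
        std::ofstream out(tmp, std::ios::binary);
        out << body;
    }  // stream closed (and flushed) before the rename
    fs::rename(tmp, mailbox / name);
}

// Hypothetical reader side: call on every watcher event AND on a periodic timer.
// Returns message files not seen before, so a missed event only delays a message.
std::vector<std::string> scan_new_messages(const fs::path& mailbox, std::set<std::string>& seen) {
    assert(fs::is_directory(mailbox));
    std::vector<std::string> fresh;
    for (const auto& entry : fs::directory_iterator(mailbox)) {
        if (!entry.is_regular_file() || entry.path().extension() == ".tmp") continue;
        std::string name = entry.path().filename().string();
        if (seen.insert(name).second) fresh.push_back(name);  // .second is true only for new names
    }
    std::sort(fresh.begin(), fresh.end());  // directory iteration order is unspecified
    return fresh;
}
```

With this design, a watcher that silently stops firing costs latency (one timer period) instead of correctness.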
WM_COPYDATA messaging. Direct inter-process communication via Windows messages. Synchronous, reliable, no file system involved. The P15 Agent Bridge now uses SendMessageTimeoutW with proper timeout handling.
Entire Windows desktop freezes for 30+ seconds. Mouse frozen. Keyboard dead. Then suddenly everything resumes.
Raw SendMessage() calls to a window that's not responding. SendMessage is synchronous: it blocks until the target processes the message. If the target is hung, the sender hangs. If the sender is the UI thread, the whole desktop hangs.
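// NEVER DO THIS:
SendMessage(hwnd, WM_COPYDATA, ...); // Blocks forever if target is hung
// ALWAYS DO THIS (from a worker thread):
DWORD_PTR result = 0;
if (!SendMessageTimeoutW(hwnd, WM_COPYDATA, ...,
                         SMTO_ABORTIFHUNG, 1000, &result)) { // Gives up after at most 1 second
    // Returns 0 on timeout or failure; GetLastError() == ERROR_TIMEOUT means it timed out
}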
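Three changes: (1) use SendMessageTimeoutW instead of SendMessage, (2) add the SMTO_ABORTIFHUNG flag, so the call returns immediately instead of waiting out the timeout when the target is already hung, (3) make the call from a worker thread, not the UI thread.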
Now documented as PhiSHRI door E59.
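SendMessageTimeoutW already bounds the wait. Change (3) keeps even that bounded wait off the UI thread. The sketch below shows the same idea in portable C++: run the blocking call on a worker thread and give up after a deadline. The function call_with_deadline is hypothetical and not the P15 Agent Bridge's code. In the bridge, the blocking call would be the SendMessageTimeoutW call itself.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <optional>
#include <thread>

// Hypothetical helper: run a blocking call on a detached worker thread and wait
// at most `timeout` for its result. On expiry the caller gets std::nullopt and
// moves on; the worker finishes (or stays stuck) without holding up the caller.
std::optional<int> call_with_deadline(std::function<int()> blocking_call,
                                      std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    auto promise = std::make_shared<std::promise<int>>();
    std::future<int> result = promise->get_future();
    std::thread([promise, blocking_call]() {
        try {
            promise->set_value(blocking_call());
        } catch (...) {
            promise->set_exception(std::current_exception());
        }
    }).detach();
    // Unlike a future from std::async, this future does not block in its
    // destructor, so returning early really does free the caller.
    if (result.wait_for(timeout) == std::future_status::ready) {
        return result.get();  // rethrows if blocking_call threw
    }
    return std::nullopt;
}
```

The promise is shared with the worker. A late result is therefore written into still-valid state and simply discarded.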
Every KALIC session started blank. Same questions answered multiple times. Same mistakes repeated. Same solutions rediscovered.
The core issue: relying on the agent to check its own memory without enforcement.
Brain regions architecture: compartmentalized storage (frontal, temporal, parietal, cerebellum, brainstem) with semantic routing, plus Claude Code hooks that force a memory check before any action.
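The enforcement idea fits in a few lines. The sketch below refuses an action on a topic until memory for that topic has been consulted in the current session. It assumes the semantic router has already mapped the request to a topic string. MemoryGate and pre_action_hook_exit_code are hypothetical names, not the author's hooks. The exit-code convention follows our understanding of Claude Code's PreToolUse hooks, where exit code 2 blocks the tool call; treat it as an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical gate: an action on a topic is refused until memory for that
// topic has been consulted in this session.
class MemoryGate {
public:
    // Called by the memory lookup path once it has actually searched storage.
    void record_lookup(const std::string& topic) { consulted_.insert(topic); }

    // Called before an action; false means "block it".
    bool allow(const std::string& topic) const { return consulted_.count(topic) > 0; }

private:
    std::set<std::string> consulted_;
};

// Assumption: a PreToolUse hook exiting with 2 blocks the tool call (its stderr
// is shown to the model); exiting with 0 lets the call proceed.
int pre_action_hook_exit_code(const MemoryGate& gate, const std::string& topic) {
    return gate.allow(topic) ? 0 : 2;
}
```

The agent is never trusted to remember the check. The hook fails closed until the lookup has actually happened.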
Full case study: Teaching an AI to Remember
Deploy .NET MCP server. Works. Move to new machine. Breaks. Debug for hours. Fix. Move again. Breaks differently.
// The magic incantation to silence .NET logging:
Environment.SetEnvironmentVariable("Logging__LogLevel__Default", "None");
// But this only works if set BEFORE any logging initialization...
Rewrite in Rust. No runtime. No configuration. Single binary deployment.
Everything MCP: 100MB+ .NET → 819KB Rust
Battle.net installer stuck at 5% or 45%. No error. No timeout. Just frozen.
Network analysis with SigCon6 revealed TCP connections in FIN_WAIT_1 state—the localhost connection between Agent.exe and Setup.exe. Agent sends FIN, Setup never receives it. Connection hangs forever.
Cause: Windows Defender's network inspection was interfering with localhost traffic in specific edge cases. Disabling real-time protection allowed the install to complete.
When network operations hang without errors, check TCP states with netstat -ano. Localhost isn't immune to interference. Document weird edge cases because they will happen again.
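The netstat -ano check can be scripted so stuck localhost connections stand out. The helper below is hypothetical. It assumes the usual Windows column order (Proto, Local Address, Foreign Address, State, PID) and treats 127.x.x.x and [::1] endpoints as loopback. It returns the PIDs of TCP connections stuck in FIN_WAIT_1 on both ends of the loopback.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// True for IPv4 loopback (127.x.x.x:port) or IPv6 loopback ([::1]:port) endpoints.
bool is_loopback(const std::string& endpoint) {
    return endpoint.rfind("127.", 0) == 0 || endpoint.rfind("[::1]", 0) == 0;
}

// Hypothetical helper: scan `netstat -ano` output for loopback TCP connections
// in FIN_WAIT_1 and return their owning PIDs.
std::vector<int> stuck_loopback_pids(const std::string& netstat_output) {
    std::vector<int> pids;
    std::istringstream lines(netstat_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream cols(line);
        std::string proto, local, foreign, state;
        int pid = 0;
        // Header lines, blank lines, and UDP rows (no State column) fail to parse here.
        if (!(cols >> proto >> local >> foreign >> state >> pid)) continue;
        if (proto == "TCP" && state == "FIN_WAIT_1" && is_loopback(local) && is_loopback(foreign)) {
            pids.push_back(pid);
        }
    }
    return pids;
}
```

Matching the returned PIDs against tasklist output tells you which process (here, Agent.exe or Setup.exe) owns the half-closed socket.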
Windows Defender repeatedly detecting "VirTool:Win32/DefenderTamperingRestore" threats. Source: Windows Defender's own registry writes.
1. Defender blocked from cloud (firewall)
2. Sets DisableBlockAtFirstSeen=1 (offline fallback)
3. Downloads new signatures (different endpoint)
4. New signatures flag DisableBlockAtFirstSeen=1 as malware
5. Defender detects its own registry write
6. Resets to 0, tries cloud, blocked, sets to 1...
7. Infinite loop
Microsoft shipped a signature to protect Defender from Defender's own offline-mode behavior.
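Steps 1-7 describe two rules acting on the same registry value with no state that satisfies both. The toy model below makes that concrete. It is an illustration of the feedback loop, not Defender's real logic, and ToyDefender is a hypothetical name. With the cloud blocked, the value never settles and every other tick raises a detection. With the cloud reachable, the offline fallback never fires and the loop stops.

```cpp
#include <cassert>

// Toy model of steps 1-7 (illustrative only, not Defender's implementation).
struct ToyDefender {
    int disable_block_at_first_seen = 0;
    bool cloud_reachable = false;  // step 1: blocked by the firewall
    int detections = 0;

    void tick() {
        if (!cloud_reachable && disable_block_at_first_seen == 0) {
            disable_block_at_first_seen = 1;  // step 2: offline fallback
        } else if (disable_block_at_first_seen == 1) {
            ++detections;                     // steps 4-5: own write flagged as tampering
            disable_block_at_first_seen = 0;  // step 6: reset, try the cloud again
        }
    }
};
```

The model has no fixed point while cloud_reachable is false. That is why the fix has to remove one side of the conflict rather than suppress individual detections.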
Either allow Defender cloud access, or use ConfigureDefender (AndyFul) to manage settings without triggering the tamper detection.
AutoHotkey's ControlSend doesn't work reliably with Electron apps. Keystrokes lost or duplicated.
Electron's Chromium renderer handles input differently than native Windows controls. The window hierarchy is unusual, focus management is async, and input events get swallowed.
Paste via the clipboard (Ctrl+V) instead of sending keystrokes directly. Or use WM_COPYDATA for the actual data transfer and keystrokes only for focus management.
Documented as PhiSHRI door E58.
A tool that works 99% of the time is far worse than the missing 1% suggests. That 1% failure rate means you can never fully trust it, so every result still has to be checked.
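The 1% compounds with use. A back-of-envelope check, assuming each run fails independently with probability 0.01:

$$
P(\text{at least one failure in } n \text{ runs}) = 1 - 0.99^{n},
\qquad
1 - 0.99^{100} \approx 1 - 0.366 = 0.634
$$

Over a hundred runs, at least one failure is more likely than not.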
Don't rely on remembering to check. Build systems that force the check. Hooks, pre-commit scripts, automated gates.
Edge cases that took hours to debug will happen again. Write them down. Create PhiSHRI doors. Future you will thank past you.
Some architectures are fundamentally broken. No amount of patching fixes bad foundations. Sometimes starting over is faster than continuing to debug.
Happy paths are easy. What happens when the network is down? When the target process is hung? When the disk is full? Test those.
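Failure paths are easiest to test when the failure can be injected on demand. The sketch below routes writes through an interface so a test can simulate a full disk. Storage, FullDisk, HealthyDisk, and save_session_notes are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical storage interface so tests can inject failures on demand.
struct Storage {
    virtual ~Storage() = default;
    virtual std::error_code write(const std::string& name, const std::string& data) = 0;
};

// Fake that behaves like a full disk on every write.
struct FullDisk : Storage {
    std::error_code write(const std::string&, const std::string&) override {
        return std::make_error_code(std::errc::no_space_on_device);
    }
};

// Fake that accepts every write and counts them.
struct HealthyDisk : Storage {
    int writes = 0;
    std::error_code write(const std::string&, const std::string&) override {
        ++writes;
        return {};
    }
};

// Code under test: must surface a failed write to its caller, never report success.
std::error_code save_session_notes(Storage& storage, const std::string& notes) {
    if (notes.empty()) return {};  // nothing to persist
    return storage.write("session-notes.txt", notes);
}
```

The same pattern covers the other questions above. A fake network that never answers or a fake target that hangs can be swapped in the same way.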