A lot of infrastructure projects make the same mistake: they expose a capable API and then act surprised when operating the system still feels miserable.
We ran into that quickly while building Familiar. The kernel could receive prompts and hand them to the agent, but that was not enough. Real use immediately split into different modes: fast interactions, long-running work, browser chat, retries, streaming, and "did it hang or is it thinking?" You cannot solve that with one endpoint and better documentation.
So the project started to grow an operating surface.
/invoke stopped being a single behavior and became a family of execution modes. Some requests are interactive and should feel live. Some are fire-and-forget. Some start synchronously but need to promote themselves to async once the runtime can tell they will not finish soon. Concretely: you fire a long request, the kernel detects it will not finish within the timeout, and it hands back a job ID instead of holding the connection open. You poll. The work continues. That sounds like an implementation detail until you are on the other side of a remote call wondering whether the system lost your work. The right execution model is not an optimization. It is part of trust.
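To make the promotion step concrete, here is a minimal sketch of the pattern in Python. It is not Familiar's actual code: the names `invoke`, `poll`, `sync_timeout`, and the response fields are assumptions chosen for illustration, and a thread pool stands in for whatever the kernel really uses to run work.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical in-memory state; a real kernel would persist jobs somewhere.
_executor = ThreadPoolExecutor(max_workers=4)
_jobs = {}  # job_id -> Future


def _describe(future):
    """Turn a finished future into a response payload."""
    error = future.exception()
    if error is not None:
        return {"status": "failed", "error": str(error)}
    return {"status": "done", "result": future.result()}


def invoke(task, sync_timeout=0.1):
    """Run task synchronously if it finishes in time, else promote it to a job."""
    future = _executor.submit(task)
    done, _ = wait([future], timeout=sync_timeout)
    if future in done:
        return _describe(future)
    # Not finished within the window: keep the work running and hand back an ID.
    job_id = uuid.uuid4().hex
    _jobs[job_id] = future
    return {"status": "accepted", "job_id": job_id}


def poll(job_id):
    """Report the state of a promoted job."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "running"}
    return _describe(future)


if __name__ == "__main__":
    def slow_task():
        time.sleep(0.5)
        return "finished"

    response = invoke(slow_task, sync_timeout=0.05)
    print(response["status"])          # promoted instead of blocking
    time.sleep(0.8)
    print(poll(response["job_id"]))    # the work continued in the background
```

The point of the sketch is that the caller always gets an answer within the sync window: either the result, or an ID that proves the work was accepted.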
The browser chat UI came from the same pressure. Not because we wanted "yet another chat app," but because streaming changes the emotional character of an agent. A long pause with no signal feels broken. A streaming response, even before the first useful sentence lands, tells the operator the system is alive. The async job model solved the same problem from the other end: if a task is genuinely long-running, the runtime should say so explicitly and give it an ID instead of pretending every interaction belongs in the same request.
Observability followed naturally. Once you have sync work, async work, and background activity, the kernel needs to explain itself. Health, status, dispatch tracking, recent failures, and tunnel visibility stopped being ops garnish. They became the way a human—or another agent—forms a mental model of the system.
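As a rough illustration of what "the kernel explains itself" can mean, the sketch below keeps a small in-memory ledger of dispatches and recent failures and renders it as a status snapshot. The class name `KernelStatus`, its methods, and the snapshot fields are all hypothetical; Familiar's real status output may look quite different.

```python
import time
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class KernelStatus:
    """Hypothetical ledger of in-flight work and recent failures."""
    started_at: float = field(default_factory=time.monotonic)
    in_flight: dict = field(default_factory=dict)  # dispatch_id -> mode
    recent_failures: deque = field(default_factory=lambda: deque(maxlen=20))
    completed: int = 0

    def dispatch_started(self, dispatch_id, mode):
        # mode is a free-form label such as "sync", "async", or "stream".
        self.in_flight[dispatch_id] = mode

    def dispatch_finished(self, dispatch_id, error=None):
        mode = self.in_flight.pop(dispatch_id, "unknown")
        if error is None:
            self.completed += 1
        else:
            self.recent_failures.append(
                {"id": dispatch_id, "mode": mode, "error": error}
            )

    def snapshot(self):
        """Return a plain dict that a status endpoint could serialize."""
        return {
            "uptime_seconds": round(time.monotonic() - self.started_at, 1),
            "completed": self.completed,
            "in_flight": dict(Counter(self.in_flight.values())),
            "recent_failures": list(self.recent_failures),
        }


if __name__ == "__main__":
    status = KernelStatus()
    status.dispatch_started("d1", "sync")
    status.dispatch_started("d2", "async")
    status.dispatch_finished("d1")
    status.dispatch_finished("d2", error="agent timed out")
    print(status.snapshot())
```

Capping the failure list keeps the snapshot small enough to read at a glance, which matters when the reader is a person or another agent trying to decide whether the system is healthy.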
Once you support SSE in one place, async jobs in another, and a browser UI on top, every edge case becomes visible. Response flushing matters. Content length matters. Local history matters. Small bugs stop being small because they break the operator's picture of what the system is doing.
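The flushing and content-length problems are easier to see with the wire format in front of you. The sketch below frames Server-Sent Events by hand, assuming a hypothetical `stream_reply` generator and made-up event names (`token`, `done`); only the framing rules themselves (`data:` lines, a blank line to end each event, comment lines starting with `:`) come from the SSE specification.

```python
def format_sse(data, event=None):
    """Frame one Server-Sent Event; each line of data gets its own 'data:' field."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The blank line is what tells the client the event is complete.
    return "\n".join(lines) + "\n\n"


# A stream has no known length up front, so no Content-Length header is set.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
}


def stream_reply(chunks):
    """Yield SSE frames for an agent reply (hypothetical event names).

    Each yielded frame should be written and flushed immediately by the
    server; a frame sitting in a buffer looks exactly like a hung agent.
    """
    # A comment line is ignored by clients but proves the connection is alive.
    yield ": stream open\n\n"
    for chunk in chunks:
        yield format_sse(chunk, event="token")
    yield format_sse("end", event="done")


if __name__ == "__main__":
    for frame in stream_reply(["Hello", "multi\nline"]):
        print(repr(frame))
```

Splitting on newlines matters: a raw newline inside a single `data:` field would end the field early, so multi-line chunks have to be sent as several `data:` lines.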
The net effect is that Familiar does not just expose an API. It presents a way of operating a remote agent: a browser surface for live conversation, async jobs for heavier work, health and status for trust, and enough internal accounting that the system can explain where the work went.
The surface works. The harder question is what happens when the agent using it wants to change it—and those changes need to become real without a full restart ritual. That is the next story.