The official Model Context Protocol TypeScript SDK gives you a working server in about forty lines. That is a fine demo. It is not a production server.
This post is the gap between the demo and the thing that an enterprise customer puts in front of Claude Code with real data. I have shipped half a dozen MCP servers through Wealthior Labs over the last twelve months. The patterns below are what survived the second customer.
If you have not built an MCP server yet, the official quickstart is still the right starting point. Read that first, then come back.
What an MCP server actually is
A small JSON-RPC server that exposes three primitives:
- Resources. Read-only data the model can pull.
- Tools. Callable functions with typed inputs and outputs.
- Prompts. Parameterised templates the user can invoke.
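To make "small JSON-RPC server" concrete, here is roughly what one tool call looks like on the wire, written as TypeScript values. It is a simplified sketch. The envelope follows JSON-RPC 2.0 and the method name tools/call comes from the MCP spec. The result type is trimmed to text content only, and the search_kb arguments are illustrative.

```typescript
// Simplified shapes of a tools/call exchange (the real result type also allows
// image, audio and embedded resource content).
type JsonRpcRequest<P> = { jsonrpc: "2.0"; id: number | string; method: string; params: P }
type JsonRpcResult<R> = { jsonrpc: "2.0"; id: number | string; result: R }

type CallToolParams = { name: string; arguments?: Record<string, unknown> }
type CallToolResult = {
  content: Array<{ type: "text"; text: string }>
  isError?: boolean
}

// What the client sends when the model decides to search.
const request: JsonRpcRequest<CallToolParams> = {
  jsonrpc: "2.0",
  id: 7,
  method: "tools/call",
  params: { name: "search_kb", arguments: { query: "refund policy", limit: 5 } },
}

// What the server answers: content for the model, isError set when the tool failed.
const response: JsonRpcResult<CallToolResult> = {
  jsonrpc: "2.0",
  id: request.id,
  result: { content: [{ type: "text", text: '{"results":[],"total":0}' }], isError: false },
}

console.log(JSON.stringify(request))
console.log(JSON.stringify(response))
```

Everything the model sees from your tool arrives through that content array, which is why the rest of this post cares so much about what goes into it.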
That is the whole protocol. Everything else is your problem. Auth, schemas, observability, retries, error semantics, transport setup, deployment: the spec leaves almost all of it to you.
This is liberating and dangerous. Liberating because you can ship in a weekend. Dangerous because every server I have seen in the wild ships without auth, without rate limits, without structured logging, and prints stack traces back to the model. Claude will happily put those stack traces in front of your user.
The pieces that go wrong first
I will lead with the failure modes because they shape the design.
Tool schemas drift from the implementation. You write a Zod schema. You change the implementation. The schema does not change. Six months later the tool returns a shape Claude cannot use and nobody knows why.
Error messages leak. A SQL exception, a stack trace, a stringified Error: Claude reads it, summarises it, and sometimes ignores the structured response next to it. Treat tool errors like API responses to an untrusted client.
Auth lives outside the protocol. The spec's OAuth-based authorization is optional and only applies to HTTP transports. STDIO has none, the older SSE transport has whatever you bolt on, and Streamable HTTP still leaves validating tokens to your server. Most servers I audit are wide open inside an organisation.
Cost surprises. Tools that fan out to LLM calls, external APIs, or large queries can blow up cost or latency budgets when Claude calls them in a loop. The protocol does not stop this.
No observability. When something goes wrong with a tool call, the user sees "the tool errored" and the developer sees nothing. No structured logs by default, no per-call latency, no input shape capture.
Each of these has a clean fix. Together they are the gap.
A production layout that works
Here is the skeleton I reach for. TypeScript strict, Node 22, official SDK, Zod for schemas.
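```typescript
// src/server.ts
import { createServer } from "node:http"
import { randomUUID } from "node:crypto"
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js"
import { registerSearchTool } from "./tools/search.js"
import { registerCreateTool } from "./tools/create.js"
import { withAuth } from "./middleware/auth.js"
import { withTelemetry } from "./middleware/telemetry.js"

// McpServer, not the low-level Server class, provides the .tool() helper the
// register* functions use, and advertises capabilities for whatever is registered.
const server = new McpServer({ name: "acme-internal-mcp", version: "1.2.0" })
registerSearchTool(server)
registerCreateTool(server)

// Stateful mode: one transport serves one session. Keep a transport per session
// id for many clients, or pass sessionIdGenerator: undefined to go stateless.
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
})
await server.connect(withTelemetry(withAuth(transport)))

// The transport does not open a port by itself; feed it every HTTP request.
createServer((req, res) => {
  transport.handleRequest(req, res).catch((err) => {
    console.error(err)
    if (!res.headersSent) res.writeHead(500).end()
  })
}).listen(3000) // port is illustrative
```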
Three load-bearing decisions hide in those lines.
One tool, one file, one schema
Every tool lives in its own file with its Zod input schema, output schema, and pure handler. No business logic in server.ts. This forces three properties:
- Schemas cannot drift because the handler imports the same schema for runtime validation.
- Each tool is independently testable with Vitest.
- Diff reviews show exactly which tool changed.
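```typescript
// src/tools/search.ts
import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"
import { z } from "zod"
import { runQuery } from "../db/search.js" // hypothetical data-access function

const Input = z.object({
  query: z.string().min(1).max(200),
  limit: z.number().int().min(1).max(50).default(10),
})

const Output = z.object({
  results: z.array(
    z.object({ id: z.string(), title: z.string(), snippet: z.string() })
  ),
  total: z.number().int().nonnegative(),
})

// Typed against McpServer: the low-level Server class has no .tool() method.
export function registerSearchTool(server: McpServer) {
  server.tool(
    "search_kb",
    "Full-text search across the customer knowledge base.",
    Input.shape,
    async (raw) => {
      // The SDK already checks raw against Input.shape; re-parsing is cheap and
      // keeps the handler safe when it is called directly from tests.
      const args = Input.parse(raw)
      const data = await runQuery(args)
      // Validate our own output so a broken shape never reaches the model.
      const validated = Output.parse(data)
      return { content: [{ type: "text", text: JSON.stringify(validated) }] }
    }
  )
}
```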
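Output.parse(data) at the end is the unfashionable hero. It catches the case where your DB query starts returning a NULL where the schema expects a string, or stops returning a field the schema requires. The model never sees the broken response because your own validator throws first. One caveat: z.object strips unknown keys by default, so a new column is silently dropped rather than rejected. Call .strict() on the schema if you want that to throw too.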
Auth is middleware, not a tool concern
Wrap the transport, not each tool. STDIO mode trusts the local process. HTTP mode requires a bearer token or signed header. The pattern looks like this:
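```typescript
// src/middleware/auth.ts
import type { IncomingMessage, ServerResponse } from "node:http"
import type { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js"
import { isValidToken } from "../lib/tokens.js" // hypothetical token check

// Gate the transport's HTTP entry point rather than its onmessage hook:
// server.connect() installs its own onmessage handler, so patching that hook
// before connect is fragile. Rejected requests get a plain 401 and never reach
// the JSON-RPC layer, let alone a tool.
export function withAuth(transport: StreamableHTTPServerTransport): StreamableHTTPServerTransport {
  const handle = transport.handleRequest.bind(transport)
  transport.handleRequest = async (req: IncomingMessage, res: ServerResponse, body?: unknown) => {
    if (!isValidToken(req.headers["authorization"])) {
      res.writeHead(401, { "WWW-Authenticate": "Bearer" }).end()
      return
    }
    return handle(req, res, body)
  }
  return transport
}
```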
For multi-tenant servers, the token resolves to a tenant id which goes into a getTenant() accessor that every tool reads. Tools never look at headers directly.
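One way to build that accessor in Node is AsyncLocalStorage. The auth layer resolves the token once and binds the tenant to the async context of the request, and getTenant() reads it from anywhere below. This is a sketch under assumptions. The in-memory token map, registerToken, resolveTenant and runAsTenant are hypothetical names, and a real deployment would keep the hashed tokens in Postgres.

```typescript
import { AsyncLocalStorage } from "node:async_hooks"
import { createHash } from "node:crypto"

type TenantId = string

// Hypothetical token store: sha256(token) -> tenant id. Only hashes are kept,
// so a leaked table does not leak usable tokens.
const tokenHashes = new Map<string, TenantId>()

function sha256Hex(value: string): string {
  return createHash("sha256").update(value).digest("hex")
}

export function registerToken(token: string, tenant: TenantId): void {
  tokenHashes.set(sha256Hex(token), tenant)
}

// Parse "Bearer <token>" and return the owning tenant, or undefined.
export function resolveTenant(header: string | string[] | undefined): TenantId | undefined {
  if (typeof header !== "string") return undefined
  const token = /^Bearer\s+(\S+)$/i.exec(header)?.[1]
  if (!token) return undefined
  return tokenHashes.get(sha256Hex(token))
}

const tenantStore = new AsyncLocalStorage<TenantId>()

// Run a request handler with the tenant bound to its async context.
export function runAsTenant<T>(tenant: TenantId, fn: () => T): T {
  return tenantStore.run(tenant, fn)
}

// The accessor every tool reads. Throws outside an authenticated request.
export function getTenant(): TenantId {
  const tenant = tenantStore.getStore()
  if (tenant === undefined) throw new Error("getTenant() called outside an authenticated request")
  return tenant
}
```

The auth layer would call runAsTenant around the rest of the request. Tools only ever call getTenant(), so they never see headers and cannot read another tenant's id by accident.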
Errors are part of the contract
Throwing a raw Error inside a tool handler is a leak, because the SDK catches it and hands its message back to the model as the tool result. Instead, define a small set of typed errors and a sanitiser at the edge.
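```typescript
// src/lib/errors.ts
import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js"

export type ToolErrorCode = "invalid_input" | "not_found" | "rate_limited" | "internal"

export class ToolError extends Error {
  constructor(
    public readonly code: ToolErrorCode,
    message: string,
    public readonly details?: Record<string, unknown>
  ) {
    super(message)
    this.name = "ToolError"
  }
}

// The CallToolResult annotation keeps type: "text" a literal, so the result
// can be returned straight from a tool handler.
export function toToolResponse(err: unknown): CallToolResult {
  if (err instanceof ToolError) {
    return {
      isError: true,
      content: [{ type: "text", text: JSON.stringify({ code: err.code, message: err.message }) }],
    }
  }
  // Never leak the underlying error to the model; keep the stack trace private.
  console.error(err) // swap in your structured logger
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify({ code: "internal", message: "tool failed" }) }],
  }
}
```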
Every tool handler wraps its body in try/catch and routes through toToolResponse. The model gets a clean, structured error it can reason about. Your logger gets the full stack trace privately.
Observability that you actually look at
A tool call without telemetry is a black box. I add three things to every server.
Structured per-call logs. One log line per tool invocation with tool name, tenant id, input hash, duration, outcome, and an opaque request id. JSON for production, pretty for dev.
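Here is a minimal sketch of that per-call log line, written as a wrapper around a tool handler. withCallLog, tenantOf and emit are hypothetical names. In a real server, tenantOf would be your getTenant() accessor and emit your JSON logger. By default it writes to stderr, because in STDIO mode stdout carries the protocol messages.

```typescript
import { createHash, randomUUID } from "node:crypto"

// One structured record per tool invocation.
type CallLog = {
  tool: string
  tenant: string
  inputHash: string // a hash, not the raw input, so logs never hold customer data
  durationMs: number
  outcome: "ok" | "error"
  requestId: string
}

// Hash of the arguments: groups identical calls without storing them.
// JSON.stringify is key-order sensitive, which is good enough for a sketch.
function hashInput(input: unknown): string {
  return createHash("sha256").update(JSON.stringify(input ?? null)).digest("hex").slice(0, 16)
}

// Wrap a handler so every call emits exactly one line, on success or failure.
export function withCallLog<A, R>(
  tool: string,
  tenantOf: () => string,
  handler: (args: A) => Promise<R>,
  emit: (line: string) => void = (line) => console.error(line),
): (args: A) => Promise<R> {
  return async (args) => {
    const started = performance.now()
    const base = { tool, tenant: tenantOf(), inputHash: hashInput(args), requestId: randomUUID() }
    let outcome: CallLog["outcome"] = "error"
    try {
      const result = await handler(args)
      outcome = "ok"
      return result
    } finally {
      const record: CallLog = { ...base, durationMs: Math.round(performance.now() - started), outcome }
      emit(JSON.stringify(record))
    }
  }
}
```

The finally block is the point of the design. A thrown error still produces its log line before it propagates to the error sanitiser.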
Per-tool counters. Calls per minute per tool per tenant. Exported in Prometheus format on /metrics for self-hosted, or shipped to Vercel's analytics endpoint for the hosted version. Without this you do not know which tool the model is hammering when something looks weird.
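The counter side can stay this small if you render the Prometheus text exposition format yourself. This is only a sketch. ToolCallCounter and the metric name mcp_tool_calls_total are made up for illustration, and in practice a client library such as prom-client does this for you.

```typescript
// In-process call counter keyed by (tool, tenant), rendered as Prometheus text.
type Labels = { tool: string; tenant: string }

export class ToolCallCounter {
  private counts = new Map<string, { labels: Labels; value: number }>()

  inc(labels: Labels): void {
    const key = `${labels.tool}\u0000${labels.tenant}`
    const entry = this.counts.get(key) ?? { labels, value: 0 }
    entry.value += 1
    this.counts.set(key, entry)
  }

  // Counters are monotonic totals. "Calls per minute" comes from
  // rate(mcp_tool_calls_total[1m]) at query time, not from the server.
  render(): string {
    const lines = [
      "# HELP mcp_tool_calls_total Tool invocations by tool and tenant.",
      "# TYPE mcp_tool_calls_total counter",
    ]
    for (const { labels, value } of this.counts.values()) {
      lines.push(
        `mcp_tool_calls_total{tool="${escapeLabel(labels.tool)}",tenant="${escapeLabel(labels.tenant)}"} ${value}`,
      )
    }
    return lines.join("\n") + "\n"
  }
}

// Label values must escape backslash, double quote and newline.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n")
}
```

Serve render() from /metrics with the content type text/plain; version=0.0.4. Tenant ids as labels are fine at tens or hundreds of tenants, but watch cardinality beyond that.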
Slow-call traces. Anything over 2 seconds gets a structured span saved to Postgres. After two weeks of production data, this surfaces every N+1 query and every accidental fan-out.
The middleware wrapper is twenty lines and pays for itself in the first incident.
Rate limits and cost control
Claude can call a tool in a loop. Without rate limits one prompt can run up a four-figure database bill or an OpenAI token bill if your tool wraps another LLM.
Two layers I always add:
- Per-call timeout. Default 10 seconds. Beyond that the tool returns a ToolError with code rate_limited, and the model has to choose a different path.
- Per-tenant per-minute call budget. Sliding window in Redis, tunable per tier. Free customers get 60 calls per minute. Paid customers get 600. Internal admin gets 6000.
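The per-call timeout can be sketched as a race between the handler and a timer. TimeoutError is a hypothetical stand-in for the ToolError from src/lib/errors.ts, so the sketch runs on its own. It reuses the rate_limited code described above.

```typescript
// Stand-in for ToolError("rate_limited", ...) so this block is self-contained.
class TimeoutError extends Error {
  readonly code = "rate_limited"
}

// Race the work against a timer and always clear the timer afterwards.
// The losing promise is not cancelled: pass an AbortSignal down to your DB or
// HTTP client if the work itself must stop.
export async function withTimeout<T>(work: Promise<T>, ms = 10_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`tool call exceeded ${ms} ms`)), ms)
  })
  try {
    return await Promise.race([work, timeout])
  } finally {
    clearTimeout(timer)
  }
}
```

Inside a handler that becomes `await withTimeout(runQuery(args))`, and the rejection flows into the same error sanitiser as every other failure.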
Neither is in the protocol. Both are non-optional for production.
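The per-tenant budget can be sketched as a sliding-window log. This version keeps timestamps in memory so it runs on its own. The production version described above keeps them in Redis, which this sketch does not show. SlidingWindowLimiter and the tier names are hypothetical. The 60/600/6000 limits are the ones above.

```typescript
type Tier = "free" | "paid" | "admin"

// Calls per minute per tier.
const LIMITS: Record<Tier, number> = { free: 60, paid: 600, admin: 6000 }

// Sliding-window log: keep each call's timestamp and count those inside the
// last windowMs. Exact, at the cost of up to `limit` timestamps per tenant.
export class SlidingWindowLimiter {
  private calls = new Map<string, number[]>()

  constructor(private windowMs = 60_000, private now: () => number = Date.now) {}

  // Records the call and returns true if the tenant is under budget.
  tryAcquire(tenant: string, tier: Tier): boolean {
    const t = this.now()
    const cutoff = t - this.windowMs
    const recent = (this.calls.get(tenant) ?? []).filter((ts) => ts > cutoff)
    if (recent.length >= LIMITS[tier]) {
      this.calls.set(tenant, recent)
      return false
    }
    recent.push(t)
    this.calls.set(tenant, recent)
    return true
  }
}
```

When tryAcquire returns false, the tool should return a rate_limited ToolError rather than throw, so the model gets a structured reason it can act on. The injectable clock exists only to make the window testable.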
Deployment shape
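For internal-only servers, run as a long-lived process behind your reverse proxy. STDIO mode works for Claude Code on developer laptops. Use Streamable HTTP for everything else. The older HTTP+SSE transport is deprecated in the current spec, so keep it only for clients that have not moved over.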
For multi-tenant SaaS, deploy as a Vercel Function or Cloud Run service with the Streamable HTTP transport in stateless mode (no session ids). That way any instance can serve any request, the service scales horizontally, and every request is authenticated by bearer token. Postgres holds tenant data, Redis holds rate limit counters.
One pitfall: do not co-locate the MCP server with the rest of your app's API. The traffic shape is different. Claude calls tools in bursts during model turns and goes silent in between. Keeping it on its own deployment lets you scale independently and isolate the blast radius if Claude does something weird.
Testing strategy
Three layers of tests carry the codebase.
Schema tests. Each tool's Input and Output schemas have round-trip tests. A regression here is a contract break.
Handler tests. Vitest with an in-memory or testcontainers Postgres. Every tool gets at least one happy-path, one validation error, one auth failure, one downstream failure case.
Protocol tests. A small script that spins up the server in stdio mode and runs a scripted Claude session against it. Catches transport bugs and prompt-template breakage.
CI runs all three on every PR. The schema tests are free, the handler tests run in seconds, the protocol test takes about a minute.
What I would build differently next time
A few things I would change about my early servers, in case it saves you a week.
Version your tool names. search_kb_v2 is uglier than search_kb but lets you ship a breaking schema without lying to existing clients. The protocol does not version tools for you.
Embed the tool description budget into the spec. Every tool description sits in Claude's context on every request, whether or not the tool gets called, so long descriptions are not free. Keep them under 200 tokens each and link to longer docs.
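A cheap way to enforce that budget is a CI check over every registered description. This sketch assumes roughly 4 characters per token, a common rule of thumb for English text. It is an estimate, not a tokenizer. estimateTokens and overBudget are hypothetical helpers.

```typescript
// Rough guard for tool description length; swap in a real tokenizer for accuracy.
const MAX_DESCRIPTION_TOKENS = 200
const CHARS_PER_TOKEN = 4

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Returns the names of tools whose description is over budget.
export function overBudget(tools: Record<string, string>, max = MAX_DESCRIPTION_TOKENS): string[] {
  return Object.entries(tools)
    .filter(([, description]) => estimateTokens(description) > max)
    .map(([name]) => name)
}
```

In a Vitest suite this becomes a one-line assertion that overBudget over all tool descriptions returns an empty list. That turns the budget into a failing test instead of a code-review comment.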
Treat the MCP server as a backend API, not a config file. It has an SLA, an error contract, a versioned schema, a release process. Most failures I have seen come from teams treating MCP servers as "just a wrapper" and forgetting the backend hygiene.
When to skip MCP entirely
MCP is the right answer when you want any Claude-compatible client to use your tools. Claude Code, Claude Desktop, third-party agents, automation flows.
It is the wrong answer when you only need to plug your tools into one bespoke agent that you control end to end. There, a direct tool-use loop with the Anthropic SDK is simpler, faster, and avoids the protocol's overhead.
The honest decision tree: do at least two clients need to use this tool? If yes, MCP. If no, direct SDK.
Resources
If you want a custom MCP server built or audited, Wealthior Labs ships them as Sprint or Build packages. The AI Engineer Switzerland page has the price ranges and packages.