
Building a Slack Agent with Pi on Vercel


Fair warning: I let Claude write the first draft of this one. I edited and rewrote a bunch, but figured I'd be upfront about the process. This is one of several AI-assisted posts.

I built a Slack bot. His name is Junior. He lives in Sentry’s Slack workspace and he’s not a generic assistant — he knows what Sentry does, where our offices are, what we ship, and what we care about. He does the stuff that used to require someone to context-switch out of a conversation: searching the web, drafting documents, generating images, running shell commands, and executing task-specific workflows called skills. The stuff that somebody eventually does but nobody wants to stop what they’re doing to go do.

The stack: Chat SDK for event handling, Pi as the agent framework, AI Gateway for model routing, Sandboxes for isolated code execution. All on Vercel. The interesting parts aren’t the framework choices — they’re the Slack-native tools, knowing when to shut up, and a skills system where new capabilities are markdown files, not code.

#Hiding Slack’s Mess

Look, Slack’s API is a mess. Between the Events API, Socket Mode, the Assistants API, and the various message formats, there’s a lot of surface area to get wrong. I don’t want to deal with any of that in my agent code. Chat SDK abstracts it away with an adapter pattern:

import { Chat } from "chat";
import { createSlackAdapter } from "@chat-adapter/slack";
import { createStateAdapter } from "./state";

export const bot = new Chat({
  userName: "junior",
  adapters: {
    slack: createSlackAdapter(),
  },
  state: createStateAdapter(),
});

bot.onNewMention(handleNewMention);
bot.onSubscribedMessage(handleSubscribedMessage);
bot.onAssistantThreadStarted(handleAssistantThreadStarted);
bot.onAssistantContextChanged(handleAssistantContextChanged);

Four event handlers cover everything:

  • onNewMention - someone @-mentions Junior in a channel or DM. Always triggers a response.
  • onSubscribedMessage - a new message in a thread Junior is already in. Not every message warrants a response though (more on that later).
  • onAssistantThreadStarted - fired when a user opens a new thread with Junior. We use it to set the title and suggested prompts.
  • onAssistantContextChanged - context updates within an assistant thread.

The state adapter persists conversation state to Redis. Every thread gets its own message history and artifact references, so Junior remembers what happened without re-reading the entire Slack history every time.

One nice thing from Slack’s Assistants API: live status updates. When the agent is working, you see “Searching web for pricing data…” or “Reading file…” in the thread, updated in real-time as each tool runs. Not generic “thinking…” messages:

function formatToolStatusWithInput(toolName: string, input: unknown): string {
  const obj =
    input && typeof input === "object"
      ? (input as Record<string, unknown>)
      : undefined;
  const path = obj ? compactPathForStatus(obj.path) : undefined;
  const query = obj ? compactTextForStatus(obj.query, 70) : undefined;
  const domain = obj ? extractDomainForStatus(obj.url) : undefined;
  const skillName = obj
    ? compactTextForStatus(obj.skill_name ?? obj.skillName, 40)
    : undefined;

  if (path && toolName === "readFile") return `Reading file ${path}`;
  if (path && toolName === "writeFile") return `Writing file ${path}`;
  if (skillName && toolName === "loadSkill")
    return `Loading skill ${skillName}`;
  if (query && toolName === "webSearch") return `Searching web for "${query}"`;
  if (domain && toolName === "webFetch") return `Fetching page from ${domain}`;

  return formatToolStatus(toolName); // fallback: "Running shell command in sandbox", etc.
}

The updates are debounced at one per second (STATUS_UPDATE_DEBOUNCE_MS = 1000) because rapid tool calls would otherwise hammer Slack’s API. Without it you’d get rate-limited or flood the UI with status flicker. Small thing, but watching the agent work through five steps in sequence is night and day versus staring at a typing indicator for 30 seconds wondering if it’s dead.
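That debounce can be sketched as a small rate limiter. This is an illustrative version, not Junior's actual code; the injectable clock exists so the behavior is deterministic and testable:

```typescript
const STATUS_UPDATE_DEBOUNCE_MS = 1000;

// Emits at most one status per window; keeps only the latest suppressed status.
function createStatusLimiter(now: () => number = Date.now) {
  let lastEmit = -Infinity;
  let pending: string | undefined;

  return {
    // Returns the status to post now, or undefined if inside the debounce window.
    offer(status: string): string | undefined {
      const t = now();
      if (t - lastEmit >= STATUS_UPDATE_DEBOUNCE_MS) {
        lastEmit = t;
        pending = undefined;
        return status;
      }
      pending = status;
      return undefined;
    },
    // The last suppressed status, so the final state isn't lost.
    flush(): string | undefined {
      const s = pending;
      pending = undefined;
      return s;
    },
  };
}
```

A timer-based flush on top of this would post the last pending status once the window closes; the core idea is just collapsing rapid updates into at most one per second.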

The point: this layer decouples agent logic from Slack’s quirks entirely. A message arrives, gets normalized into a platform-agnostic shape, and the agent never knows it came from Slack. Want Discord tomorrow? Write a new adapter, not a new agent.

#The Agent Loop

Every event handler funnels into one function: generateAssistantReply. This is where the agent actually does its thing.

async function generateAssistantReply(
  messageText: string,
  context: ReplyRequestContext,
): Promise<AssistantReply> {
  // 1. Discover available skills
  const availableSkills = await discoverSkills();

  // 2. Create a sandboxed execution environment
  const sandboxExecutor = createSandboxExecutor({
    sandboxId: context.sandbox?.sandboxId,
  });
  sandboxExecutor.configureSkills(availableSkills);
  const sandbox = await sandboxExecutor.createSandbox();

  // 3. Build the system prompt
  const systemPrompt = buildSystemPrompt({
    availableSkills,
    activeSkills: [],
    invocation: parseSkillInvocation(messageText),
    assistant: context.assistant,
    requester: context.requester,
  });

  // 4. Create tools
  const tools = createTools(availableSkills, {
    onGeneratedFiles: (files) => generatedFiles.push(...files),
    onArtifactStatePatch: (patch) => Object.assign(artifactStatePatch, patch),
    onSkillLoaded: (skill) => activeSkills.push(skill),
  });

  // 5. Create and run the agent
  const agent = new Agent({
    initialState: {
      systemPrompt,
      model: resolveGatewayModel(botConfig.modelId),
      tools: createAgentTools(tools, skillSandbox, spanContext),
    },
  });

  await agent.prompt({
    role: "user",
    content: userContentParts,
    timestamp: Date.now(),
  });

  // 6. Extract the response
  // ...
}

This is heavily simplified. The real thing handles thread locking, context management, attachment resolution, timeout enforcement, execution escape detection, and a bunch of observability. But you get the idea.

Two of those deserve more detail. The model sometimes tries to bail — “want me to proceed?”, “let me know if you’d like me to continue”, “tag me again when you’re ready.” Regex-based detection catches these and treats them as execution failures. You asked it to do something — do it or fail honestly. Don’t ask for permission you already have. Same pattern catches raw tool payload dumps — sometimes the model spits out JSON tool calls as text instead of actually calling the tool. Both get replaced with a generic failure message and the turn retries.
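The bail-out detection can be as simple as a list of regexes run over the model's final text. The patterns below are illustrative examples, not Junior's actual list, which is presumably longer and tuned over time:

```typescript
// Illustrative bail-out patterns; a real list would be longer and tuned.
const BAIL_PATTERNS: RegExp[] = [
  /\bwant me to (proceed|continue)\b/i,
  /\blet me know if you('|’)d like me to continue\b/i,
  /\btag me (again )?when you('|’)re ready\b/i,
];

// True when the model tried to hand the task back instead of finishing it.
function looksLikeBailout(finalText: string): boolean {
  return BAIL_PATTERNS.some((pattern) => pattern.test(finalText));
}
```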

There’s also a 15-minute hard timeout wrapping the entire agent loop. Promise.race between the agent execution and a timer, with agent.abort() called before rejecting. You need this because the agent decides when to stop, and sometimes “when” is “never.”
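A sketch of that wrapper, under assumed names (`AbortableAgent` and `runWithHardTimeout` are illustrative; only `agent.prompt()` and `agent.abort()` come from the post):

```typescript
// Minimal shape of what this sketch needs from the agent.
interface AbortableAgent {
  prompt(message: unknown): Promise<void>;
  abort(): void;
}

// Race the agent loop against a hard timeout; abort before rejecting.
async function runWithHardTimeout(
  agent: AbortableAgent,
  message: unknown,
  timeoutMs: number,
): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      agent.abort(); // stop the loop first, then fail the turn
      reject(new Error(`agent timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  try {
    await Promise.race([agent.prompt(message), timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer holding the process open
  }
}
```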

The system prompt isn’t a static string — it’s assembled per-request from a base personality (SOUL.md), identity context, available skills, output formatting rules, and tool usage guidelines. Different requester, different skills loaded, different prompt.

agent.prompt() is where Pi takes over. You give it a user message, it decides what to do — call tools, reason about results, call more tools, produce a final response. You don’t micromanage. The agent decides when it’s done.

#One Endpoint for Everything

The AI Gateway is more interesting than it sounds. On the surface it’s just configuration sitting between you and the model providers — but it’s the reason adding new capabilities to Junior is trivial.

export const botConfig = {
  userName: "junior",
  modelId: process.env.AI_MODEL ?? "anthropic/claude-sonnet-4.6",
  routerModelId:
    process.env.AI_ROUTER_MODEL ??
    process.env.AI_MODEL ??
    "anthropic/claude-sonnet-4.6",
};

Model IDs use a provider/model format. Different models for different jobs: the primary agent gets a capable model, the routing classifier gets a cheap fast one. Swapping Claude for GPT is an env var change. Your agent code never touches API keys or provider-specific formats.
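For illustration, splitting a provider/model ID is a one-liner; this helper is a sketch for clarity, not part of any SDK:

```typescript
// Hypothetical helper: split "provider/model" on the first slash.
function parseModelId(modelId: string): { provider: string; model: string } {
  const slash = modelId.indexOf("/");
  if (slash === -1) {
    throw new Error(`expected "provider/model", got "${modelId}"`);
  }
  return {
    provider: modelId.slice(0, slash),
    model: modelId.slice(slash + 1),
  };
}
```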

import { resolveGatewayModel } from "./pi/client";

const model = resolveGatewayModel(botConfig.modelId);

The gateway resolves the model, handles auth via VERCEL_OIDC_TOKEN, and routes the request. But what makes it genuinely useful: the same endpoint, same auth, same request shape handles both text and image generation. Adding image generation to Junior looked like this:

export function createImageGenerateTool(hooks: ToolHooks) {
  return tool({
    description: "Generate an image from a prompt.",
    inputSchema: Type.Object({
      prompt: Type.String({ minLength: 1, maxLength: 4000 }),
    }),
    execute: async ({ prompt }) => {
      const model = process.env.AI_IMAGE_MODEL ?? "google/gemini-3-pro-image";
      const response = await fetch(
        "https://ai-gateway.vercel.sh/v1/chat/completions",
        {
          method: "POST",
          headers: {
            "content-type": "application/json",
            authorization: `Bearer ${apiKey}`,
          },
          body: JSON.stringify({
            model,
            messages: [{ role: "user", content: prompt }],
            modalities: ["image"],
          }),
        },
      );

      // Parse image from response, attach to Slack message
      const payload = (await response.json()) as {
        choices?: { message?: { images?: unknown[] } }[];
      };
      const images = payload.choices?.[0]?.message?.images ?? [];
      const uploads = images; // simplified: real code converts images into Slack uploads
      hooks.onGeneratedFiles?.(uploads);
      return { ok: true, model, image_count: uploads.length };
    },
  });
}

Same gateway, same auth, same request shape. The model does text or images depending on what you ask for. Swapping Gemini for DALL-E is an env var change. Adding image generation to Junior was maybe 30 minutes of work, most of it wiring the generated images back into Slack message attachments.

Web search works the same way — vercel/parallel-search as a provider-defined tool through the same gateway endpoint, same auth, same pattern. The agent doesn’t know or care that search, image generation, and text completion are different capabilities. They’re all just gateway calls.

The gateway client also has completeObject — takes a Zod schema, returns validated structured data. This powers the message routing classifier (more on that below). It has a three-tier JSON extraction fallback: direct parse, fenced code block extraction, then brace-matching. Works across any model the gateway supports, which matters because cheap models are all over the place with how they format JSON.
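That fallback chain is easy to sketch as a standalone function. The shape below is an assumption based on the description above; note the naive brace matcher ignores braces inside strings, which a production version would handle:

```typescript
// Tiered JSON extraction: direct parse, then fenced block, then brace matching.
function extractJson(raw: string): unknown {
  // Tier 1: the model returned clean JSON.
  try {
    return JSON.parse(raw);
  } catch {}

  // Tier 2: JSON wrapped in a fenced code block.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fenced) {
    try {
      return JSON.parse(fenced[1]);
    } catch {}
  }

  // Tier 3: find the first balanced {...} span and parse that.
  const start = raw.indexOf("{");
  if (start !== -1) {
    let depth = 0;
    for (let i = start; i < raw.length; i++) {
      if (raw[i] === "{") depth++;
      if (raw[i] === "}" && --depth === 0) {
        try {
          return JSON.parse(raw.slice(start, i + 1));
        } catch {}
        break;
      }
    }
  }
  return undefined;
}
```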

#What It Can Actually Do

Every tool follows the same shape: TypeBox schema for input validation, a description for the model, an execute function.

import { tool, type ToolCallOptions } from "./tools/definition";
import { Type, type Static, type TSchema } from "@sinclair/typebox";

export interface ToolDefinition<TInputSchema extends TSchema = TSchema> {
  description: string;
  inputSchema: TInputSchema;
  execute?: (
    input: Static<TInputSchema>,
    options: ToolCallOptions,
  ) => Promise<unknown> | unknown;
}

Junior has 14 tools:

  • loadSkill — activate a skill for the current task
  • systemTime — current time and timezone
  • bash — shell commands in the sandbox
  • readFile / writeFile — sandbox filesystem access
  • webSearch — search the public web via the gateway
  • webFetch — fetch and parse a specific URL
  • imageGenerate — generate images via the gateway
  • slackCanvasCreate / slackCanvasUpdate — long-form Slack canvases
  • slackListCreate / slackListAddItems / slackListGetItems / slackListUpdateItem — Slack Lists for tracking

The first eight are standard agent stuff — every agent has web search, file I/O, shell access. The Slack-native ones are what make Junior actually useful in Slack. Canvases give it somewhere to put long-form output that isn’t a wall of text in a thread. Lists let it create structured tracking — action items, project status — as real Slack List objects that people can edit after the fact.

Most agent demos stop at “search the web and write files.” Here’s the canvas tool:

export function createSlackCanvasCreateTool(
  context: ToolRuntimeContext,
  state: ToolState,
) {
  return tool({
    description:
      "Create a Slack canvas for long-form output in the current channel.",
    inputSchema: Type.Object({
      title: Type.String({
        minLength: 1,
        maxLength: 160,
        description: "Canvas title.",
      }),
      markdown: Type.String({
        minLength: 1,
        description: "Canvas markdown body content.",
      }),
      channel_id: Type.Optional(
        Type.String({ description: "Optional Slack channel ID." }),
      ),
    }),
    execute: async ({ title, markdown, channel_id }) => {
      const targetChannelId = channel_id ?? context.channelId;

      // Idempotency check — don't create duplicate canvases
      const operationKey = createOperationKey("slackCanvasCreate", {
        title,
        markdown,
        channel_id: targetChannelId,
      });
      const cached = state.getOperationResult(operationKey);
      if (cached) return { ...cached, deduplicated: true };

      const created = await createCanvas({
        title,
        markdown,
        channelId: targetChannelId,
      });
      state.patchArtifactState({
        lastCanvasId: created.canvasId,
        lastCanvasUrl: created.permalink,
      });

      return {
        ok: true,
        canvas_id: created.canvasId,
        permalink: created.permalink,
      };
    },
  });
}

Idempotency checking so it doesn’t create duplicate canvases on retries. Artifact state so it can reference the canvas in follow-up turns. Runtime context for the current channel.
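The operation key is presumably something like the tool name plus a canonical serialization of the input. This sketch sorts top-level keys so equivalent inputs collide; the real `createOperationKey` may differ, and nested inputs would need a recursive canonicalizer:

```typescript
// Hypothetical idempotency key: tool name + input with top-level keys sorted,
// so retries with the same arguments map to the same cached result.
function createOperationKey(
  toolName: string,
  input: Record<string, unknown>,
): string {
  const canonical = JSON.stringify(input, Object.keys(input).sort());
  return `${toolName}:${canonical}`;
}
```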

Every tool looks like this. Define the schema, write the execute function, register it. The agent discovers tools automatically and decides when to use them based on the description. Adding a new tool is adding a new file — no routing logic, no dispatch table, no ceremony.

One layer on top worth mentioning: tool access control. When a skill is active, SkillSandbox.filterToolNames() intersects the available tools against that skill’s allowedTools list. A documentation skill can’t touch bash. An image generation skill can’t create canvases. Tools not on the list get silently dropped. Skills are markdown files, but they still have permissions.
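The intersection itself is a few lines. Here's a sketch of the logic inside `SkillSandbox.filterToolNames()`, assuming the signature implied above:

```typescript
// Intersect available tool names against a skill's allow-list.
// No allow-list means the skill gets everything; otherwise drop silently.
function filterToolNames(
  available: string[],
  allowedTools?: string[],
): string[] {
  if (!allowedTools) return available;
  const allowed = new Set(allowedTools);
  return available.filter((name) => allowed.has(name));
}
```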

#Don’t Let It Run Code on Your Server

You can’t let an LLM run arbitrary code on your server. Shouldn’t have to say this, but I’ve seen enough “just shell out to bash” implementations to know better.

@vercel/sandbox gives you isolated Node.js environments with their own filesystem, process space, and network. Junior gets three tools that run inside it: bash, readFile, and writeFile.

import { Sandbox } from "@vercel/sandbox";

const sandbox = await Sandbox.create({
  timeout: 1000 * 60 * 30, // 30 minutes
  runtime: "node22",
});

// Sync skill files into the sandbox workspace
const filesToWrite = await buildSkillSyncFiles(availableSkills);
const directories = collectDirectories(filesToWrite);

for (const dir of directories) {
  await sandbox.mkDir(dir);
}
await sandbox.writeFiles(filesToWrite);

// Create tool executors backed by the sandbox
const toolkit = await createBashTool({
  sandbox,
  destination: "/vercel/sandbox",
});
// toolkit.tools.bash, toolkit.tools.readFile, toolkit.tools.writeFile

Sandbox reuse is handled through thread state in Redis. The sandbox ID gets persisted, and next turn calls Sandbox.get({ sandboxId }) to restore it. Falls back silently to creating a new one if it’s expired. Files the agent writes in turn 1 are still there in turn 5. Skill files get synced in at /vercel/sandbox/skills/{skill-name}/ every time a sandbox is acquired.

Each tool execution extends the sandbox timeout via extendTimeout(), configurable through VERCEL_SANDBOX_KEEPALIVE_MS. Best-effort — if the keepalive fails, execution continues. You don’t want a keepalive failure to kill a tool call that was otherwise fine.

Skill files are capped at 256KB / 20K characters when read into the sandbox. If your skill file is that big, it’s doing too much.
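Enforcing that cap is a small clamp. The helper name and tie-breaking behavior are assumptions; only the limits come from the post:

```typescript
// Limits from the post: 256KB of bytes, 20K characters.
const MAX_SKILL_BYTES = 256 * 1024;
const MAX_SKILL_CHARS = 20_000;

// Clamp oversized skill content before syncing it into the sandbox.
function clampSkillContent(content: string): string {
  const byteLength = new TextEncoder().encode(content).length;
  if (byteLength <= MAX_SKILL_BYTES && content.length <= MAX_SKILL_CHARS) {
    return content;
  }
  return content.slice(0, MAX_SKILL_CHARS);
}
```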

The tradeoff is latency. A few seconds to spin up a sandbox, plus network overhead on every tool execution. For a Slack bot that’s fine — people expect a few seconds. For a real-time chat UI you’d want to think harder about this.

#Capabilities Are Markdown Files

Skills are markdown files. That’s it. A directory with a SKILL.md file — YAML frontmatter and a markdown body. Here’s /sum — when someone types /sum in a thread, Junior summarizes it:

---
name: sum
description: >
  Summarize the current Slack thread into a concise brief
  with actions. Use when users invoke /sum.
---

## Workflow

1. Treat the current thread context as the primary source of truth
2. Identify URLs from the thread and current message
3. Select only relevant URLs; fetch at most 5 with `webFetch`
4. Build the response: Summary, Action items, Open questions
   (only if present), Sources used

## Output rules

- Summary: 5-8 bullets focused on decisions, status, and risks
- Action items: `Owner: <name|unassigned> | Action: <task> | Due: <date|none>`
- Sources: list fetched URLs, or "Thread context only" if none
- Never invent facts. Say "uncertain" instead of guessing.

That’s the whole skill. No code. The agent reads this, knows what to do, and has guardrails for how to do it. Anyone on the team can write one of these.

Skills live in the source code alongside everything else. The agent sees them in the system prompt with their name and description, and calls loadSkill when a task matches. Just the standard skills spec, nothing custom. Discovery results are cached for 5 seconds (SKILL_CACHE_TTL_MS) to avoid filesystem hammering when rapid-fire messages hit the same thread.
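The 5-second cache doesn't need much machinery. A generic sketch with an injectable clock; the real discovery code is presumably similar in spirit:

```typescript
const SKILL_CACHE_TTL_MS = 5000;

// Single-value TTL cache: expired entries read as a miss.
function createTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  let value: T | undefined;
  let expiresAt = -Infinity;
  return {
    get(): T | undefined {
      return now() < expiresAt ? value : undefined;
    },
    set(next: T): void {
      value = next;
      expiresAt = now() + ttlMs;
    },
  };
}
```

A `discoverSkills()` built on this checks `get()` first and only walks the filesystem on a miss.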

The frontmatter can also include allowed-tools — a whitelist of tool names the skill can use. A skill for generating images has no business touching bash.

export interface SkillMetadata {
  name: string;
  description: string;
  skillPath: string;
  allowedTools?: string[];
}

This is the part I’m most excited about. Adding a new capability to Junior is writing a markdown file. Not TypeScript, not a new tool, not a deploy. A markdown file. That changes who can contribute from “people who can write TypeScript” to “people who can write.”

#Knowing When to Shut Up

An agent that replies to everything in a thread is annoying. Two humans having a side conversation and Junior chimes in on every message? People will mute it. And then you’ve built something nobody uses.

When a message arrives in a subscribed thread, it goes through a classifier first:

const replyDecisionSchema = z.object({
  should_reply: z.boolean(),
  confidence: z.number().min(0).max(1),
  reason: z.string().max(160).optional(),
});

const ROUTER_CONFIDENCE_THRESHOLD = 0.72;

async function shouldReplyInSubscribedThread(args: {
  rawText: string;
  text: string;
  conversationContext?: string;
  isExplicitMention?: boolean;
}): Promise<{ shouldReply: boolean; reason?: string }> {
  // Explicit mentions always get a reply
  if (args.isExplicitMention) {
    return { shouldReply: true, reason: "explicit mention" };
  }

  // Ask a cheap, fast model: should I respond to this?
  const result = await completeObject({
    modelId: botConfig.routerModelId,
    schema: replyDecisionSchema,
    maxTokens: 120,
    temperature: 0,
    system: routerSystemPrompt,
    prompt: args.rawText,
  });

  if (!result.object.should_reply)
    return { shouldReply: false, reason: result.object.reason };
  if (result.object.confidence < ROUTER_CONFIDENCE_THRESHOLD) {
    return {
      shouldReply: false,
      reason: `low confidence (${result.object.confidence})`,
    };
  }

  return { shouldReply: true, reason: result.object.reason };
}

Small cheap model, structured output, confidence threshold. Below 0.72, Junior stays quiet. Explicit @-mentions skip the classifier entirely.

The classifier doesn’t just see the raw message — it gets thread context and knows the assistant’s name. It’s deciding “is this directed at Junior?” not “does this need a response?” That distinction matters. “Can someone look into this?” in a thread where Junior has been helping? Probably directed at Junior. Same message in a thread where two engineers are debugging? Probably not.

Nobody notices when the bot correctly doesn’t respond. But they definitely notice when it incorrectly does.

#So

The part I’m most excited about is the skills system — capabilities as markdown files is a genuinely powerful idea, and it’s still early. Part of the goal was making the bot hackable: anyone at Sentry can use our /skill-creator to extend or refine what Junior can do, no TypeScript required. I wrote more about skills and Warden in Skill Synthesis. We’re considering open sourcing Junior if there’s demand — hopefully this was a useful look at how it all fits together.