architecture · security deep-dive

What the code does to prevent each threat.

The /security page maps the threat-mitigation contract at a glance. This page traces the contract into the actual code: which file, which lines, what would break if it weren't there. Companion to /architecture/audit-log.

A threat model is only as good as the code that implements it. /security lists 14 threats × 14 mitigations in tabular form. This page picks 8 of the highest-stakes mitigations and walks them line-by-line: which file the code lives in, what it does, and what would mechanically break if it were removed.

The eight are T1, T2, T3, T5, T6, T9, T11, and T12. The rest of the 14 follow the same pattern, and the source is small enough to read end-to-end in under an hour.

T1 · LLM retries a tool call, double-charges the customer

Agent network blip → SDK retry → MP sees two charge requests. Without idempotency, the buyer gets billed twice. The bug is hard to detect because it looks like 'normal' retry behavior to the agent's logs.

mitigation · packages/mercadopago/src/tools.ts
// packages/mercadopago/src/tools.ts
const idempotencyKey = await sha256Hex(JSON.stringify({
  tool: name,
  inputs: canonical(args),
  customer: args.payerId,
  amount: args.amount,
  currency: args.currency,
}));

const response = await client.payments.create({
  ...args,
  idempotencyKey, // ← MP server-side dedupes on this
});
what would break · If the idempotency key were random per-call, MP would treat each retry as a fresh charge. The deterministic SHA-256 of the canonical input space is what makes the same logical request produce the same key.
MP docs § idempotency · Cookbook R02
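A minimal sketch of the deterministic-key idea, assuming Node's synchronous `node:crypto` in place of the async `sha256Hex` helper shown above. `canonicalJson` and `idempotencyKeyFor` are hypothetical stand-ins for illustration, not the toolkit's actual `canonical()` implementation:

```typescript
import { createHash } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical canonicaliser: object keys are sorted recursively, so two
// argument objects with the same content always serialise identically.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const body = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`)
    .join(",");
  return `{${body}}`;
}

// Same tool + same logical inputs => same key, regardless of key order.
function idempotencyKeyFor(tool: string, args: { [key: string]: Json }): string {
  return createHash("sha256")
    .update(canonicalJson({ tool, inputs: args }))
    .digest("hex");
}
```

A retry that re-sends the same arguments in a different property order hashes to the same key, while a changed amount hashes to a different one.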

T2 · Compromised LLM authorizes refund / cancellation without consent

Prompt injection or jailbroken upstream model decides to refund a payment the user didn't consent to. The agent has the credentials; without a programmatic gate, anything in the system prompt that says 'always confirm before refunding' is just a suggestion the model can ignore.

mitigation · packages/mercadopago/src/middleware.ts
// packages/mercadopago/src/middleware.ts
const HITL_TOOLS = new Set([
  "refund_payment",
  "cancel_subscription",
  "pause_subscription",
  "cancel_payment_preference",
  "delete_customer_card",
  "cancel_qr_dynamic",
  "delete_pos",
  "revoke_marketplace_token",
]);

export function applyConfirmationGate<T>(tools: T, require: ConfirmFn): T {
  // Wraps each gated tool's execute() so it blocks until require() returns true.
  // The wrapper is server-side; the LLM can't bypass it by 'ignoring' anything.
  ...
}
what would break · If the gate were a system-prompt instruction ('always ask before refunding'), a sufficiently determined jailbreak would bypass it. The programmatic wrapper makes the gate a mechanical contract: the tool literally doesn't execute until the host's UI / Slack / pager confirms.
RFC-001 § 3.2 · /security T2
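The elided body of the gate could look roughly like the sketch below. It assumes tools are plain objects with an async `execute(args)` method; the `GatedTool` type, the `ConfirmFn` shape, and the shortened `GATED` set are hypothetical, chosen only to keep the example self-contained:

```typescript
type ConfirmFn = (toolName: string, args: unknown) => Promise<boolean>;
type GatedTool = { execute: (args: unknown) => Promise<unknown> };

// Shortened, illustrative subset of the gated tool names.
const GATED = new Set(["refund_payment", "cancel_subscription"]);

function applyGateSketch<T extends Record<string, GatedTool>>(tools: T, confirm: ConfirmFn): T {
  const wrapped: Record<string, GatedTool> = {};
  for (const [name, tool] of Object.entries(tools)) {
    if (!GATED.has(name)) {
      wrapped[name] = tool; // ungated tools pass through untouched
      continue;
    }
    wrapped[name] = {
      ...tool,
      execute: async (args: unknown): Promise<unknown> => {
        // The original execute() is unreachable until the host confirms.
        if (!(await confirm(name, args))) {
          throw new Error(`${name} was not confirmed by the host`);
        }
        return tool.execute(args);
      },
    };
  }
  return wrapped as T;
}
```

Because the check lives in the wrapper rather than in the prompt, a model that ignores its instructions still cannot reach the underlying `execute()`.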

T3 · Webhook spoofing forges fake completed payments

Attacker POSTs a hand-crafted MP webhook to your handler, marking a fake payment as 'approved'. Without signature verification, the agent treats it as legitimate and issues the factura.

mitigation · packages/mercadopago/src/webhook.ts
// packages/mercadopago/src/webhook.ts
export async function verifyWebhookSignature(params: {
  requestId: string | null;
  dataId: string;
  signatureHeader: string | null;
  secret: string;
  replayToleranceSeconds?: number;  // default 300
}): Promise<boolean> {
  // Parse "ts=...,v1=..." → HMAC-SHA256(ts.id.dataId, secret) → constant-time compare.
  // 5-min replay window rejects re-played old signed payloads.
  ...
}
what would break · Without HMAC verification, any anonymous internet caller can POST to your webhook endpoint and forge state. With it, an attacker would need the shared secret (which lives in your env vars, never in the agent's context).
Cookbook R03 · /security T3-T4
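A sketch of the parse, HMAC, and compare steps described in the comments above, written synchronously with `node:crypto` for brevity (the original is async). Several details are assumptions: the manifest layout in `buildManifest`, `ts` being Unix seconds, rejecting a missing request id, and the `nowMs` test hook, which is not part of the original signature:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface VerifyParams {
  requestId: string | null;
  dataId: string;
  signatureHeader: string | null;
  secret: string;
  replayToleranceSeconds?: number; // default 300
  nowMs?: number; // hypothetical test hook, not in the original signature
}

// Assumed manifest layout; confirm the exact template against MP's webhook docs.
function buildManifest(dataId: string, requestId: string, ts: string): string {
  return `id:${dataId};request-id:${requestId};ts:${ts};`;
}

function verifyWebhookSketch(p: VerifyParams): boolean {
  if (!p.signatureHeader || !p.requestId) return false;

  // Parse "ts=...,v1=..." into a key/value map.
  const fields = new Map<string, string>();
  for (const piece of p.signatureHeader.split(",")) {
    const eq = piece.indexOf("=");
    if (eq > 0) fields.set(piece.slice(0, eq).trim(), piece.slice(eq + 1).trim());
  }
  const ts = fields.get("ts");
  const v1 = fields.get("v1");
  if (!ts || !v1 || !/^\d+$/.test(ts)) return false;

  // Replay window: reject timestamps too far from "now" (ts assumed to be Unix seconds).
  const ageSeconds = Math.abs((p.nowMs ?? Date.now()) / 1000 - Number(ts));
  if (ageSeconds > (p.replayToleranceSeconds ?? 300)) return false;

  const expected = createHmac("sha256", p.secret)
    .update(buildManifest(p.dataId, p.requestId, ts))
    .digest();
  const received = Buffer.from(v1, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The length check before `timingSafeEqual` matters: a malformed or truncated `v1` value is rejected as a bad signature rather than surfacing as an exception in the handler.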

T5 · Access token leaks into client-side JS bundle

A junior dev imports the MP / AFIP client in a React Server Component that accidentally gets pulled into a Client Component graph. Next.js bundles the secret into the JS shipped to every browser. Now anyone who views the page source can see your prod credentials.

mitigation · packages/mercadopago/src/client.ts
// packages/mercadopago/src/client.ts
export class MercadoPagoClient {
  constructor(options: MercadoPagoClientOptions) {
    if (typeof window !== "undefined") {
      throw new Error(
        "MercadoPagoClient must not be instantiated in a browser context. " +
        "Use it from Server Components, Route Handlers, or Server Actions only.",
      );
    }
    ...
  }
}
what would break · The check runs at construction time, before any request is made. A misconfigured import that pulls the client into a Client Component throws the first time the constructor runs in a browser, instead of silently leaking the token. Same pattern in @ar-agents/facturacion's WsfeClient.
/security T5

T6 · AFIP cert exfiltration via logs / source maps / cold-start traces

Your prod env has AFIP_CERT_PEM + AFIP_KEY_PEM. Some logging library prints process.env on error. The cert + key end up in Datadog / Sentry / cold-start logs. Once visible, anyone can impersonate you to AFIP for 2-3 years (cert lifetime).

mitigation · packages/identity/src/wsaa-wscdc-adapter.ts
// packages/identity/src/wsaa-wscdc-adapter.ts
constructor(options: WsaaWscdcAdapterOptions) {
  const hasPaths = options.certPath && options.keyPath;
  const hasPems = options.certPem && options.keyPem;
  if ((!hasPaths && !hasPems) || !options.cuitRepresentado) {
    throw new AfipNotConfiguredError();
  }
  // Cert + key never round-trip through getters or toJSON().
  // The TokenCache holds them in closures, not as instance fields.
  ...
}
what would break · Closure-private credentials prevent accidental JSON.stringify(client) from including them. Combined with the host-responsibility framework (RFC-001 § 3.2 mandates HSM/KMS for sociedades-IA in prod), the toolkit-side surface is minimal-leakage by design.
RFC-001 § 3.2 · /security T6
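To illustrate why closures help here, the sketch below contrasts a class that stores credentials as instance fields with one that captures them in a closure. Both classes are hypothetical, and the HMAC in `sign` is only a stand-in for real request signing, used to show that the credential remains usable without being serialisable:

```typescript
import { createHmac } from "node:crypto";

interface AfipCredentials {
  certPem: string;
  keyPem: string;
}

// Anti-pattern for contrast: credentials are enumerable instance fields.
class FieldAdapter {
  constructor(readonly certPem: string, readonly keyPem: string) {}
}

// Hypothetical closure-based variant: only non-secret data lives on `this`.
class ClosureAdapter {
  readonly cuitRepresentado: string;
  // Stand-in for real signing; HMAC keeps the sketch short.
  readonly sign: (payload: string) => string;

  constructor(options: AfipCredentials & { cuitRepresentado: string }) {
    // `creds` is reachable only through the arrow function below.
    const creds: AfipCredentials = { certPem: options.certPem, keyPem: options.keyPem };
    this.cuitRepresentado = options.cuitRepresentado;
    this.sign = (payload) =>
      createHmac("sha256", creds.keyPem).update(payload).digest("hex");
  }
}
```

`JSON.stringify` skips function-valued properties, so serialising a `ClosureAdapter` yields only `cuitRepresentado`, whereas serialising a `FieldAdapter` includes both PEM strings.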

T11 · Attacker who breached the host modifies past audit-log records

The attacker is INSIDE: they have shell on your prod box. They want to cover their tracks by editing past tool calls in the audit log. Without HMAC signing, they just open the KV record + change a field. Nobody notices.

mitigation · apps/landing/src/lib/audit.ts
// apps/landing/src/lib/audit.ts
export async function signEntry(entry: Omit<AuditEntry, "hmac">): Promise<string | null> {
  const key = await getHmacKey();  // server-side secret, not in process.env at runtime
  if (!key) return null;
  const { hmac: _ignored, ...payload } = entry as AuditEntry;
  const sig = await crypto.subtle.sign(
    "HMAC",
    key,
    enc.encode(canonical(payload)), // canonical-JSON-stable input space
  );
  return `sha256:${bytesToHex(sig)}`;
}
what would break · Any edit to a signed entry produces a signature mismatch on /api/play/audit/{id}?verify=1. The attacker would need the AUDIT_HMAC_SECRET to forge a new signature; that secret is separately scoped (different from MP/AFIP/WhatsApp secrets) so breaching one doesn't necessarily breach the audit log.
/architecture/audit-log/verify · RFC-001 § 9.2
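A compact sketch of the sign-then-verify round trip, assuming flat audit records and Node's synchronous `createHmac` instead of `crypto.subtle`. `canonicalFlat`, `signAuditSketch`, and `verifyAuditSketch` are hypothetical names, and the key-sorting canonicalisation is a simplification of whatever `canonical()` does in the real module:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type FlatAuditPayload = Record<string, string | number | boolean | null>;

// Simplified canonical form: sorted [key, value] pairs, flat records only.
function canonicalFlat(payload: FlatAuditPayload): string {
  const pairs = Object.entries(payload).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(pairs);
}

function signAuditSketch(payload: FlatAuditPayload, secret: string): string {
  const digest = createHmac("sha256", secret).update(canonicalFlat(payload)).digest("hex");
  return `sha256:${digest}`;
}

// Recompute the signature over the stored fields and compare in constant time.
function verifyAuditSketch(payload: FlatAuditPayload, hmac: string, secret: string): boolean {
  const expected = Buffer.from(signAuditSketch(payload, secret));
  const received = Buffer.from(hmac);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Editing any field after signing changes the canonical string, so verification fails unless the editor can also recompute the HMAC with the secret.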

T12 · Marketplace seller's MP refresh-token leaked

Your marketplace OAuth flow stores per-seller refresh tokens. If your DB is leaked, every seller's MP account is compromised: an attacker can drain them with the stolen token credentials. Refresh tokens are long-lived (180 days+), so even a year-old leak is a usable foothold.

mitigation · packages/mercadopago/src/vercel-kv-oauth-store.ts
// packages/mercadopago/src/oauth-store.ts (subpath: @ar-agents/mercadopago/vercel-kv)
export class VercelKVOAuthTokenStore implements OAuthTokenStore {
  // - Encrypted at rest via Upstash (KV TLS + at-rest encryption).
  // - Scoped to one Vercel project; revoke_marketplace_token tool
  //   gated behind requireConfirmation() per T2.
  // - Per-tenant key namespacing prevents one tenant's compromise
  //   from exposing another.
  ...
}
what would break · Operators who roll their own (e.g., plain Postgres with no at-rest encryption + cleartext tokens) re-introduce the threat. The subpath adapter is the safe default; deviating from it is a host-responsibility decision per RFC-001 § 3.1.
RFC-001 § 3.1 · /security T12
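The per-tenant namespacing mentioned in the comments can be sketched as follows. This is an in-memory illustration only: the `oauth:<tenant>:<seller>` key scheme, `tokenKey`, and `InMemoryTokenStore` are assumptions, and a real KV-backed store would be async and rely on the provider for encryption at rest:

```typescript
interface SellerTokens {
  accessToken: string;
  refreshToken: string;
  expiresAt: number;
}

// Hypothetical key scheme. Segments are URI-encoded so a ":" inside an id
// cannot shift the boundary between tenant and seller.
function tokenKey(tenantId: string, sellerId: string): string {
  return `oauth:${encodeURIComponent(tenantId)}:${encodeURIComponent(sellerId)}`;
}

class InMemoryTokenStore {
  private readonly kv = new Map<string, string>();

  set(tenantId: string, sellerId: string, tokens: SellerTokens): void {
    this.kv.set(tokenKey(tenantId, sellerId), JSON.stringify(tokens));
  }

  get(tenantId: string, sellerId: string): SellerTokens | null {
    const raw = this.kv.get(tokenKey(tenantId, sellerId));
    return raw === undefined ? null : (JSON.parse(raw) as SellerTokens);
  }

  // Drop every token under one tenant's prefix; other tenants are untouched.
  revokeTenant(tenantId: string): number {
    const prefix = `oauth:${encodeURIComponent(tenantId)}:`;
    let removed = 0;
    for (const key of Array.from(this.kv.keys())) {
      if (key.startsWith(prefix)) {
        this.kv.delete(key);
        removed++;
      }
    }
    return removed;
  }
}
```

Encoding each segment is the design choice worth noting: without it, tenant `a:b` with seller `c` and tenant `a` with seller `b:c` would map to the same key.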

T9 · Hung agent loops until quotas exhaust

Agent gets stuck retrying a 500 from MP. Without a step ceiling or circuit breaker, it loops until the LLM provider's monthly cap blows. Cost surprises destroy trust; in agent commerce, they also break the operator's ability to deliver.

mitigation · apps/landing/src/app/api/play/route.ts
// /api/play/route.ts + cookbook patterns
const result = streamText({
  model: "anthropic/claude-sonnet-4-6",
  ...
  stopWhen: ({ steps }) => steps.length >= 12,  // step ceiling
  providerOptions: {
    anthropic: { maxOutputTokens: 1200 },       // token ceiling
  },
});

// Plus per-API client:
const client = new MercadoPagoClient({
  accessToken: token,
  circuitBreaker: {                              // rolling-window
    failureThreshold: 5,                        //   5 failures in
    failureWindowMs: 60_000,                    //   60s opens the
    resetAfterMs: 30_000,                       //   breaker for 30s
  },
  maxRetries: 1,                                // mutations: 1 retry
});
what would break · Without the step ceiling, a stuck loop costs 100× more per session. Without the circuit breaker, transient MP outages cascade through retries until rate limiting kicks in at the LLM gateway (much later, much more expensive). The defense is layered.
Cookbook R08 · /security T9
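One way to read the three `circuitBreaker` options is as a rolling-window breaker like the sketch below. This is an assumption about the semantics, not the client's actual implementation; `RollingWindowBreaker` and its injectable `now` clock are hypothetical:

```typescript
interface BreakerOptions {
  failureThreshold: number; // failures within the window that open the breaker
  failureWindowMs: number;  // rolling window length
  resetAfterMs: number;     // how long the breaker stays open
}

class RollingWindowBreaker {
  private failures: number[] = [];
  private openedAt: number | null = null;

  constructor(
    private readonly opts: BreakerOptions,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // False while the breaker is open; closes again once resetAfterMs has elapsed.
  canRequest(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.opts.resetAfterMs) {
      this.openedAt = null;
      this.failures = [];
      return true;
    }
    return false;
  }

  recordFailure(): void {
    const t = this.now();
    // Keep only failures still inside the rolling window, then add this one.
    this.failures = this.failures.filter((f) => t - f < this.opts.failureWindowMs);
    this.failures.push(t);
    if (this.failures.length >= this.opts.failureThreshold) this.openedAt = t;
  }

  recordSuccess(): void {
    this.failures = [];
  }
}
```

A caller would check `canRequest()` before each MP call and report the outcome afterwards, so a burst of 500s stops generating new requests (and new LLM steps) until the cool-down passes.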

The non-trivial threats this page does NOT cover

Six remaining threats (T4 replay, T7 supply-chain, T8 typo-squat, T10 cross-tenant, T13 PDF injection, T14 MP fingerprint bypass) are documented at the same depth in /security.

  • T4 (replay) is the closest sibling of T3: same HMAC primitive, plus a 5-minute window check.
  • T7 (supply-chain) is covered by SLSA v1 provenance on every npm release.
  • T8 (typo-squat) is covered by owning the entire @ar-agents/* scope.
  • T10 (cross-tenant) is a host-responsibility flag per RFC-001 § 3.1.
  • T13 (PDF injection) is mitigated by static template binding in @ar-agents/facturacion.
  • T14 (MP fingerprint bypass) is explicitly out-of-scope: the toolkit surfaces MP's fraud verdict, it doesn't run the detection.

How to audit this yourself

  1. Clone the repo: git clone https://github.com/ar-agents/ar-agents.
  2. Read the 8 file paths above. Each file is <500 LOC; total ~3000 LOC for the security-critical paths.
  3. Run the test suite: pnpm --filter @ar-agents/identity test (and likewise for each other package). Tests cover the negative cases (malformed input rejected, tampering detected, etc).
  4. Run the audit-log primitives test: pnpm --filter ar-agents-landing test. 85 tests including the tamper-detection cases.
  5. Verify provenance on any published package: npm view @ar-agents/mercadopago dist.attestations returns Sigstore transparency log entries tying tarball ↔ commit ↔ runner.
  6. Run the live tamper-demo: curl -X POST https://ar-agents.ar/api/play/tamper-demo returns the original entry verifying + the mutated entry NOT verifying. Mechanical proof, not opinion.

What's intentionally out-of-scope

  • LLM-side prompt-injection robustness. The toolkit ships system-prompt guardrails (refuse jailbreaks, refuse role-play, refuse out-of-scope topics) but doesn't claim to be jailbreak-proof. The programmatic requireConfirmation() gate is the load-bearing piece (T2); system-prompt rules are belt-and-suspenders.
  • DDoS. Vercel's edge handles connection-level DDoS; per-IP rate limiting in /api/play (30/min) handles application-level abuse. Operator-tier DDoS protection is the platform's responsibility, not the toolkit's.
  • Insider threat at the maintainer level. The SLSA v1 attestations + the audit-log primitives don't protect against the maintainer deliberately publishing a backdoored package. They DO make the backdoor mechanically observable: any change to a published tarball requires a commit on the public main branch, which is itself signed + timestamped.
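For the application-level limit mentioned in the DDoS item (30 requests per minute per IP), a sliding-window limiter could be sketched like this. It is an in-memory illustration under assumed semantics; `PerIpRateLimiter` is a hypothetical name, and a deployed route would more plausibly keep its counters in a shared store:

```typescript
class PerIpRateLimiter {
  private readonly hits = new Map<string, number[]>();

  constructor(
    private readonly limit = 30,
    private readonly windowMs = 60_000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Returns true and records the hit if the IP is under its limit.
  allow(ip: string): boolean {
    const t = this.now();
    // Discard hits that have aged out of the window.
    const recent = (this.hits.get(ip) ?? []).filter((h) => t - h < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(t);
    this.hits.set(ip, recent);
    return allowed;
  }
}
```

Rejected requests are not recorded, so a client that keeps hammering the endpoint regains access as soon as its earlier accepted hits age out of the window.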

For security researchers

Coordinated disclosure via /.well-known/security.txt: 48-hour response window, GitHub Security Advisory flow. PGP key available on request. Acknowledgments in SECURITY.md.