Real AI problems, scoped for serious builders.

A public library of sanitized AI workflow problems from startups and operators, turned into buildable briefs with a stated reliability bar.

Submit a Problem · Explore Problems

Submitting or building a problem does not guarantee mentorship, review, publication, referrals, funding, or hiring outcomes.

Featured brief

Growth Runtime: An AI Growth Agent for Apps with No Users

Build an AI growth agent that helps an early-stage founder discover, reach, convert, and activate the right first users with evidence, not guessing.

Growth · Advanced · Vertical Agent · Reliability R3

evidence-based targeting
human approval before outreach
activation over acquisition

Read the full brief → Build this brief

What this is

A library of real AI workflow problems scoped to be buildable, testable, and useful as portfolio-grade artifacts. Each brief includes the pain, the workflow, the inputs and outputs, evaluation ideas, the boundaries, and a reliability target. The goal is to help builders work on production-shaped problems instead of random demos.

What this is not

Not a bootcamp
Not a job board
Not a recruiting agency
Not an accelerator
Not a free consulting marketplace
Not a promise that companies review submissions

This month's failure brief

The failure behind this brief

What happened

Starbucks terminated its AI inventory counting program across all North American stores nine months after a chain-wide launch. The image-recognition system, built by NomadGo, frequently miscounted and mislabeled stock, confusing milk types or failing to detect items entirely, and stores reverted to manual counting. An internal memo reviewed by Reuters read: "Effective immediately, Automated Counting will be discontinued."

Root cause read

A vision model that performed well under controlled conditions was deployed at scale into thousands of variable retail environments, where lighting, placement, lookalike packaging, and human workflow did not match the training distribution. Instead of catching shortages, the system added a new layer of inaccuracy to the supply chain it was meant to fix.

Engineering lesson

Eval-production skew at chain scale: the gap between accuracy in a demo environment and consistency under real-world variance was never measured before full deployment. A staged rollout with a shadow-mode comparison against manual ground truth would have surfaced the gap at ten stores instead of all of them.
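A minimal sketch of the shadow-mode comparison described above, assuming each pilot store logs the model's count alongside the manual count for the same item. The record layout, the helper names, and the 0.95 agreement bar are illustrative assumptions, not details from the reporting.

```python
from collections import defaultdict

def store_agreement(records):
    """Return, per store, the fraction of items where model and manual counts match.

    records: iterable of (store_id, item, model_count, manual_count) tuples.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for store_id, _item, model_count, manual_count in records:
        totals[store_id] += 1
        if model_count == manual_count:
            matches[store_id] += 1
    return {store: matches[store] / totals[store] for store in totals}

def stores_below_bar(records, min_agreement=0.95):
    """List stores whose shadow-mode agreement falls below the assumed bar."""
    rates = store_agreement(records)
    return sorted(store for store, rate in rates.items() if rate < min_agreement)

# Hypothetical shadow-mode log from a two-store pilot.
pilot = [
    ("store_a", "oat milk", 12, 12),
    ("store_a", "whole milk", 8, 8),
    ("store_b", "oat milk", 5, 9),
    ("store_b", "whole milk", 7, 7),
]
```

Running `stores_below_bar(pilot)` on a small pilot like this is the kind of check that could flag a problem store before the system replaces manual counting anywhere.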

Sources: Source 1

Editorial note: Reuters reporting and the company statement are the primary sources; attach the Reuters link during the editorial verification pass before publishing.

This month's build: close the gap that caused it.

Retail · Featured

Count the Milk: A Reality-Gap Eval Harness for Vision Inventory Systems

Build an evaluation harness that measures how a vision-based inventory counter degrades from controlled to messy real-world conditions, and gates deployment on consistency under variance, not demo accuracy.

Intermediate · Eval · R3 · Draft

Reliability focus: eval-production skew, variance testing, deployment gating

View Brief
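One possible shape for the gate this brief describes, sketched under assumptions: accuracy is computed per capture condition, and deployment is blocked on the worst condition and on the drop from the controlled baseline rather than on the headline number. The condition names, function names, and thresholds are hypothetical.

```python
def accuracy(pairs):
    """Fraction of (predicted, actual) count pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no samples for this condition")
    return sum(1 for predicted, actual in pairs if predicted == actual) / len(pairs)

def reality_gap_report(results_by_condition, baseline="controlled"):
    """Summarize how accuracy degrades from the baseline to the worst condition."""
    per_condition = {name: accuracy(pairs) for name, pairs in results_by_condition.items()}
    worst = min(per_condition, key=per_condition.get)
    return {
        "per_condition": per_condition,
        "worst_condition": worst,
        "worst_accuracy": per_condition[worst],
        "max_drop": per_condition[baseline] - per_condition[worst],
    }

def deployment_gate(report, min_worst_accuracy=0.9, max_allowed_drop=0.05):
    """Pass only if the worst condition is good enough and close to the baseline."""
    return (report["worst_accuracy"] >= min_worst_accuracy
            and report["max_drop"] <= max_allowed_drop)

# Hypothetical eval results: (predicted_count, actual_count) per sample.
results = {
    "controlled": [(10, 10), (4, 4), (7, 7), (3, 3)],
    "dim_lighting": [(10, 10), (4, 5), (7, 7), (3, 3)],
    "lookalike_packaging": [(10, 8), (4, 5), (7, 7), (3, 3)],
}
report = reality_gap_report(results)
```

Here the controlled accuracy is perfect, yet `deployment_gate(report)` fails because the lookalike-packaging condition drops to half, which is the gap a demo-only evaluation would hide.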

Open briefs in this failure family

Legal · Open

Cite or Strike: A Citation Verifier for Legal Drafting

Build a verifier that checks every citation in an AI-drafted legal document against a real source and flags or strikes anything it cannot ground.

Intermediate · Verifier · R3 · Draft

Reliability focus: citation grounding, fabrication detection, export gating

View Brief
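A rough sketch of the flag-or-strike behavior in the Cite or Strike brief, assuming citations have already been extracted from the draft and that a trusted index of known sources exists. The case names below are made up for illustration, and the exact-match lookup stands in for whatever real source check a builder chooses.

```python
def verify_citations(citations, known_sources):
    """Split citations into those found in the trusted index and those that are not."""
    grounded = [c for c in citations if c in known_sources]
    ungrounded = [c for c in citations if c not in known_sources]
    return grounded, ungrounded

def strike_ungrounded(draft, ungrounded, marker="[CITATION NOT VERIFIED]"):
    """Replace every ungrounded citation in the draft with a visible marker."""
    for citation in ungrounded:
        draft = draft.replace(citation, marker)
    return draft

def export_allowed(ungrounded):
    """Export gate: block export while any citation remains ungrounded."""
    return not ungrounded

# Fictional citations and index, for illustration only.
known = {"Alpha v. Beta, 1 Ex. 1 (2000)"}
cited = ["Alpha v. Beta, 1 Ex. 1 (2000)", "Gamma v. Delta, 2 Ex. 2 (2001)"]
draft = "See Alpha v. Beta, 1 Ex. 1 (2000) and Gamma v. Delta, 2 Ex. 2 (2001)."

grounded, ungrounded = verify_citations(cited, known)
struck = strike_ungrounded(draft, ungrounded)
```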

B2B SaaS · Open

Triage First: A Support Ticket Routing Agent

Build an agent that classifies inbound support tickets, drafts a suggested reply, escalates uncertain cases, and explains why it routed each one.

Starter · Agentic Workflow · R2 · Draft

Reliability focus: classification, calibrated escalation, explainability

View Brief
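The escalation step in the Triage First brief could look roughly like this, assuming some classifier already produces a probability per queue. The thresholds, queue names, and `route_ticket` helper are hypothetical, and the thresholds only mean something if the classifier's scores are reasonably calibrated.

```python
from dataclasses import dataclass

@dataclass
class RoutingDecision:
    queue: str
    escalated: bool
    reason: str

def route_ticket(scores, min_confidence=0.7, min_margin=0.15):
    """Route to the top-scoring queue, or escalate when the classifier is unsure.

    scores: dict mapping queue label -> probability from any classifier.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score < min_confidence:
        return RoutingDecision(
            "human_review", True,
            f"top score {top_score:.2f} is below {min_confidence:.2f}")
    if top_score - runner_up < min_margin:
        return RoutingDecision(
            "human_review", True,
            f"margin over runner-up is below {min_margin:.2f}")
    return RoutingDecision(
        top_label, False, f"routed to {top_label} with score {top_score:.2f}")

confident = route_ticket({"billing": 0.9, "bug": 0.1})
uncertain = route_ticket({"billing": 0.5, "bug": 0.45})
```

Every decision carries a `reason` string, which gives the agent something concrete to show when it explains a routing choice.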

Seen this failure in your own stack? Submit your version.

Submit your version

The problem library

Enterprise SaaS · Open

Answer the Buyer: A Sales Engineer Copilot

Build a retrieval copilot that answers technical buyer questions from product, security, and integration documents, with citations and honest gaps.

Intermediate · RAG Copilot · R2 · Draft

Reliability focus: retrieval, citation, calibrated refusal

View Brief

Legal · Open

Cite or Strike: A Citation Verifier for Legal Drafting

Build a verifier that checks every citation in an AI-drafted legal document against a real source and flags or strikes anything it cannot ground.

Intermediate · Verifier · R3 · Draft

Reliability focus: citation grounding, fabrication detection, export gating

View Brief

Productivity · Open

From Notes to Owners: A Meeting-to-Action Operator

Build an agent that turns raw meeting notes into owners, action items, risks, and follow-ups, grounded in what was actually said.

Starter · Agentic Workflow · R2 · Draft

Reliability focus: extraction, attribution, grounded refusal

View Brief

Internal Tools · Open

One Source of Truth: A Shared Context Layer Across Tools

Build a context layer that reconciles facts scattered across code, chat, docs, CRM, and product tools into one owned, cited, conflict-aware source of truth.

Advanced · Knowledge Agent · R3 · Draft

Reliability focus: source reconciliation, conflict detection, ownership and freshness

View Brief

AI Infrastructure · Open

Right-Size Every Call: An AI Cost and Latency Router

Build middleware that routes each request between small and large models by predicted complexity, measuring the tradeoff across cost, latency, and quality.

Advanced · Optimization Middleware · R3 · Draft

Reliability focus: routing policy, quality floor, measurement

View Brief
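A simplified sketch of the routing policy and quality floor named in the router brief, assuming an offline eval set where each request has a predicted complexity and a measured quality score for both models. The relative costs, field names, and candidate thresholds are assumptions, and latency is left out to keep the example short.

```python
# Assumed relative cost per call; real numbers depend on the models used.
COST = {"small": 1.0, "large": 10.0}

def route(complexity, threshold):
    """Send a request to the large model once its predicted complexity reaches the threshold."""
    return "large" if complexity >= threshold else "small"

def evaluate(samples, threshold):
    """Return (total_cost, mean_quality) for one threshold over an offline eval set."""
    total_cost = 0.0
    total_quality = 0.0
    for sample in samples:
        model = route(sample["complexity"], threshold)
        total_cost += COST[model]
        total_quality += sample[f"{model}_quality"]
    return total_cost, total_quality / len(samples)

def cheapest_threshold(samples, candidates, quality_floor):
    """Pick the lowest-cost threshold whose mean quality stays at or above the floor."""
    passing = []
    for threshold in candidates:
        cost, quality = evaluate(samples, threshold)
        if quality >= quality_floor:
            passing.append((cost, threshold))
    return min(passing)[1] if passing else None

# Hypothetical eval set with measured quality for each model.
samples = [
    {"complexity": 0.1, "small_quality": 0.9, "large_quality": 0.95},
    {"complexity": 0.4, "small_quality": 0.8, "large_quality": 0.95},
    {"complexity": 0.8, "small_quality": 0.4, "large_quality": 0.9},
]
best = cheapest_threshold(samples, candidates=[0.0, 0.5, 1.0], quality_floor=0.85)
```

With these numbers, sending everything to the small model is cheapest but falls below the floor, so the middle threshold wins: it keeps quality above 0.85 at well under half the cost of always using the large model.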

E-commerce · Open

Routing the Mess: A Merchant Operations Copilot

Build a merchant copilot that routes messy requests across catalog, sales, operations, and support while preventing unsafe mutations.

Intermediate · Vertical Agent · R3 · Draft

Reliability focus: routing, safe actions, layered evals

View Brief

Operations · Open

Stop the Cascade: A Circuit Breaker for Chained AI Actions

Build a guardrail layer that detects when chained AI actions start compounding errors and halts the chain before a cascade causes real damage.

Advanced · Guardrail · R3 · Draft

Reliability focus: circuit breaking, invariant checks, safe halt

View Brief
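To make the circuit-breaker idea concrete, here is a minimal sketch that assumes each chained action is a function from state to new state and that invariants are simple predicates on that state. The `run_chain` helper, the inventory example, and the halt-on-first-violation policy are illustrative choices, not the brief's specification.

```python
def run_chain(steps, state, invariants):
    """Run steps in order, checking invariants after each one.

    On the first violated invariant, halt and return the last state that
    passed every check, so a bad step never feeds the steps after it.
    """
    last_good = dict(state)
    for index, step in enumerate(steps):
        candidate = step(dict(last_good))  # each step works on a copy
        failed = [name for name, check in invariants.items() if not check(candidate)]
        if failed:
            return {"state": last_good, "halted_at": index, "failed": failed}
        last_good = candidate
    return {"state": last_good, "halted_at": None, "failed": []}

# Hypothetical chain: the second action would drive stock negative.
steps = [
    lambda s: {**s, "stock": s["stock"] - 3},
    lambda s: {**s, "stock": s["stock"] - 10},
    lambda s: {**s, "shipped": True},
]
invariants = {"stock_non_negative": lambda s: s["stock"] >= 0}

result = run_chain(steps, {"stock": 5}, invariants)
```

The chain stops at the second action, the third never runs, and the returned state is the one from before the violation.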

B2B SaaS · Open

Triage First: A Support Ticket Routing Agent

Build an agent that classifies inbound support tickets, drafts a suggested reply, escalates uncertain cases, and explains why it routed each one.

Starter · Agentic Workflow · R2 · Draft

Reliability focus: classification, calibrated escalation, explainability

View Brief

How problems are selected

A problem is accepted when the answer to all seven questions is yes.

Can one solo builder ship it in 20 to 30 focused hours?
Does it have a clear input and a clear output?
Is there a measurable evaluation target?
Can it run on public, synthetic, or sanitized data?
Does it map to a real buyer or operator pain?
Can its failure be meaningfully analyzed?
Can the teardown teach the community?

Problems requiring confidential data, production credentials, regulated workflows, vague business goals, or heavy integration work are not accepted.

Have a real AI workflow problem? Submit it.

A good problem is specific, painful, buildable with sample or synthetic data, and evaluable without production access. Selected problems may be rewritten into public briefs, and submitting creates no contractor, delivery, or hiring relationship.

Submit a Problem

Want to build one of these?

Builders use Field Briefs to create portfolio-grade artifacts. Some problems may later become cohort capstones or teardown candidates.

Join the builder interest list