AI features are powerful when applied correctly. They're expensive failures when applied incorrectly. After integrating AI into numerous production systems, here are the patterns that work—and the mistakes to avoid.
Start with the question: should this use AI?
Not every feature needs AI. Before reaching for GPT-4, ask:
AI is a good fit when:
- The task requires understanding natural language
- Rules would be too complex or numerous to encode
- Human-like judgment adds value
- Perfect accuracy isn't required (but good accuracy is achievable)
- You can verify or bound the output
AI is a poor fit when:
- Simple rules solve the problem
- Deterministic, reproducible output is required
- Latency must be under 100ms
- Cost per query matters significantly
- The task has clear right/wrong answers
Example: Email categorization
Naive approach: "Let's use GPT-4 to categorize all incoming emails!"
Better approach: Analyze the categories first.
- If there are 5 categories with clear keywords, use rule-based classification
- If there are 50+ fuzzy categories, use a fine-tuned classifier (cheaper than GPT-4)
- If categories are dynamic or context-dependent, consider LLMs
- If you need to extract specific fields reliably, use structured extraction
Often the right answer is a hybrid: simple rules for common cases, AI for edge cases.
Pattern 1: Treat AI as an unreliable service
LLM APIs will fail. They'll be slow. They'll return unexpected outputs. Design for this.
class AIService {
private readonly timeout = 30000; // 30 seconds
private readonly maxRetries = 3;
async complete(prompt: string, options: CompletionOptions): Promise<string> {
let lastError: Error | null = null;
for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
try {
const response = await this.callWithTimeout(
this.openai.chat.completions.create({
model: options.model || "gpt-4",
messages: [{ role: "user", content: prompt }],
max_tokens: options.maxTokens || 1000,
temperature: options.temperature || 0.3,
}),
this.timeout
);
return response.choices[0]?.message?.content || "";
} catch (error) {
lastError = error as Error;
if (this.isRetryable(error)) {
await this.backoff(attempt);
continue;
}
throw error;
}
}
throw lastError;
}
private isRetryable(error: unknown): boolean {
if (error instanceof Error) {
// Retry on rate limits and server errors
return error.message.includes("429") ||
error.message.includes("500") ||
error.message.includes("timeout");
}
return false;
}
private async backoff(attempt: number) {
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
await new Promise((resolve) => setTimeout(resolve, delay));
}
}
Always have a fallback:
async function categorizeEmail(email: Email): Promise<Category> {
try {
return await aiService.categorize(email);
} catch (error) {
// Fallback to rule-based categorization
logger.warn("AI categorization failed, using fallback", { error });
return ruleBasedCategorizer.categorize(email);
}
}
Pattern 2: Structured output extraction
LLMs are terrible at following formats consistently. Don't parse free-form text—request structured output.
import { z } from "zod";
// Define expected output schema
const ProductExtractionSchema = z.object({
productName: z.string(),
price: z.number().nullable(),
currency: z.enum(["USD", "CAD", "EUR"]).nullable(),
features: z.array(z.string()),
confidence: z.number().min(0).max(1),
});
type ProductExtraction = z.infer<typeof ProductExtractionSchema>;
async function extractProductInfo(description: string): Promise<ProductExtraction> {
const systemPrompt = `
Extract product information from the given text.
Respond ONLY with valid JSON matching this schema:
{
"productName": "string",
"price": number or null,
"currency": "USD" | "CAD" | "EUR" | null,
"features": ["string"],
"confidence": number between 0 and 1
}
`;
const response = await aiService.complete(
`${systemPrompt}\n\nText: ${description}`,
{ model: "gpt-4", temperature: 0.1 }
);
// Parse and validate
try {
const parsed = JSON.parse(response);
return ProductExtractionSchema.parse(parsed);
} catch (error) {
throw new AIOutputError("Failed to parse structured output", {
response,
error,
});
}
}
For critical applications, use OpenAI's function calling or Anthropic's structured output features—they're more reliable than prompting for JSON.
Pattern 3: Prompt versioning and testing
Prompts are code. Version them, test them, review changes.
// prompts/product-extraction.ts
export const productExtractionPrompt = {
version: "1.2.0",
name: "product-extraction",
system: `You are a product information extractor.
Extract structured data from product descriptions.
Be conservative with confidence scores.`,
template: (description: string) => `
Extract product information from:
"""
${description}
"""
Respond with JSON only.`,
// Test cases for validation
testCases: [
{
input: "Nike Air Max 90, $150 USD, breathable mesh, cushioned sole",
expectedOutput: {
productName: "Nike Air Max 90",
price: 150,
currency: "USD",
features: ["breathable mesh", "cushioned sole"],
},
},
],
};
Run prompt tests in CI:
describe("productExtractionPrompt", () => {
it.each(productExtractionPrompt.testCases)(
"extracts correctly: $input",
async ({ input, expectedOutput }) => {
const result = await extractProductInfo(input);
expect(result.productName).toBe(expectedOutput.productName);
expect(result.price).toBe(expectedOutput.price);
// Allow some flexibility in extracted features
expect(result.features).toEqual(
expect.arrayContaining(expectedOutput.features)
);
}
);
});
Pattern 4: Cost control
AI API costs add up fast. Implement guardrails.
class CostController {
private dailySpend = 0;
private readonly dailyLimit = 100; // $100/day
async trackAndExecute<T>(
operation: () => Promise<T>,
estimatedTokens: number,
model: string
): Promise<T> {
const estimatedCost = this.estimateCost(estimatedTokens, model);
if (this.dailySpend + estimatedCost > this.dailyLimit) {
throw new CostLimitExceeded({
dailySpend: this.dailySpend,
dailyLimit: this.dailyLimit,
estimatedCost,
});
}
const result = await operation();
this.dailySpend += estimatedCost;
// Alert when approaching limit
if (this.dailySpend > this.dailyLimit * 0.8) {
await this.alertOps("AI spend at 80% of daily limit");
}
return result;
}
private estimateCost(tokens: number, model: string): number {
const rates: Record<string, number> = {
"gpt-4": 0.00003, // $0.03 per 1K tokens
"gpt-4-turbo": 0.00001, // $0.01 per 1K tokens
"gpt-3.5-turbo": 0.0000005, // $0.0005 per 1K tokens
};
return tokens * (rates[model] || rates["gpt-4"]);
}
}
Cost optimization strategies:
- Use cheaper models for simple tasks (GPT-3.5 for classification, GPT-4 for reasoning)
- Cache identical requests
- Batch similar requests when possible
- Set token limits appropriate to the task
- Monitor and alert on unusual patterns
Pattern 5: Human-in-the-loop for high-stakes decisions
For decisions that matter, AI should assist humans, not replace them.
interface AIDecision {
decision: string;
confidence: number;
reasoning: string;
requiresReview: boolean;
}
async function processLoanApplication(
application: LoanApplication
): Promise<ProcessingResult> {
const aiAssessment = await aiService.assessLoan(application);
// Auto-approve only high-confidence positive decisions
if (
aiAssessment.decision === "approve" &&
aiAssessment.confidence > 0.95 &&
application.amount < 10000
) {
return {
status: "approved",
method: "automatic",
aiAssessment,
};
}
// Auto-reject obvious cases
if (
aiAssessment.decision === "reject" &&
aiAssessment.confidence > 0.98
) {
return {
status: "rejected",
method: "automatic",
aiAssessment,
};
}
// Everything else goes to human review
await reviewQueue.add({
application,
aiAssessment,
priority: calculatePriority(aiAssessment),
});
return {
status: "pending_review",
method: "human",
aiAssessment,
};
}
The AI handles the easy cases, humans focus on the hard ones.
Pattern 6: RAG for grounded responses
Retrieval-Augmented Generation (RAG) reduces hallucination by grounding responses in your data.
class RAGService {
constructor(
private vectorStore: VectorStore,
private aiService: AIService
) {}
async answer(question: string): Promise<RAGResponse> {
// 1. Embed the question
const questionEmbedding = await this.embed(question);
// 2. Find relevant documents
const relevantDocs = await this.vectorStore.search({
vector: questionEmbedding,
limit: 5,
minScore: 0.7,
});
if (relevantDocs.length === 0) {
return {
answer: "I don't have information about that in my knowledge base.",
sources: [],
confidence: 0,
};
}
// 3. Generate answer with context
const context = relevantDocs
.map((d) => `[Source: ${d.title}]\n${d.content}`)
.join("\n\n");
const answer = await this.aiService.complete(`
Answer the question based ONLY on the provided context.
If the context doesn't contain the answer, say so.
Context:
${context}
Question: ${question}
Answer:
`);
return {
answer,
sources: relevantDocs.map((d) => ({
title: d.title,
url: d.url,
})),
confidence: Math.max(...relevantDocs.map((d) => d.score)),
};
}
}
Red flags in AI feature design
Watch out for these patterns:
1. "Let AI figure it out" If you can't explain what the AI should do, you can't verify if it's working.
2. No fallback plan What happens when the AI service is down? You need an answer.
3. No output validation AI output must be validated before use. Every time.
4. Ignoring edge cases Test with adversarial inputs, not just happy paths.
5. Cost blindness Monitor spend from day one. It's easy to 10x costs accidentally.
When to invest in fine-tuning
Fine-tuning makes sense when:
- You're making thousands of similar requests per day
- Off-the-shelf models need too much prompting for your use case
- You have high-quality training data
- Cost reduction is important (fine-tuned smaller models can replace larger ones)
Fine-tuning doesn't make sense when:
- Your use case is well-handled by prompting
- You don't have quality training data
- Requirements change frequently
- Volume is low
Start with prompting. Move to fine-tuning when you have data and proven use cases.
Planning to add AI features to your product? Let's discuss the right architecture for your use case.