AI features are powerful when applied correctly. They're expensive failures when applied incorrectly. After integrating AI into numerous production systems, here are the patterns that work—and the mistakes to avoid.

Start with the question: should this use AI?

Not every feature needs AI. Before reaching for GPT-4, ask:

AI is a good fit when:

The task requires understanding natural language
Rules would be too complex or numerous to encode
Human-like judgment adds value
Perfect accuracy isn't required (but good accuracy is achievable)
You can verify or bound the output

AI is a poor fit when:

Simple rules solve the problem
Deterministic, reproducible output is required
Latency must be under 100ms
Cost per query matters significantly
The task has clear right/wrong answers

Example: Email categorization

Naive approach: "Let's use GPT-4 to categorize all incoming emails!"

Better approach: Analyze the categories first.

If there are 5 categories with clear keywords, use rule-based classification
If there are 50+ fuzzy categories, use a fine-tuned classifier (cheaper than GPT-4)
If categories are dynamic or context-dependent, consider LLMs
If you need to extract specific fields reliably, use structured extraction

Often the right answer is a hybrid: simple rules for common cases, AI for edge cases.

Pattern 1: Treat AI as an unreliable service

LLM APIs will fail. They'll be slow. They'll return unexpected outputs. Design for this.

class AIService {
  private readonly timeout = 30000; // 30 seconds
  private readonly maxRetries = 3;
  
  async complete(prompt: string, options: CompletionOptions): Promise<string> {
    let lastError: Error | null = null;
    
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        const response = await this.callWithTimeout(
          this.openai.chat.completions.create({
            model: options.model || "gpt-4",
            messages: [{ role: "user", content: prompt }],
            max_tokens: options.maxTokens || 1000,
            temperature: options.temperature || 0.3,
          }),
          this.timeout
        );
        
        return response.choices[0]?.message?.content || "";
      } catch (error) {
        lastError = error as Error;
        
        if (this.isRetryable(error)) {
          await this.backoff(attempt);
          continue;
        }
        
        throw error;
      }
    }
    
    throw lastError;
  }
  
  private isRetryable(error: unknown): boolean {
    if (error instanceof Error) {
      // Retry on rate limits and server errors
      return error.message.includes("429") || 
             error.message.includes("500") ||
             error.message.includes("timeout");
    }
    return false;
  }
  
  private async backoff(attempt: number) {
    const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}

Always have a fallback:

async function categorizeEmail(email: Email): Promise<Category> {
  try {
    return await aiService.categorize(email);
  } catch (error) {
    // Fallback to rule-based categorization
    logger.warn("AI categorization failed, using fallback", { error });
    return ruleBasedCategorizer.categorize(email);
  }
}

Pattern 2: Structured output extraction

LLMs are terrible at following formats consistently. Don't parse free-form text—request structured output.

import { z } from "zod";

// Define expected output schema
const ProductExtractionSchema = z.object({
  productName: z.string(),
  price: z.number().nullable(),
  currency: z.enum(["USD", "CAD", "EUR"]).nullable(),
  features: z.array(z.string()),
  confidence: z.number().min(0).max(1),
});

type ProductExtraction = z.infer<typeof ProductExtractionSchema>;

async function extractProductInfo(description: string): Promise<ProductExtraction> {
  const systemPrompt = `
    Extract product information from the given text.
    Respond ONLY with valid JSON matching this schema:
    {
      "productName": "string",
      "price": number or null,
      "currency": "USD" | "CAD" | "EUR" | null,
      "features": ["string"],
      "confidence": number between 0 and 1
    }
  `;
  
  const response = await aiService.complete(
    `${systemPrompt}\n\nText: ${description}`,
    { model: "gpt-4", temperature: 0.1 }
  );
  
  // Parse and validate
  try {
    const parsed = JSON.parse(response);
    return ProductExtractionSchema.parse(parsed);
  } catch (error) {
    throw new AIOutputError("Failed to parse structured output", {
      response,
      error,
    });
  }
}

For critical applications, use OpenAI's function calling or Anthropic's structured output features—they're more reliable than prompting for JSON.

Pattern 3: Prompt versioning and testing

Prompts are code. Version them, test them, review changes.

// prompts/product-extraction.ts
export const productExtractionPrompt = {
  version: "1.2.0",
  name: "product-extraction",
  
  system: `You are a product information extractor. 
Extract structured data from product descriptions.
Be conservative with confidence scores.`,
  
  template: (description: string) => `
Extract product information from:

"""
${description}
"""

Respond with JSON only.`,
  
  // Test cases for validation
  testCases: [
    {
      input: "Nike Air Max 90, $150 USD, breathable mesh, cushioned sole",
      expectedOutput: {
        productName: "Nike Air Max 90",
        price: 150,
        currency: "USD",
        features: ["breathable mesh", "cushioned sole"],
      },
    },
  ],
};

Run prompt tests in CI:

describe("productExtractionPrompt", () => {
  it.each(productExtractionPrompt.testCases)(
    "extracts correctly: $input",
    async ({ input, expectedOutput }) => {
      const result = await extractProductInfo(input);
      
      expect(result.productName).toBe(expectedOutput.productName);
      expect(result.price).toBe(expectedOutput.price);
      // Allow some flexibility in extracted features
      expect(result.features).toEqual(
        expect.arrayContaining(expectedOutput.features)
      );
    }
  );
});

Pattern 4: Cost control

AI API costs add up fast. Implement guardrails.

class CostController {
  private dailySpend = 0;
  private readonly dailyLimit = 100; // $100/day
  
  async trackAndExecute<T>(
    operation: () => Promise<T>,
    estimatedTokens: number,
    model: string
  ): Promise<T> {
    const estimatedCost = this.estimateCost(estimatedTokens, model);
    
    if (this.dailySpend + estimatedCost > this.dailyLimit) {
      throw new CostLimitExceeded({
        dailySpend: this.dailySpend,
        dailyLimit: this.dailyLimit,
        estimatedCost,
      });
    }
    
    const result = await operation();
    this.dailySpend += estimatedCost;
    
    // Alert when approaching limit
    if (this.dailySpend > this.dailyLimit * 0.8) {
      await this.alertOps("AI spend at 80% of daily limit");
    }
    
    return result;
  }
  
  private estimateCost(tokens: number, model: string): number {
    const rates: Record<string, number> = {
      "gpt-4": 0.00003,     // $0.03 per 1K tokens
      "gpt-4-turbo": 0.00001, // $0.01 per 1K tokens
      "gpt-3.5-turbo": 0.0000005, // $0.0005 per 1K tokens
    };
    return tokens * (rates[model] || rates["gpt-4"]);
  }
}

Cost optimization strategies:

Use cheaper models for simple tasks (GPT-3.5 for classification, GPT-4 for reasoning)
Cache identical requests
Batch similar requests when possible
Set token limits appropriate to the task
Monitor and alert on unusual patterns

Pattern 5: Human-in-the-loop for high-stakes decisions

For decisions that matter, AI should assist humans, not replace them.

interface AIDecision {
  decision: string;
  confidence: number;
  reasoning: string;
  requiresReview: boolean;
}

async function processLoanApplication(
  application: LoanApplication
): Promise<ProcessingResult> {
  const aiAssessment = await aiService.assessLoan(application);
  
  // Auto-approve only high-confidence positive decisions
  if (
    aiAssessment.decision === "approve" &&
    aiAssessment.confidence > 0.95 &&
    application.amount < 10000
  ) {
    return {
      status: "approved",
      method: "automatic",
      aiAssessment,
    };
  }
  
  // Auto-reject obvious cases
  if (
    aiAssessment.decision === "reject" &&
    aiAssessment.confidence > 0.98
  ) {
    return {
      status: "rejected",
      method: "automatic",
      aiAssessment,
    };
  }
  
  // Everything else goes to human review
  await reviewQueue.add({
    application,
    aiAssessment,
    priority: calculatePriority(aiAssessment),
  });
  
  return {
    status: "pending_review",
    method: "human",
    aiAssessment,
  };
}

The AI handles the easy cases, humans focus on the hard ones.

Pattern 6: RAG for grounded responses

Retrieval-Augmented Generation (RAG) reduces hallucination by grounding responses in your data.

class RAGService {
  constructor(
    private vectorStore: VectorStore,
    private aiService: AIService
  ) {}
  
  async answer(question: string): Promise<RAGResponse> {
    // 1. Embed the question
    const questionEmbedding = await this.embed(question);
    
    // 2. Find relevant documents
    const relevantDocs = await this.vectorStore.search({
      vector: questionEmbedding,
      limit: 5,
      minScore: 0.7,
    });
    
    if (relevantDocs.length === 0) {
      return {
        answer: "I don't have information about that in my knowledge base.",
        sources: [],
        confidence: 0,
      };
    }
    
    // 3. Generate answer with context
    const context = relevantDocs
      .map((d) => `[Source: ${d.title}]\n${d.content}`)
      .join("\n\n");
    
    const answer = await this.aiService.complete(`
      Answer the question based ONLY on the provided context.
      If the context doesn't contain the answer, say so.
      
      Context:
      ${context}
      
      Question: ${question}
      
      Answer:
    `);
    
    return {
      answer,
      sources: relevantDocs.map((d) => ({
        title: d.title,
        url: d.url,
      })),
      confidence: Math.max(...relevantDocs.map((d) => d.score)),
    };
  }
}

Red flags in AI feature design

Watch out for these patterns:

1. "Let AI figure it out" If you can't explain what the AI should do, you can't verify if it's working.

2. No fallback plan What happens when the AI service is down? You need an answer.

3. No output validation AI output must be validated before use. Every time.

4. Ignoring edge cases Test with adversarial inputs, not just happy paths.

5. Cost blindness Monitor spend from day one. It's easy to 10x costs accidentally.

When to invest in fine-tuning

Fine-tuning makes sense when:

You're making thousands of similar requests per day
Off-the-shelf models need too much prompting for your use case
You have high-quality training data
Cost reduction is important (fine-tuned smaller models can replace larger ones)

Fine-tuning doesn't make sense when:

Your use case is well-handled by prompting
You don't have quality training data
Requirements change frequently
Volume is low

Start with prompting. Move to fine-tuning when you have data and proven use cases.

Planning to add AI features to your product? Let's discuss the right architecture for your use case.

Adding AI features safely: architecture patterns