Chapter 5: Specialist Agents
In Chapter 4: Gemini LLM Interaction (with Throttling & Retries), we explored how our system safely and reliably communicates with the powerful Gemini AI. Now, it's time to meet the actual "workers" who use this communication channel to perform their tasks: the Specialist Agents.
Remember from Chapter 1: Agent Orchestrator that our AgentOrchestrator acts like a project manager. But a project manager needs a team of experts to delegate tasks to!
What Problem Do Specialist Agents Solve?
Imagine you have a huge, complicated insurance policy document. It contains all sorts of information: who can get the policy (eligibility), what benefits it offers, how much it costs, what happens if payments are missed, and so on. If you had to extract all of this information manually, it would be a very long and error-prone job for one person.
Now, imagine giving all these complex tasks to one single AI model. It would be like asking one person to be an expert in law, medicine, finance, and customer service all at once! This would likely lead to:
- Overwhelm: Too much information for one AI to process accurately.
- Lower Accuracy: The AI might make mistakes trying to cover too many topics.
- Difficulty in Management: If something goes wrong, it's hard to pinpoint why.
The problem is: How do we efficiently and accurately extract many different types of structured information from a complex document using AI?
The Solution: A Team of Experts
Our primepolicy-ai-main project solves this by creating a team of Specialist Agents. Think of these agents as a group of highly trained experts, where each one has a specific area of focus within an insurance policy.
For example:
- One agent is an expert on "Eligibility Rules."
- Another is an expert on "Benefits."
- A third specializes in "Pricing Structure."
Each agent focuses only on its assigned domain. This approach brings several benefits:
- Specialization: Each agent becomes very good at extracting specific details, leading to higher accuracy.
- Efficiency: Agents only look for what's relevant to their domain, making the process faster.
- Modularity: If a new type of information needs to be extracted, we just add a new specialist agent without affecting others.
- Clarity: It's easier to understand and debug if an agent focuses on a clear, defined task.
Key Concepts of Specialist Agents
All specialist agents in our project follow a common blueprint to ensure they work together smoothly.
1. BaseAgent: The Common Framework
Every specialist agent in our system inherits from a BaseAgent class. This BaseAgent provides the fundamental tools and methods that all agents need to do their job. It's like giving all your team members a standard laptop, internet access, and a handbook on how to find information.
The BaseAgent provides two critical helper methods:
getContext: This method is how an agent queries the Vector Store (Supabase + pgvector) for relevant document sections. This process is known as Retrieval-Augmented Generation (RAG). Instead of trying to "guess" the answer, the agent first "retrieves" the most relevant parts of the policy document.extract: Once the agent has the relevant document sections, this method interacts with the Gemini LLM Interaction (with Throttling & Retries) to actually extract the specific information. It ensures the extracted data follows a predefined structure, often based on our Definitive PAS Schema.
2. Independent Work, Orchestrated Combination
Each specialist agent works independently on its specific task. They don't directly talk to each other. Instead, their findings are later collected and combined by the Agent Orchestrator into one complete and structured JSON report.
How to Use Specialist Agents
As a user or developer interacting with primepolicy-ai-main, you primarily interact with the Agent Orchestrator. The Orchestrator is responsible for managing and running these specialist agents. You don't directly call an individual EligibilityAgent, for example.
Let's revisit how the AgentOrchestrator uses these agents (from Chapter 1):
// lib/agents/orchestrator.ts (Simplified)
import { IdentityAgent } from "./identity-agent";
import { EligibilityAgent } from "./eligibility-agent";
// ... other agent imports ...
export class AgentOrchestrator {
private agents: BaseAgent[] = [];
constructor() {
// The orchestrator knows which expert agents it needs
this.agents = [
new IdentityAgent(), // Finds policy identification details
new EligibilityAgent(), // Checks who can be covered by the policy
// ... many other specialist agents are initialized here ...
];
}
public async executeExtraction(documentId: string): Promise<any> {
// ...
for (let i = 0; i < this.agents.length; i += clusterSize) {
const cluster = this.agents.slice(i, i + clusterSize);
// Each agent in the cluster works on its task in parallel
const clusterResults = await Promise.all(cluster.map(agent => agent.run(documentId)));
// ... collects results ...
}
// ... consolidates data ...
}
}
What's Happening Above?
- Team Assembly: In the
AgentOrchestrator'sconstructor, it creates instances of all the specialist agents (e.g.,new IdentityAgent(),new EligibilityAgent()). This is like gathering the team before a project starts. - Task Assignment: When
orchestrator.executeExtraction()is called, it iterates through these agents, calling each agent'srun()method. It even groups them into "clusters" to manage the workload efficiently.
Example Input (Conceptual):
The Orchestrator tells an agent, "Here's the document ID (my_policy.pdf), go extract your specific information."
Example Output (Conceptual):
Each specialist agent returns its findings in a structured way. For example, an EligibilityAgent might return:
{
"agentName": "Eligibility Agent",
"data": {
"eligibility": {
"entry_age_min": 18,
"entry_age_max": 65,
"maturity_age_max": 80,
"minimum_sum_assured": 100000,
"maximum_sum_assured": 5000000,
"policy_term_min_years": 5,
"policy_term_max_years": 40
}
},
"status": "success"
}
Notice how this agent only focused on "eligibility" details, not benefits or pricing. The Orchestrator will then merge this with findings from other agents.
Behind the Scenes: How Specialist Agents Work Internally
Let's look at the common structure and workflow of a specialist agent.
Step-by-Step Flow for a Single Specialist Agent
Loading diagram...
Code Walkthrough: lib/agents/base.ts (The Blueprint)
This file defines the BaseAgent class, which all our specialist agents extend.
// lib/agents/base.ts (Simplified)
import { generateStructuredOutput } from "../gemini";
import { searchDocumentSections } from "../vector-store";
import { logger } from "../utils";
export interface AgentResponse {
agentName: string;
data: any;
status: "success" | "error";
message?: string;
}
export abstract class BaseAgent {
public abstract name: string;
public abstract description: string;
public abstract run(documentId: string): Promise<AgentResponse>;
protected async getContext(documentId: string, query: string, limit = 5) {
logger.debug(`[${this.name}] Getting context for: ${query}...`);
const results = await searchDocumentSections(query, limit, 0.5, documentId);
return results.map(r => r.content).join("\n\n---\n\n");
}
protected async extract<T>(context: string, prompt: string, schema: string): Promise<T> {
const fullPrompt = `CONTEXT: ${context}\n\nTASK: ${prompt}`;
logger.debug(`[${this.name}] Starting extraction for: ${prompt.substring(0, 50)}...`);
const result = await generateStructuredOutput<T>(fullPrompt, schema);
logger.debug(`[${this.name}] Extraction complete`);
return result;
}
}
Explanation:
AgentResponse: This defines the standard format in which every agent must return its results (agent name, data, status, and an optional message).abstract class BaseAgent: This is the blueprint.nameanddescriptionare properties every agent must have.run(documentId: string)is an abstract method, meaning every agent must implement its own specificrunlogic. This is where the agent's unique expertise comes into play.
getContextMethod: This is a common helper that all agents use.- It takes a
documentId(the name of the policy document), and aquery(what information the agent is looking for). - It calls
searchDocumentSectionsfrom our Vector Store to find the most relevant document chunks based on thequery. - It then combines these relevant chunks into a single
contextstring, which is then passed to the LLM.
- It takes a
extractMethod: This is another common helper.- It takes the
context(the relevant chunks found bygetContext), aprompt(the specific instruction for the LLM), and aschema(the JSON structure the LLM should follow). - It calls
generateStructuredOutputfrom our Gemini LLM Interaction module. This sends the prompt and schema to Gemini and gets back a nicely structured JSON response.
- It takes the
Code Walkthrough: An Example Specialist Agent (lib/agents/eligibility-agent.ts)
Now, let's see how a specific agent, the EligibilityAgent, puts the BaseAgent framework to use.
// lib/agents/eligibility-agent.ts
import { BaseAgent, AgentResponse } from "./base";
import { DEFINITIVE_PAS_SCHEMA } from "./schema"; // Our master schema
export class EligibilityAgent extends BaseAgent {
public name = "Eligibility Agent";
public description = "Owner of eligibility domain.";
public async run(documentId: string): Promise<AgentResponse> {
try {
// 1. Get relevant context for eligibility rules
const context = await this.getContext(
documentId,
"Extract entry age bounds, maturity age bounds, financial sum assured limits, and policy term constraints."
);
// 2. Define the specific prompt for Gemini
const prompt = `Extract entry ages, maturity ages, financial limits (SA bounds), and policy term constraints.
- entry_age_min: minimum entry age.
// ... (other fields omitted for brevity) ...
Ensure all numerical outputs are pure numbers. Use null if not available. Strictly follow the DEFINITIVE_PAS_SCHEMA.eligibility structure.`;
// 3. Define the expected output schema (part of our master schema)
const schema = JSON.stringify({ eligibility: DEFINITIVE_PAS_SCHEMA.eligibility }, null, 2);
// 4. Use the base agent's extract method to get structured data
const data = await this.extract<any>(context, prompt, schema);
return {
agentName: this.name,
data,
status: "success",
};
} catch (error: any) {
return { agentName: this.name, data: null, status: "error", message: error.message };
}
}
}
Explanation:
- Extends
BaseAgent:EligibilityAgentbuilds uponBaseAgent, automatically gettinggetContextandextractmethods. nameanddescription: It provides its unique name and description.runMethod: This is where the magic happens for this specific agent:- It calls
this.getContext()with aqueryspecifically crafted to find eligibility information (e.g., "entry age bounds," "maturity age"). - It then prepares a detailed
promptfor Gemini, telling it exactly what to look for within the retrievedcontext. - It specifies a
schemausingDEFINITIVE_PAS_SCHEMA.eligibility. This tells Gemini how to format the extracted eligibility data (e.g.,entry_age_minshould be a number). We'll learn more about this schema in Chapter 6: Definitive PAS Schema. - Finally, it calls
this.extract()to send all this to Gemini and get back the structured eligibility data. - It returns the data in the standard
AgentResponseformat.
- It calls
This pattern is repeated for every specialist agent, whether it's the PricingAgent, BenefitLogicAgent, or ExclusionAgent. Each one is an expert in its little corner of the policy document!
Our Team of Specialist Agents
Here are some of the specialist agents currently on our team:
| Agent Name | Description | Specialization Domain |
|---|---|---|
| Identity Agent | Extracts product name, family, variant, jurisdiction, currency. | product_context |
| Eligibility Agent | Focuses on entry ages, maturity ages, sum assured limits. | eligibility |
| Benefit Logic Agent | Maps benefit types, descriptions, payout formulas. | benefits |
| Pricing Agent | Extracts premium structure, payment frequencies, grace periods. | premium_structure |
| Exclusion Agent | Identifies policy exclusions and their applicable periods. | exclusions |
| Lifecycle Agent | Covers lapse conditions, revival periods, surrender, loans, bonuses. | policy_lifecycle, bonuses |
| Compliance Agent | Extracts tax benefits and regulatory notes. | tax_and_regulatory |
Conclusion
Specialist Agents are the dedicated workforce of our primepolicy-ai-main project. By breaking down the complex task of policy extraction into smaller, domain-specific problems, they bring focus, accuracy, and scalability to our system. Each agent, leveraging the common BaseAgent framework, efficiently retrieves relevant information from the Vector Store and intelligently extracts structured data using the Gemini LLM, all while adhering to a shared output structure.
Now that we understand who the team members are and how they operate, the next logical step is to explore the precise format of the final report they are all contributing to. This is where the master plan, our Definitive PAS Schema, comes into play.
Next Chapter: Definitive PAS Schema