The Review Agent
The Review Agent is the core intelligence component of the Gova platform. It acts as an automated expert moderator, using Large Language Models (LLMs) to evaluate chat messages in real time, assess their risk levels, and determine whether an intervention is required under the community's specific guidelines.
Overview
The agent acts as a "Reviewer" that mimics a moderator with extensive experience. It does not just look for banned keywords; it understands the nuance, intent, and context of conversations. Every evaluation results in a Severity Score and a justified Reasoning, ensuring that automated actions are transparent and auditable.
Evaluation Logic
The Review Agent processes four primary data points to make an informed decision:
- Server Context: A high-level summary of the community (e.g., "A technical support server for Python developers").
- Community Guidelines: The specific rules the agent is tasked to enforce (e.g., "No self-promotion," "Be respectful").
- Channel History: A summary of recent messages to understand the flow of the current conversation.
- Message Metadata: The specific message content, sender information, and platform-specific data.
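The four inputs above can be pictured as a single evaluation context passed to the agent. The sketch below is illustrative only: the class and field names are assumptions, not the actual Gova types.

```python
from dataclasses import dataclass


# Hypothetical bundle of the four inputs the Review Agent consumes.
# Names are illustrative; the real Gova data model may differ.
@dataclass
class ReviewContext:
    server_context: str    # high-level summary of the community
    guidelines: list[str]  # the rules the agent is tasked to enforce
    channel_history: str   # summary of recent messages in the channel
    message: dict          # content, sender info, platform-specific data


ctx = ReviewContext(
    server_context="A technical support server for Python developers",
    guidelines=["No self-promotion", "Be respectful"],
    channel_history="Users discussing a pip install error.",
    message={"content": "Check out my new course, link in bio!",
             "author": "user123"},
)
```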
Severity Scoring
Every message is assigned a score between 0.0 and 1.0:
- 0.0 - 0.3: Compliant; no action needed.
- 0.4 - 0.7: Potential violation; may require monitoring or a soft warning.
- 0.8 - 1.0: Critical violation; suggests immediate escalation (e.g., kick or timeout).
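The bands above can be expressed as a small lookup helper. Note that the published bands leave gaps (e.g. 0.35 falls between 0.3 and 0.4), so this sketch assumes lower-bound cutoffs; the helper itself is illustrative and not part of the Gova API.

```python
def severity_tier(score: float) -> str:
    """Map a severity score to its documented band.

    Illustrative helper, not part of the Gova API. The boundaries
    between bands are assumed to be >= cutoffs at 0.4 and 0.8.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "critical"   # immediate escalation (e.g. kick or timeout)
    if score >= 0.4:
        return "potential"  # monitoring or a soft warning
    return "compliant"      # no action needed
```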
Agent Output
When the agent evaluates a message, it returns a structured `ReviewAgentOutput` object.
| Field | Type | Description |
| :--- | :--- | :--- |
| severity_score | float | A value from 0.0 to 1.0 indicating the violation strength. |
| reason | string | A detailed explanation of why the score was given. |
| action | object | (Optional) A suggested intervention to be executed on the platform. |
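A minimal Python mirror of the table above might look like the following. The actual backend model may be defined differently (for example as a Pydantic model); only the field names and types come from the table.

```python
from dataclasses import dataclass
from typing import Optional


# Illustrative mirror of the ReviewAgentOutput fields documented above.
@dataclass
class ReviewAgentOutput:
    severity_score: float          # 0.0 - 1.0, strength of the violation
    reason: str                    # explanation of why the score was given
    action: Optional[dict] = None  # suggested intervention, if any


out = ReviewAgentOutput(
    severity_score=0.1,
    reason="Message is on-topic and respectful.",
)
```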
Suggested Actions
If the agent determines a violation has occurred, it may suggest an action based on the capabilities granted to the moderator. Currently supported actions for Discord include:
- REPLY: Sends a public warning or clarification to the user.
- TIMEOUT: Temporarily restricts the user's ability to send messages.
- KICK: Removes the user from the server.
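The supported actions could be modeled as an enum, as in this sketch; the string values match the names listed above, but how the platform actually represents actions is an assumption.

```python
from enum import Enum


# Hypothetical enum of the Discord actions listed above.
class DiscordAction(Enum):
    REPLY = "REPLY"      # public warning or clarification to the user
    TIMEOUT = "TIMEOUT"  # temporarily restrict the user's messaging
    KICK = "KICK"        # remove the user from the server
```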
Configuration and Usage
The agent's behavior is governed by the `conf` (configuration) object provided when creating or updating a Moderator. This configuration defines the "personality" and the "ruleset" the agent follows.
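A configuration might resemble the dictionary below. The schema is entirely hypothetical, since the source does not specify the `conf` fields; it simply gathers the inputs described earlier in this page.

```python
# Hypothetical conf payload for a Moderator; the actual schema is
# defined by the Gova Moderator API and may differ.
conf = {
    "server_context": "A technical support server for Python developers",
    "guidelines": [
        "No self-promotion",
        "Zero Tolerance for Hate Speech",
    ],
    "allowed_actions": ["REPLY", "TIMEOUT", "KICK"],
}
```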
Example Evaluation Result
In the background, the agent produces a JSON-structured response that the backend uses to trigger alerts or automated flows:
```json
{
  "severity_score": 0.85,
  "reason": "The user utilized a targeted slur against another member, directly violating the 'Zero Tolerance for Hate Speech' guideline.",
  "action": {
    "type": "TIMEOUT",
    "params": {
      "duration_minutes": 60,
      "reason": "Hate speech violation detected by AI moderator."
    }
  }
}
```
Approval Workflow
To ensure safety, the Gova backend provides an escalation path for the Review Agent's suggestions:
- Detection: The agent flags a message and suggests an action.
- Pending State: The action is saved with a status of `AWAITING_APPROVAL`.
- Human-in-the-loop: A human administrator reviews the `reason` and `severity_score` via the API or Dashboard.
- Execution: The admin calls the `/actions/{action_id}/approve` endpoint to execute the intervention on the live platform.
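The workflow above amounts to a small state transition: an action may only move out of the pending state through an explicit approval. This sketch simulates that rule; the function and the terminal state names are assumptions drawn from the endpoint description below, not the backend's actual implementation.

```python
def approve(action: dict) -> dict:
    """Simulate approving a pending action (illustrative only).

    In the real Gova backend, approval triggers execution on the live
    platform; here we only flip the status field.
    """
    if action["status"] != "AWAITING_APPROVAL":
        raise ValueError("only pending actions can be approved")
    action["status"] = "COMPLETED"
    return action


pending = {"id": "abc123", "type": "TIMEOUT", "status": "AWAITING_APPROVAL"}
approved = approve(pending)
```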
Action Execution Endpoint
```
POST /api/v1/actions/{action_id}/approve
```
Response: Returns an `ActionResponse` indicating the final status (`COMPLETED` or `FAILED`) and the execution timestamp.
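Calling the endpoint from a client might look like the sketch below, which only constructs the request. The host, bearer-token auth scheme, and action id are placeholders; only the method and path come from this page.

```python
import urllib.request

# Illustrative construction of the approval request; nothing is sent.
# Host, auth header, and action_id are placeholders, not real values.
action_id = "abc123"
req = urllib.request.Request(
    url=f"https://example.com/api/v1/actions/{action_id}/approve",
    method="POST",
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
)
```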