# Contextual Agents
## Overview
Gova utilizes Contextual Agents to perform nuanced content moderation. Unlike traditional keyword-based filters, these agents leverage Large Language Models (LLMs) to understand the history of a conversation, the specific rules of a community, and the intent behind a message before suggesting or taking action.
The system's primary entity is the Review Agent, which acts as a virtual moderator with specialized context regarding your server's environment.
## The Review Agent
The Review Agent is responsible for analyzing incoming messages and generating a ReviewAgentOutput. It evaluates messages not in isolation, but by processing multiple layers of context provided by the backend engine.
### Decision Context
To make an informed decision, the agent is supplied with the following contextual data points:
| Field | Description |
| :--- | :--- |
| Server Summary | A high-level overview of the Discord server's purpose and culture. |
| Channel Summary | A rolling summary of the most recent conversations within the specific channel. |
| Server Guidelines | The specific rules and boundaries defined by the server owner. |
| Message Metadata | The content of the message, the author, and relevant timestamps. |
| Action Definitions | A list of available moderation tools (e.g., Kick, Timeout) the agent is permitted to use. |
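These context fields can be grouped into a single structure before being handed to the agent. The following is a minimal sketch only; the field names and types are illustrative assumptions, not Gova's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionContext:
    """Hypothetical container for the data points in the table above."""
    server_summary: str     # high-level overview of the server's purpose and culture
    channel_summary: str    # rolling summary of recent conversation in the channel
    server_guidelines: str  # owner-defined rules and boundaries
    message_content: str    # the message under review
    author_id: str          # author of the message
    timestamp: float        # when the message was sent (Unix time)
    # Actions the agent is permitted to suggest, e.g. ["TIMEOUT", "KICK"]
    allowed_actions: list[str] = field(default_factory=list)
```

A structure like this keeps the prompt-assembly step explicit and makes it easy to log exactly what context the agent saw for a given decision.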
### Output Schema
When the agent completes an evaluation, it returns a structured response used by the API to log events or trigger escalations.
```python
from pydantic import BaseModel

class ReviewAgentOutput(BaseModel):
    severity_score: float  # A value between 0.0 and 1.0
    reason: str            # An explanation of why the score was given
    action: Action | None  # An optional suggested action (Reply, Timeout, Kick)
```
- Severity Score: A score of 0.0 indicates full compliance with guidelines, while 1.0 indicates a critical violation.
- Reasoning: The agent provides a natural language justification, which is visible to human moderators in the Gova dashboard when reviewing flagged content.
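A consumer of this output might bucket the severity score into escalation tiers. The thresholds below are illustrative assumptions for the sketch, not documented Gova cutoffs:

```python
def severity_bucket(score: float) -> str:
    """Map a severity_score to a coarse escalation tier.

    The 0.3 / 0.7 thresholds are illustrative only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("severity_score must be between 0.0 and 1.0")
    if score < 0.3:
        return "compliant"
    if score < 0.7:
        return "flag_for_review"
    return "critical"
```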
## Contextual Summarization
The power of the Gova backend lies in its ability to condense long chat histories into "Summaries." This prevents the LLM from becoming overwhelmed by raw data while ensuring it retains the "vibe" of the conversation.
### Channel Summaries
As messages flow through the system, the backend maintains a stateful summary of the channel. This allows the Review Agent to detect:
- Escalation: A conversation turning from a friendly debate into a toxic argument.
- Contextual Sarcasm: Messages that might look benign in isolation but are offensive given the preceding 10 messages.
- Community Trends: Identifying if multiple users are suddenly violating a specific guideline.
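One way to maintain a stateful channel summary is to fold each new message into the previous summary, keeping a short window of raw messages alongside the condensed text. This is a minimal sketch; the `condense` callable stands in for an LLM summarization call and is not a real Gova API:

```python
from collections import deque

class ChannelSummary:
    """Rolling channel state: a short raw-message window plus a condensed summary.

    `condense(prev_summary, recent_messages)` is a placeholder for the
    backend's LLM summarization step.
    """

    def __init__(self, condense, window: int = 10):
        self.condense = condense
        self.recent = deque(maxlen=window)  # last N raw messages
        self.summary = ""                   # condensed history so far

    def add_message(self, author: str, content: str) -> None:
        self.recent.append(f"{author}: {content}")
        # Fold the prior summary and the fresh window into a new summary,
        # so old context survives even after raw messages fall out of the window.
        self.summary = self.condense(self.summary, list(self.recent))
```

Bounding the raw window while carrying the summary forward is what keeps token usage flat as the channel history grows.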
## Integration with Moderation Actions
Once an agent identifies a violation, it can suggest a specific action based on the DiscordActionType. These actions are initially set to AWAITING_APPROVAL status in the database, allowing human moderators to verify the agent's decision before execution.
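The human-in-the-loop gate described above can be modeled as a small status state machine. Only `AWAITING_APPROVAL` comes from the source; the other status names and the transition rule are assumptions for this sketch:

```python
from enum import Enum

class ActionStatus(Enum):
    # AWAITING_APPROVAL is the documented initial state; the rest are illustrative.
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"

def approve(status: ActionStatus) -> ActionStatus:
    """Transition a pending action to APPROVED.

    Only actions still awaiting human review may be approved.
    """
    if status is not ActionStatus.AWAITING_APPROVAL:
        raise ValueError(f"cannot approve action in state {status.name}")
    return ActionStatus.APPROVED
```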
### Example: Agent-Triggered Action
If a user violates a "No Spam" guideline, the agent might generate the following context for the ActionEvents table:
```json
{
  "action_type": "TIMEOUT",
  "action_params": {
    "duration": 3600,
    "reason": "Repeatedly posting promotional links despite verbal warnings in the channel history."
  },
  "severity_score": 0.85
}
```
### Execution via API
Moderators can then call the /actions/{action_id}/approve endpoint to execute the agent's suggested action on the live platform (e.g., Discord).
```http
POST /api/v1/actions/550e8400-e29b-41d4-a716-446655440000/approve
Authorization: Bearer <JWT>
```
The system retrieves the DiscordMessageContext stored by the agent and passes it to the platform handler to perform the timeout, kick, or reply.
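A client assembling this request might do so as follows. This is a sketch only; the helper name and the returned dict shape are illustrative, not part of the Gova API:

```python
def build_approve_request(base_url: str, action_id: str, jwt: str) -> dict:
    """Construct the HTTP request for the approve endpoint.

    The path matches the docs above; the dict layout is an assumption
    for whatever HTTP client actually sends it.
    """
    return {
        "method": "POST",
        "url": f"{base_url}/api/v1/actions/{action_id}/approve",
        "headers": {"Authorization": f"Bearer {jwt}"},
    }
```

The JWT in the Authorization header identifies the approving moderator, so the backend can attribute the executed action to a human reviewer.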