# Contextual Agents
## Overview
Gova utilizes Contextual Agents to perform nuanced content moderation. Unlike traditional keyword-based filters, these agents leverage Large Language Models (LLMs) to understand the history of a conversation, the specific rules of a community, and the intent behind a message before suggesting or taking action.
The system's primary entity is the Review Agent, which acts as a virtual moderator with specialized context regarding your server's environment.
## The Review Agent
The Review Agent is responsible for analyzing incoming messages and generating a ReviewAgentOutput. It evaluates messages not in isolation, but by processing multiple layers of context provided by the backend engine.
### Decision Context
To make an informed decision, the agent is supplied with the following contextual data points:
| Field | Description |
| :--- | :--- |
| Server Summary | A high-level overview of the Discord server's purpose and culture. |
| Channel Summary | A rolling summary of the most recent conversations within the specific channel. |
| Server Guidelines | The specific rules and boundaries defined by the server owner. |
| Message Metadata | The content of the message, the author, and relevant timestamps. |
| Action Definitions | A list of available moderation tools (e.g., Kick, Timeout) the agent is permitted to use. |
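These context fields can be grouped into a single structure before being handed to the agent. The following is a minimal sketch only; the field names and types are illustrative assumptions, not Gova's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionContext:
    """Hypothetical container for the data points in the table above."""
    server_summary: str     # high-level overview of the server's purpose and culture
    channel_summary: str    # rolling summary of recent conversation in the channel
    server_guidelines: str  # owner-defined rules and boundaries
    message_content: str    # the message under review
    author_id: str          # author of the message
    timestamp: float        # when the message was sent (Unix time)
    # Actions the agent is permitted to suggest, e.g. ["TIMEOUT", "KICK"]
    allowed_actions: list[str] = field(default_factory=list)
```

A structure like this keeps the prompt-assembly step explicit and makes it easy to log exactly what context the agent saw for a given decision.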
### Output Schema
When the agent completes an evaluation, it returns a structured response used by the API to log events or trigger escalations.
```python
from pydantic import BaseModel

class ReviewAgentOutput(BaseModel):
    severity_score: float  # A value between 0.0 and 1.0
    reason: str            # An explanation of why the score was given
    action: Action | None  # An optional suggested action (Reply, Timeout, Kick)
```
- Severity Score: A score of 0.0 indicates full compliance with guidelines, while 1.0 indicates a critical violation.
- Reasoning: The agent provides a natural language justification, which is visible to human moderators in the Gova dashboard when reviewing flagged content.
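A consumer of this output might bucket the severity score into escalation tiers. The thresholds below are illustrative assumptions for the sketch, not documented Gova cutoffs:

```python
def severity_bucket(score: float) -> str:
    """Map a severity_score to a coarse escalation tier.

    The 0.3 / 0.7 thresholds are illustrative only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("severity_score must be between 0.0 and 1.0")
    if score < 0.3:
        return "compliant"
    if score < 0.7:
        return "flag_for_review"
    return "critical"
```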
## Contextual Summarization
The power of the Gova backend lies in its ability to condense long chat histories into "Summaries." This prevents the LLM from becoming overwhelmed by raw data while ensuring it retains the "vibe" of the conversation.
### Channel Summaries
As messages flow through the system, the backend maintains a stateful summary of the channel. This allows the Review Agent to detect:
- Escalation: A conversation turning from a friendly debate into a toxic argument.
- Contextual Sarcasm: Messages that might look benign in isolation but are offensive given the preceding 10 messages.
- Community Trends: Identifying if multiple users are suddenly violating a specific guideline.
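One way to maintain a stateful channel summary is to fold each new message into the previous summary, keeping a short window of raw messages alongside the condensed text. This is a minimal sketch; the `condense` callable stands in for an LLM summarization call and is not a real Gova API:

```python
from collections import deque

class ChannelSummary:
    """Rolling channel state: a short raw-message window plus a condensed summary.

    `condense(prev_summary, recent_messages)` is a placeholder for the
    backend's LLM summarization step.
    """

    def __init__(self, condense, window: int = 10):
        self.condense = condense
        self.recent = deque(maxlen=window)  # last N raw messages
        self.summary = ""                   # condensed history so far

    def add_message(self, author: str, content: str) -> None:
        self.recent.append(f"{author}: {content}")
        # Fold the prior summary and the fresh window into a new summary,
        # so old context survives even after raw messages fall out of the window.
        self.summary = self.condense(self.summary, list(self.recent))
```

Bounding the raw window while carrying the summary forward is what keeps token usage flat as the channel history grows.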
## Integration with Moderation Actions
Once an agent identifies a violation, it can suggest a specific action based on the DiscordActionType. These actions are initially set to AWAITING_APPROVAL status in the database, allowing human moderators to verify the agent's decision before execution.
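The human-in-the-loop gate described above can be modeled as a small status state machine. Only `AWAITING_APPROVAL` comes from the source; the other status names and the transition rule are assumptions for this sketch:

```python
from enum import Enum

class ActionStatus(Enum):
    # AWAITING_APPROVAL is the documented initial state; the rest are illustrative.
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"

def approve(status: ActionStatus) -> ActionStatus:
    """Transition a pending action to APPROVED.

    Only actions still awaiting human review may be approved.
    """
    if status is not ActionStatus.AWAITING_APPROVAL:
        raise ValueError(f"cannot approve action in state {status.name}")
    return ActionStatus.APPROVED
```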
### Example: Agent-Triggered Action
If a user violates a "No Spam" guideline, the agent might generate the following context for the ActionEvents table:
```json
{
  "action_type": "TIMEOUT",
  "action_params": {
    "duration": 3600,
    "reason": "Repeatedly posting promotional links despite verbal warnings in the channel history."
  },
  "severity_score": 0.85
}
```
### Execution via API
Moderators can then call the /actions/{action_id}/approve endpoint to execute the agent's suggested action on the live platform (e.g., Discord).
```http
POST /api/v1/actions/550e8400-e29b-41d4-a716-446655440000/approve
Authorization: Bearer <JWT>
```
The system retrieves the DiscordMessageContext stored by the agent and passes it to the platform handler to perform the timeout, kick, or reply.
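A client assembling this request might do so as follows. This is a sketch only; the helper name and the returned dict shape are illustrative, not part of the Gova API:

```python
def build_approve_request(base_url: str, action_id: str, jwt: str) -> dict:
    """Construct the HTTP request for the approve endpoint.

    The path matches the docs above; the dict layout is an assumption
    for whatever HTTP client actually sends it.
    """
    return {
        "method": "POST",
        "url": f"{base_url}/api/v1/actions/{action_id}/approve",
        "headers": {"Authorization": f"Bearer {jwt}"},
    }
```

The JWT in the Authorization header identifies the approving moderator, so the backend can attribute the executed action to a human reviewer.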