The Review Agent
The Review Agent is the core intelligence component of the Gova platform. It acts as an automated expert moderator, using Large Language Models (LLMs) to evaluate chat messages in real time, assess their risk levels, and determine whether an intervention is required under the community's specific guidelines.
Overview
The agent acts as a "Reviewer" that mimics a moderator with extensive experience. It does not just look for banned keywords; it understands the nuance, intent, and context of conversations. Every evaluation results in a Severity Score and a justified Reasoning, ensuring that automated actions are transparent and auditable.
Evaluation Logic
The Review Agent processes four primary data points to make an informed decision:
- Server Context: A high-level summary of the community (e.g., "A technical support server for Python developers").
- Community Guidelines: The specific rules the agent is tasked to enforce (e.g., "No self-promotion," "Be respectful").
- Channel History: A summary of recent messages to understand the flow of the current conversation.
- Message Metadata: The specific message content, sender information, and platform-specific data.
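The four inputs above can be pictured as a single evaluation context passed to the agent. The sketch below is illustrative only: the class and field names are assumptions, not the actual Gova types.

```python
from dataclasses import dataclass


# Hypothetical bundle of the four inputs the Review Agent consumes.
# Names are illustrative; the real Gova data model may differ.
@dataclass
class ReviewContext:
    server_context: str    # high-level summary of the community
    guidelines: list[str]  # the rules the agent is tasked to enforce
    channel_history: str   # summary of recent messages in the channel
    message: dict          # content, sender info, platform-specific data


ctx = ReviewContext(
    server_context="A technical support server for Python developers",
    guidelines=["No self-promotion", "Be respectful"],
    channel_history="Users discussing a pip install error.",
    message={"content": "Check out my new course, link in bio!",
             "author": "user123"},
)
```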
Severity Scoring
Every message is assigned a score between 0.0 and 1.0:
- 0.0 - 0.3: Compliant; no action needed.
- 0.4 - 0.7: Potential violation; may require monitoring or a soft warning.
- 0.8 - 1.0: Critical violation; suggests immediate escalation (e.g., kick or timeout).
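The bands above can be expressed as a small lookup helper. Note that the published bands leave gaps (e.g. 0.35 falls between 0.3 and 0.4), so this sketch assumes lower-bound cutoffs; the helper itself is illustrative and not part of the Gova API.

```python
def severity_tier(score: float) -> str:
    """Map a severity score to its documented band.

    Illustrative helper, not part of the Gova API. The boundaries
    between bands are assumed to be >= cutoffs at 0.4 and 0.8.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "critical"   # immediate escalation (e.g. kick or timeout)
    if score >= 0.4:
        return "potential"  # monitoring or a soft warning
    return "compliant"      # no action needed
```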
Agent Output
When the agent evaluates a message, it returns a structured `ReviewAgentOutput` object.
| Field | Type | Description |
| :--- | :--- | :--- |
| severity_score | float | A value from 0.0 to 1.0 indicating the violation strength. |
| reason | string | A detailed explanation of why the score was given. |
| action | object | (Optional) A suggested intervention to be executed on the platform. |
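A minimal Python mirror of the table above might look like the following. The actual backend model may be defined differently (for example as a Pydantic model); only the field names and types come from the table.

```python
from dataclasses import dataclass
from typing import Optional


# Illustrative mirror of the ReviewAgentOutput fields documented above.
@dataclass
class ReviewAgentOutput:
    severity_score: float          # 0.0 - 1.0, strength of the violation
    reason: str                    # explanation of why the score was given
    action: Optional[dict] = None  # suggested intervention, if any


out = ReviewAgentOutput(
    severity_score=0.1,
    reason="Message is on-topic and respectful.",
)
```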
Suggested Actions
If the agent determines a violation has occurred, it may suggest an action based on the capabilities granted to the moderator. Currently supported actions for Discord include:
- REPLY: Sends a public warning or clarification to the user.
- TIMEOUT: Temporarily restricts the user's ability to send messages.
- KICK: Removes the user from the server.
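The supported actions could be modeled as an enum, as in this sketch; the string values match the names listed above, but how the platform actually represents actions is an assumption.

```python
from enum import Enum


# Hypothetical enum of the Discord actions listed above.
class DiscordAction(Enum):
    REPLY = "REPLY"      # public warning or clarification to the user
    TIMEOUT = "TIMEOUT"  # temporarily restrict the user's messaging
    KICK = "KICK"        # remove the user from the server
```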
Configuration and Usage
The agent's behavior is governed by the `conf` (configuration) object provided when creating or updating a Moderator. This configuration defines the "personality" and the "ruleset" the agent follows.
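A configuration might resemble the dictionary below. The schema is entirely hypothetical, since the source does not specify the `conf` fields; it simply gathers the inputs described earlier in this page.

```python
# Hypothetical conf payload for a Moderator; the actual schema is
# defined by the Gova Moderator API and may differ.
conf = {
    "server_context": "A technical support server for Python developers",
    "guidelines": [
        "No self-promotion",
        "Zero Tolerance for Hate Speech",
    ],
    "allowed_actions": ["REPLY", "TIMEOUT", "KICK"],
}
```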
Example Evaluation Result
In the background, the agent produces a JSON-structured response that the backend uses to trigger alerts or automated flows:
```json
{
  "severity_score": 0.85,
  "reason": "The user utilized a targeted slur against another member, directly violating the 'Zero Tolerance for Hate Speech' guideline.",
  "action": {
    "type": "TIMEOUT",
    "params": {
      "duration_minutes": 60,
      "reason": "Hate speech violation detected by AI moderator."
    }
  }
}
```
Approval Workflow
To ensure safety, the Gova backend provides an escalation path for the Review Agent's suggestions:
- Detection: The agent flags a message and suggests an action.
- Pending State: The action is saved with a status of `AWAITING_APPROVAL`.
- Human-in-the-loop: A human administrator reviews the `reason` and `severity_score` via the API or Dashboard.
- Execution: The admin calls the `/actions/{action_id}/approve` endpoint to execute the intervention on the live platform.
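The workflow above amounts to a small state transition: an action may only move out of the pending state through an explicit approval. This sketch simulates that rule; the function and the terminal state names are assumptions drawn from the endpoint description below, not the backend's actual implementation.

```python
def approve(action: dict) -> dict:
    """Simulate approving a pending action (illustrative only).

    In the real Gova backend, approval triggers execution on the live
    platform; here we only flip the status field.
    """
    if action["status"] != "AWAITING_APPROVAL":
        raise ValueError("only pending actions can be approved")
    action["status"] = "COMPLETED"
    return action


pending = {"id": "abc123", "type": "TIMEOUT", "status": "AWAITING_APPROVAL"}
approved = approve(pending)
```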
Action Execution Endpoint
```
POST /api/v1/actions/{action_id}/approve
```
Response: Returns an `ActionResponse` indicating the final status (`COMPLETED` or `FAILED`) and the execution timestamp.
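Calling the endpoint from a client might look like the sketch below, which only constructs the request. The host, bearer-token auth scheme, and action id are placeholders; only the method and path come from this page.

```python
import urllib.request

# Illustrative construction of the approval request; nothing is sent.
# Host, auth header, and action_id are placeholders, not real values.
action_id = "abc123"
req = urllib.request.Request(
    url=f"https://example.com/api/v1/actions/{action_id}/approve",
    method="POST",
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
)
```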