AI Content Moderation Guide

Discuse checks text and images for toxicity and negative sentiment, spam, unexpected languages, custom bad words, links with poor reputation, and explicit imagery, then returns a per-category result so your code can approve, flag, or reject content automatically. You send content to one endpoint, POST https://api.discuse.com/api/v2/check, and read structured scores back. This guide covers how the checks work, the moderation patterns around them, and how to wire the Discuse API into your pipeline.

What is AI content moderation?

AI content moderation uses machine-learning models to detect and classify potentially harmful content automatically. Where a human reviewer reads one item at a time, these models score content as it arrives, so submissions can be checked before they reach other users.

How does it work?

Submit content: Send text and/or media URLs to the moderation API.
Run checks: The API runs the enabled checks (sentiment, language, spam, bad words, images, links, antivirus).
Score each category: Each check returns scores and a hit flag indicating whether it crossed your configured threshold.
Decide: Read has_violations (and the per-check scores) to approve, flag, or reject.

With Discuse, the request looks like this:

const response = await fetch('https://api.discuse.com/api/v2/check', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': process.env.DISCUSE_API_KEY
  },
  body: JSON.stringify({
    content: { text: 'message to check', image_urls: ['https://...'] },
    settings: { check_sentiment: true, check_spam: true, check_images: true }
  })
});
const result = await response.json(); // { has_violations, results: { sentiment, spamfinder, images, ... }, usage }
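
To make the "Decide" step concrete, here is a minimal sketch of reading the response. It assumes the shape shown above (has_violations plus a results object whose entries carry a boolean hit). listHits and the sample values are hypothetical, not part of the Discuse API.

```javascript
// Hypothetical helper: list the checks that reported a hit.
// Assumes each entry under `results` is an object with a boolean `hit` field.
function listHits(result) {
  const results = result?.results ?? {};
  return Object.entries(results)
    .filter(([, check]) => check && check.hit === true)
    .map(([name]) => name);
}

// Hand-written response object with illustrative values only.
const sampleResult = {
  has_violations: true,
  results: {
    sentiment: { toxicity: 0.91, hit: true },
    spamfinder: { confidence: 0.12, is_spam: false, hit: false }
  }
};

console.log(listHits(sampleResult)); // logs the names of checks whose hit is true
```

Logging the list of hit names next to has_violations makes it easier to see later which check drove a decision.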

What does each Discuse check cover?

Check (`settings` toggle)	What it covers	Key result fields
`check_sentiment`	Negativity, toxicity, profanity, threats, insults in text	`sentiment.is_toxic`, `sentiment.toxicity`, `sentiment.score`, `sentiment.hit`
`check_spam`	Spam classification of text	`spamfinder.label`, `spamfinder.confidence`, `spamfinder.is_spam`, `spamfinder.hit`
`check_language`	Whether text matches the expected language	`language.language`, `language.confidence`, `language.hit`
`check_badwords`	Custom bad-word list matches	`badwords.hit`, `badwords.matched_words`
`check_images`	Explicit imagery in image URLs	`images.porn`, `images.sexual`, `images.neutral`, `images.hit`
`check_links`	Link reputation	`links.status`, `links.hit`
`check_antivirus`	Malware in document/file URLs	`antivirus.status`, `antivirus.hit`

Each toggle is a boolean. The numeric thresholds that turn a score into a hit are configured per project in the dashboard, not passed per request — see the threshold configuration guide.
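
As a small illustration, the settings object can be built from a list of toggle names so that a typo fails loudly instead of silently skipping a check. The toggle names come from the table above; buildSettings is a hypothetical helper, and the sketch assumes that toggles you leave out fall back to the project default.

```javascript
// Toggle names taken from the table above.
const KNOWN_TOGGLES = [
  'check_sentiment', 'check_spam', 'check_language', 'check_badwords',
  'check_images', 'check_links', 'check_antivirus'
];

// Hypothetical helper: turn a list of toggle names into a `settings` object.
// Unknown names throw so a misspelled toggle is caught before the request is sent.
function buildSettings(enabled) {
  const settings = {};
  for (const name of enabled) {
    if (!KNOWN_TOGGLES.includes(name)) {
      throw new Error(`Unknown check toggle: ${name}`);
    }
    settings[name] = true;
  }
  return settings;
}

// Usage: enable only spam and image checks for this request.
const settings = buildSettings(['check_spam', 'check_images']);
```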

Benefits of AI moderation

Scale

AI processes content volumes that human teams cannot. A single API call returns results in milliseconds, so moderation keeps pace with submissions instead of falling behind a review queue. Pair automated checks with a human queue for borderline cases (covered below).

Speed

Real-time checks let you screen content before it publishes:

// Pre-moderation: Check content before publishing
async function publishPost(content) {
  const moderation = await checkContent(content);

  if (moderation.has_violations) {
    return { published: false, reason: moderation.message };
  }

  // Content passes moderation
  return await saveAndPublish(content);
}
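
Pre-moderation puts the check on the publishing path, so it helps to decide up front what happens when the check fails or is slow. The sketch below holds content instead of publishing it unchecked. checkOrHold, its parameters, and the 2000 ms budget are assumptions for illustration; check stands for any async function that resolves to a Discuse-style result.

```javascript
// Hypothetical wrapper: treat a failed or slow check as "hold for review"
// rather than publishing unchecked content.
async function checkOrHold(check, content, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('moderation timeout')), timeoutMs);
  });

  try {
    // Whichever settles first wins: the check itself or the timeout.
    const result = await Promise.race([check(content), timeout]);
    return { held: false, result };
  } catch (err) {
    // Errors and timeouts both end up here; the caller should queue the content.
    return { held: true, error: err.message };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would publish only when held is false and has_violations is false, and queue the item for a later retry or human review otherwise.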

Consistency

AI applies the same rules uniformly across all content, with no fatigue and no variation between reviewers. Decisions are reproducible: the same input with the same project thresholds yields the same hit flags, which makes enforcement auditable.

Moderation architecture

Pre-moderation flow

User Submits → AI Check → Decision
                 ↓
    ┌───────────┼───────────┐
    ↓           ↓           ↓
  Allow      Review       Block
    ↓           ↓           ↓
 Publish   Human Queue   Reject
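
The three branches in the diagram can be sketched as a lookup. The verdict and route names below are this sketch's own labels, not API values; unknown verdicts go to the human queue so that nothing is published by accident.

```javascript
// Hypothetical mapping from a verdict to the three branches in the diagram.
const PRE_MODERATION_ROUTES = new Map([
  ['allow', 'publish'],
  ['review', 'human_queue'],
  ['block', 'reject']
]);

function routePreModeration(verdict) {
  // Anything unrecognised is sent to a human rather than published.
  return PRE_MODERATION_ROUTES.get(verdict) ?? 'human_queue';
}
```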

Post-moderation flow

User Submits → Publish → AI Check → Action
                           ↓
              ┌────────────┼────────────┐
              ↓            ↓            ↓
            Safe       Borderline    Violation
              ↓            ↓            ↓
           Keep       Flag/Review    Remove
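
A minimal sketch of the post-moderation flow, assuming your application supplies the four functions bundled in deps. All of them are hypothetical, and check is assumed to reduce a Discuse result to one of 'safe', 'borderline', or 'violation'.

```javascript
// Post-moderation sketch: publish first, then check and act on the verdict.
// `deps` holds hypothetical application functions; none are part of the Discuse API.
async function postModerate(content, deps) {
  const id = await deps.publish(content);
  const verdict = await deps.check(content); // 'safe' | 'borderline' | 'violation'

  if (verdict === 'violation') {
    await deps.remove(id);
    return 'removed';
  }
  if (verdict === 'borderline') {
    await deps.flagForReview(id);
    return 'flagged';
  }
  return 'kept';
}
```

Because content is visible while the check runs, this flow suits surfaces where a short exposure window is acceptable.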

Hybrid approach (recommended)

The Discuse response sets has_violations once any enabled check crosses its configured threshold, and exposes the underlying per-category scores so you can add your own confidence band on top:

async function moderateContent(content) {
  const result = await checkContent(content);

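  // Build a confidence figure from the scores you care about,
  // e.g. the toxicity score and the spam classifier confidence.
  // Assumption: spamfinder.confidence describes the returned label,
  // so it only counts toward a violation when is_spam is true.
  const spam = result.results?.spamfinder;
  const confidence = Math.max(
    result.results?.sentiment?.toxicity ?? 0,
    spam?.is_spam ? (spam.confidence ?? 0) : 0,
    result.results?.images?.porn ?? 0
  );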

  // High confidence: automate.
  if (confidence > 0.95) {
    return result.has_violations
      ? { action: 'auto_remove', reason: result.message }
      : { action: 'auto_approve' };
  }

  // Medium confidence: route to a human.
  if (confidence > 0.5) {
    await addToReviewQueue(content, result);
    return { action: 'pending_review' };
  }

  // Low confidence: approve, keep watching.
  return { action: 'approve_with_monitoring' };
}
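
The banding itself can be pulled out into a pure function, which makes the cut-offs easy to unit test. This is a sketch: bandFor is a hypothetical helper, the 0.5 and 0.95 defaults simply mirror the example above, and it assumes scores fall between 0 and 1.

```javascript
// Hypothetical banding helper. The defaults mirror the example above and are
// illustrative, not recommended values. Assumes scores are in the range 0..1.
function bandFor(score, { review = 0.5, auto = 0.95 } = {}) {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score > auto) return 'high';     // automate the decision
  if (score > review) return 'medium'; // route to a human
  return 'low';                        // approve and keep monitoring
}
```

Both comparisons are strict, so a score exactly equal to a cut-off falls into the lower band.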

Implementing AI moderation

Step 1: Define your policy

Decide what content is acceptable and how to act on each category before you call the API. In Discuse, the numeric cut-off for each category lives in your project settings (the dashboard), so your application policy maps an API result to an action rather than re-deciding the threshold:

const MODERATION_POLICY = {
  // What to do when a given Discuse check reports a hit.
  // Thresholds themselves are configured per project in the dashboard.
  actions: {
    sentiment: 'block',   // toxic / threatening text
    spam: 'block',
    badwords: 'flag',
    images: 'block',      // explicit imagery
    links: 'flag'
  }
};
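
Since a misspelled action would only surface when that category first fires, it can help to validate the policy at startup. A minimal sketch, assuming the three action names used in this guide ('allow', 'flag', 'block'); validatePolicy is a hypothetical helper.

```javascript
// Action names used elsewhere in this guide.
const ALLOWED_ACTIONS = new Set(['allow', 'flag', 'block']);

// Hypothetical startup check: every category must map to a known action.
// Returns a list of problems; an empty list means the policy looks valid.
function validatePolicy(policy) {
  const problems = [];
  for (const [category, action] of Object.entries(policy.actions ?? {})) {
    if (!ALLOWED_ACTIONS.has(action)) {
      problems.push(`${category}: unknown action "${action}"`);
    }
  }
  return problems;
}
```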

Step 2: Integrate the API

Send content and the checks you want enabled for this request. Each check_* toggle is an optional boolean that overrides the project default for this call:

async function checkContent(content) {
  const response = await fetch('https://api.discuse.com/api/v2/check', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.DISCUSE_API_KEY
    },
    body: JSON.stringify({
      content: {
        text: content.text,
        image_urls: content.images
      },
      settings: {
        check_sentiment: true,
        check_spam: true,
        check_images: true
      }
    })
  });

  if (!response.ok) {
    throw new Error(`Discuse API returned ${response.status}`);
  }
  return response.json();
}
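
Transient failures are worth retrying before you give up on a check. The sketch below wraps any function that returns a fetch-style response. Which status codes Discuse actually returns is an assumption here: 429 and 5xx are treated as retryable by general HTTP convention, and thrown network errors are left to propagate to the caller.

```javascript
// Hypothetical retry wrapper with exponential backoff.
// `send` is any async function returning an object with `ok` and `status`.
async function sendWithRetry(send, { retries = 3, baseDelayMs = 200 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    const retryable = response.status === 429 || response.status >= 500;

    // Stop on success, on a non-retryable status, or when retries are used up.
    if (response.ok || !retryable || attempt >= retries) {
      return response;
    }

    // Wait baseDelayMs, then 2x, then 4x, and so on.
    const delay = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

The caller still has to inspect the returned response, because the last attempt may itself be a failure.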

Step 3: Apply decisions

Map each check that reports a hit to the action you defined:

function applyModerationDecision(result) {
  const r = result.results || {};

  if (r.sentiment?.hit) return { action: MODERATION_POLICY.actions.sentiment, category: 'sentiment' };
  if (r.spamfinder?.hit) return { action: MODERATION_POLICY.actions.spam, category: 'spam' };
  if (r.images?.hit)    return { action: MODERATION_POLICY.actions.images, category: 'images' };
  if (r.badwords?.hit)  return { action: MODERATION_POLICY.actions.badwords, category: 'badwords' };
  if (r.links?.hit)     return { action: MODERATION_POLICY.actions.links, category: 'links' };

  // Checks without a policy entry (for example language or antivirus) can still
  // set has_violations; send those to review instead of allowing them silently.
  if (result.has_violations) return { action: 'flag', category: 'unmapped' };

  return { action: 'allow' };
}
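
Returning on the first hit makes the outcome depend on the order of the if statements. One alternative, sketched here under assumptions, is to look at every hit and keep the strictest action. strictestAction is hypothetical, and its actions argument is keyed by the names under results (so spamfinder rather than spam).

```javascript
// Severity order for the actions used in this guide (higher is stricter).
const SEVERITY = { allow: 0, flag: 1, block: 2 };

// Hypothetical alternative: inspect every hit and return the strictest action.
// Hits with no entry in `actions`, or with an unrecognised action, default to 'flag'.
function strictestAction(result, actions) {
  let strictest = 'allow';
  const categories = [];

  for (const [name, check] of Object.entries(result?.results ?? {})) {
    if (!check || check.hit !== true) continue;
    categories.push(name);

    const configured = Object.hasOwn(actions, name) ? actions[name] : undefined;
    const action = Object.hasOwn(SEVERITY, configured) ? configured : 'flag';
    if (SEVERITY[action] > SEVERITY[strictest]) strictest = action;
  }

  return { action: strictest, categories };
}
```

Keeping the full list of categories also gives reviewers and logs more to work with than a single label.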

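Step 4: Handle edge cases

Act on the decision from Step 3, and treat any action you do not recognize as a block so that a policy typo fails safe: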

async function handleModerationResult(content, result) {
  switch (result.action) {
    case 'block':
      await notifyUser(content.author, 'content_blocked', result);
      await logModeration(content, result);
      return false;

    case 'flag':
      await addToReviewQueue(content, result);
      await publishWithWarning(content);
      return true;

    case 'allow':
      await publish(content);
      return true;

    default:
      // Unknown action - fail safe by blocking
      await logError('unknown_moderation_action', result);
      return false;
  }
}

Best practices

Start conservative, adjust over time

Begin with stricter project thresholds and loosen them as you measure false positives. In Discuse these thresholds are project settings, so tuning happens in the dashboard (or via the settings update API), not in each request. See the threshold configuration guide for the workflow.
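
Before changing a threshold, you can replay a hand-labeled sample against a few candidate values to see the trade-off. This is a sketch under assumptions: the sample shape ({ score, harmful }) is hypothetical, and it assumes a check reports a hit when its score is at or above the threshold.

```javascript
// Hypothetical offline sweep over candidate thresholds.
// `samples` is a list of { score: number, harmful: boolean } labeled by reviewers.
function sweepThresholds(samples, candidates) {
  return candidates.map((threshold) => {
    let falsePositives = 0;
    let falseNegatives = 0;

    for (const { score, harmful } of samples) {
      const wouldHit = score >= threshold; // assumed comparison
      if (wouldHit && !harmful) falsePositives++;
      if (!wouldHit && harmful) falseNegatives++;
    }

    return { threshold, falsePositives, falseNegatives };
  });
}
```

Raising the threshold trades false positives for false negatives, so review both counts together before you update the project setting.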

Keep a human review queue

AI should augment, not replace, human judgment for edge cases:

async function processReviewQueue() {
  const items = await getReviewQueue();

  for (const item of items) {
    // Present to human reviewer with AI context
    const reviewUI = {
      content: item.content,
      ai_scores: item.moderation_result,
      similar_decisions: await getSimilarPreviousDecisions(item)
    };

    // Human makes final decision
    const decision = await presentToReviewer(reviewUI);

    // Log for model improvement
    await logHumanDecision(item, decision);
  }
}

Monitor and improve

Track key metrics to improve your moderation system:

const METRICS = {
  // Accuracy metrics
  false_positive_rate: 'Content incorrectly blocked',
  false_negative_rate: 'Harmful content missed',

  // Operational metrics
  average_response_time: 'API latency',
  review_queue_depth: 'Human review backlog',

  // User impact
  appeal_rate: 'Users appealing decisions',
  appeal_success_rate: 'Appeals overturned'
};
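
The two accuracy metrics can be computed from decisions that a human has since labeled. A minimal sketch, assuming a hypothetical record shape of { blocked, harmful }, where harmful is the reviewer's label:

```javascript
// Hypothetical record shape: { blocked: boolean, harmful: boolean }.
// false_positive_rate = benign items that were blocked / all benign items
// false_negative_rate = harmful items that were missed / all harmful items
function accuracyRates(records) {
  let benign = 0;
  let harmful = 0;
  let falsePositives = 0;
  let falseNegatives = 0;

  for (const record of records) {
    if (record.harmful) {
      harmful++;
      if (!record.blocked) falseNegatives++;
    } else {
      benign++;
      if (record.blocked) falsePositives++;
    }
  }

  return {
    // null means there were no items of that kind to measure against.
    false_positive_rate: benign > 0 ? falsePositives / benign : null,
    false_negative_rate: harmful > 0 ? falseNegatives / harmful : null
  };
}
```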

Handle appeals gracefully

When a user appeals, route the item to a human reviewer rather than re-deciding it automatically. Give the reviewer the original Discuse scores and the user's history as context:

async function handleAppeal(contentId, userId) {
  const original = await getContentWithModeration(contentId);

  await addToReviewQueue(contentId, {
    type: 'appeal',
    original_decision: original.moderation, // Discuse `results` saved at decision time
    author_history: await getAuthorHistory(userId)
  });

  return { status: 'pending', message: 'Under review' };
}

The API itself has no per-request "context" or "author history" parameter — context is something you apply on your side when choosing thresholds and routing for review.

Common pitfalls

Over-relying on AI

Automating every decision means automating every mistake. Keep a human in the loop for:

Complex contextual decisions
High-stakes content (legal, safety)
Appeals and edge cases

Ignoring context

The same words can be harmful or acceptable depending on context:

"I'm going to kill it at this interview!" // Positive
"I'm going to kill you"                    // Threat

Discuse scores each message on its own; it has no request-level "context" parameter. Apply context on your side: pick stricter or looser project thresholds per surface (public post vs. direct message), and route borderline hits to human review.

Set-and-forget

Content moderation requires ongoing tuning:

Monitor false positive/negative rates
Update thresholds based on data
Review new content patterns
Retrain or update models

Inconsistent enforcement

Apply policy by rule, not by who posted it. Drive thresholds from a documented trust level rather than ad-hoc exceptions:

// Avoid: per-person exceptions
if (user.isInfluencer) { /* lenient */ }

// Prefer: thresholds keyed to a documented trust level,
// configured the same way for everyone in that level.
const action = MODERATION_POLICY.actions[category];
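
One way to express a documented trust level in code is a fixed table that every user passes through. The level names, the trustLevel field, and the cut-off numbers below are all hypothetical; the point of the sketch is that the lookup depends on the level, never on the individual.

```javascript
// Hypothetical trust levels and review cut-offs; names and numbers are
// illustrative. A level is assigned by documented criteria, not by identity.
const TRUST_LEVELS = {
  new_account: { reviewAbove: 0.3 },
  established: { reviewAbove: 0.5 },
  verified: { reviewAbove: 0.7 }
};

function reviewCutoffFor(user) {
  // Unknown or missing levels get the strictest cut-off.
  const level = Object.hasOwn(TRUST_LEVELS, user.trustLevel)
    ? user.trustLevel
    : 'new_account';
  return TRUST_LEVELS[level].reviewAbove;
}
```

Two users at the same level always get the same cut-off, which keeps enforcement explainable when someone asks why their content was reviewed.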

Next steps

Configuring Thresholds - Fine-tune your moderation
Scaling Content Moderation - Handle high volumes
Text Analysis - Deep dive into text moderation
Image NSFW Detection - Visual content protection