AI Content Moderation Guide
Discuse checks text and images for sentiment/toxicity, spam, unwanted language, bad words, links, and explicit imagery, then returns a per-category result so your code can approve, flag, or reject content automatically. You send content to one endpoint, POST https://api.discuse.com/api/v2/check, and read structured scores back. This guide covers how the checks work, the moderation patterns around them, and how to wire the Discuse API into your pipeline.
What is AI content moderation?
AI content moderation uses machine-learning models to detect and classify potentially harmful content automatically. Where a human reviewer reads one item at a time, these models score content as it arrives, so submissions can be checked before they reach other users.
How does it work?
- Submit content: Send text and/or media URLs to the moderation API.
- Run checks: The API runs the enabled checks (sentiment, language, spam, bad words, images, links, antivirus).
- Score each category: Each check returns scores and a hit flag indicating whether it crossed your configured threshold.
- Decide: Read has_violations (and the per-check scores) to approve, flag, or reject.
With Discuse, the request looks like this:
const response = await fetch('https://api.discuse.com/api/v2/check', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': process.env.DISCUSE_API_KEY
  },
  body: JSON.stringify({
    content: { text: 'message to check', image_urls: ['https://...'] },
    settings: { check_sentiment: true, check_spam: true, check_images: true }
  })
});
const result = await response.json(); // { has_violations, results: { sentiment, spamfinder, images, ... }, usage }
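Once the response is parsed, a small helper can list which checks reported a hit. This is a minimal sketch, assuming each entry under results is an object carrying a boolean hit field; listHits and the sample object are illustrative names and made-up data, not part of the Discuse API.

```javascript
// Hypothetical helper: returns the names of the checks whose `hit` flag is true.
// Assumes `result.results` maps check names to objects with a boolean `hit`.
function listHits(result) {
  const results = result?.results ?? {};
  return Object.entries(results)
    .filter(([, check]) => check && check.hit === true)
    .map(([name]) => name);
}

// Made-up sample shaped like the response described above (not real API output).
const sampleResult = {
  has_violations: true,
  results: {
    sentiment: { toxicity: 0.97, hit: true },
    spamfinder: { is_spam: false, hit: false }
  }
};

console.log(listHits(sampleResult)); // [ 'sentiment' ]
```

Working from the per-check hit flags rather than has_violations alone lets your code tell the user, or a reviewer, which category triggered the decision.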
What does each Discuse check cover?
| Check (settings toggle) | What it covers | Key result fields |
|---|---|---|
| check_sentiment | Negativity, toxicity, profanity, threats, insults in text | sentiment.is_toxic, sentiment.toxicity, sentiment.score, sentiment.hit |
| check_spam | Spam classification of text | spamfinder.label, spamfinder.confidence, spamfinder.is_spam, spamfinder.hit |
| check_language | Whether text matches the expected language | language.language, language.confidence, language.hit |
| check_badwords | Custom bad-word list matches | badwords.hit, badwords.matched_words |
| check_images | Explicit imagery in image URLs | images.porn, images.sexual, images.neutral, images.hit |
| check_links | Link reputation | links.status, links.hit |
| check_antivirus | Malware in document/file URLs | antivirus.status, antivirus.hit |
Each toggle is a boolean. The numeric thresholds that turn a score into a hit are configured per project in the dashboard, not passed per request — see the threshold configuration guide.
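Because the toggles are plain booleans, a typo in a toggle name is easy to miss. The sketch below builds a settings object from a list of toggle names and rejects anything not in the table above. The helper name buildSettings is hypothetical, and how the API treats unknown keys is not covered in this guide, which is why the check happens client-side.

```javascript
// Toggle names taken from the table above; the helper itself is hypothetical.
const KNOWN_TOGGLES = [
  'check_sentiment', 'check_spam', 'check_language', 'check_badwords',
  'check_images', 'check_links', 'check_antivirus'
];

// Builds a `settings` object that enables only the named checks.
// Toggles that are not listed are left unset rather than set to false.
function buildSettings(enabled) {
  const settings = {};
  for (const name of enabled) {
    if (!KNOWN_TOGGLES.includes(name)) {
      throw new Error(`Unknown check toggle: ${name}`);
    }
    settings[name] = true;
  }
  return settings;
}

const textOnlySettings = buildSettings(['check_sentiment', 'check_spam']);
console.log(textOnlySettings); // { check_sentiment: true, check_spam: true }
```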
Benefits of AI moderation
Scale
AI processes content volumes that human teams cannot. A single API call returns results in milliseconds, so moderation keeps pace with submissions instead of falling behind a review queue. Pair automated checks with a human queue for borderline cases (covered below).
Speed
Real-time checks let you screen content before it publishes:
// Pre-moderation: Check content before publishing
async function publishPost(content) {
  const moderation = await checkContent(content);
  if (moderation.has_violations) {
    return { published: false, reason: moderation.message };
  }
  // Content passes moderation
  return await saveAndPublish(content);
}
Consistency
AI applies the same rules uniformly across all content, with no fatigue and no variation between reviewers. Decisions are reproducible: the same input with the same project thresholds yields the same hit flags, which makes enforcement auditable.
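To make that auditability concrete, you could store a compact record of each decision and compare records later. The following is a sketch under assumptions: the record fields (contentId, decidedAt, hasViolations, hits) are illustrative and not a Discuse schema, and the result shape follows the response described earlier in this guide.

```javascript
// Hypothetical audit record: captures which checks hit for a piece of content.
function buildAuditRecord(contentId, result, decidedAt = new Date()) {
  const results = result?.results ?? {};
  const hits = Object.keys(results)
    .filter((name) => results[name]?.hit === true)
    .sort(); // fixed order so records for the same outcome compare equal
  return {
    contentId,
    decidedAt: decidedAt.toISOString(),
    hasViolations: Boolean(result?.has_violations),
    hits
  };
}

// Two records agree when they flagged exactly the same checks.
function sameDecision(a, b) {
  return a.hasViolations === b.hasViolations &&
    a.hits.length === b.hits.length &&
    a.hits.every((name, i) => name === b.hits[i]);
}
```

Re-checking a stored item and comparing the new record with the old one via sameDecision is one way to spot the effect of a threshold change.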
Moderation architecture
Pre-moderation flow
User Submits → AI Check → Decision
                              ↓
                  ┌───────────┼───────────┐
                  ↓           ↓           ↓
                Allow       Review      Block
                  ↓           ↓           ↓
               Publish   Human Queue    Reject
Post-moderation flow
User Submits → Publish → AI Check → Action
                                       ↓
                          ┌────────────┼────────────┐
                          ↓            ↓            ↓
                        Safe      Borderline    Violation
                          ↓            ↓            ↓
                        Keep      Flag/Review    Remove
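The three branches of the post-moderation flow can be sketched as a single function. How you define "borderline" is your call; this example assumes, purely for illustration, that a hit on a blocking check is a violation and a hit on any other check is borderline. The function name and the default list of blocking checks are hypothetical.

```javascript
// Hypothetical triage for already-published content.
// Returns 'keep', 'flag_review', or 'remove', matching the diagram above.
function triagePublished(result, blockingChecks = ['sentiment', 'spamfinder', 'images']) {
  const results = result?.results ?? {};
  const hits = Object.keys(results).filter((name) => results[name]?.hit === true);

  if (hits.length === 0) return 'keep'; // Safe
  return hits.some((name) => blockingChecks.includes(name))
    ? 'remove'        // Violation: at least one blocking check hit
    : 'flag_review';  // Borderline: only non-blocking checks hit
}
```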
Hybrid approach (recommended)
The Discuse response sets has_violations once any enabled check crosses its configured threshold, and exposes the underlying per-category scores so you can add your own confidence band on top:
async function moderateContent(content) {
  const result = await checkContent(content);
  // Build a confidence figure from the scores you care about,
  // e.g. the toxicity score and the spam classifier confidence.
  // The spam confidence is counted only when is_spam is set, on the
  // assumption that it describes the returned label, not "spamminess".
  const spam = result.results?.spamfinder;
  const confidence = Math.max(
    result.results?.sentiment?.toxicity ?? 0,
    spam?.is_spam ? (spam.confidence ?? 0) : 0,
    result.results?.images?.porn ?? 0
  );
  // High confidence: automate.
  if (confidence > 0.95) {
    return result.has_violations
      ? { action: 'auto_remove', reason: result.message }
      : { action: 'auto_approve' };
  }
  // Medium confidence: route to a human.
  if (confidence > 0.5) {
    await addToReviewQueue(content, result);
    return { action: 'pending_review' };
  }
  // Low confidence: approve, keep watching.
  return { action: 'approve_with_monitoring' };
}
Implementing AI moderation
Step 1: Define your policy
Decide what content is acceptable and how to act on each category before you call the API. In Discuse, the numeric cut-off for each category lives in your project settings (the dashboard), so your application policy maps an API result to an action rather than re-deciding the threshold:
const MODERATION_POLICY = {
  // What to do when a given Discuse check reports a hit.
  // Thresholds themselves are configured per project in the dashboard.
  actions: {
    sentiment: 'block', // toxic / threatening text
    spam: 'block',
    badwords: 'flag',
    images: 'block', // explicit imagery
    links: 'flag'
  }
};
Step 2: Integrate the API
Send content and the checks you want enabled for this request. Each check_* toggle is an optional boolean that overrides the project default for this call:
async function checkContent(content) {
  const response = await fetch('https://api.discuse.com/api/v2/check', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.DISCUSE_API_KEY
    },
    body: JSON.stringify({
      content: {
        text: content.text,
        image_urls: content.images
      },
      settings: {
        check_sentiment: true,
        check_spam: true,
        check_images: true
      }
    })
  });
  if (!response.ok) {
    throw new Error(`Discuse API returned ${response.status}`);
  }
  return response.json();
}
Step 3: Apply decisions
Map each check that reports a hit to the action you defined:
function applyModerationDecision(result) {
  const r = result.results || {};
  if (r.sentiment?.hit) return { action: MODERATION_POLICY.actions.sentiment, category: 'sentiment' };
  if (r.spamfinder?.hit) return { action: MODERATION_POLICY.actions.spam, category: 'spam' };
  if (r.images?.hit) return { action: MODERATION_POLICY.actions.images, category: 'images' };
  if (r.badwords?.hit) return { action: MODERATION_POLICY.actions.badwords, category: 'badwords' };
  if (r.links?.hit) return { action: MODERATION_POLICY.actions.links, category: 'links' };
  return { action: 'allow' };
}
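The function above returns the first hit in a fixed order, so the result depends on how the if statements are arranged. An alternative, sketched here under assumptions, is to look at every hit and keep the strictest action. The severity ranking, the helper name strictestDecision, and the fallback to 'flag' for unmapped categories are all illustrative choices, not Discuse behavior.

```javascript
// Hypothetical severity ranking; a higher number wins when several checks hit.
const SEVERITY = new Map([['allow', 0], ['flag', 1], ['block', 2]]);

// Maps result keys to policy keys (the spam check reports under `spamfinder`).
const RESULT_TO_POLICY_KEY = {
  sentiment: 'sentiment',
  spamfinder: 'spam',
  images: 'images',
  badwords: 'badwords',
  links: 'links'
};

function strictestDecision(result, actions) {
  const results = result?.results ?? {};
  let decision = { action: 'allow' };
  for (const [resultKey, policyKey] of Object.entries(RESULT_TO_POLICY_KEY)) {
    if (!results[resultKey]?.hit) continue;
    const configured = actions[policyKey];
    // Missing or unrecognized actions are sent to review instead of being ignored.
    const action = SEVERITY.has(configured) ? configured : 'flag';
    if (SEVERITY.get(action) > SEVERITY.get(decision.action)) {
      decision = { action, category: policyKey };
    }
  }
  return decision;
}
```

With the policy from Step 1, a message that hits both badwords ('flag') and images ('block') resolves to a block attributed to images, whichever check is listed first.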
Step 4: Handle edge cases
async function handleModerationResult(content, result) {
  switch (result.action) {
    case 'block':
      await notifyUser(content.author, 'content_blocked', result);
      await logModeration(content, result);
      return false;
    case 'flag':
      await addToReviewQueue(content, result);
      await publishWithWarning(content);
      return true;
    case 'allow':
      await publish(content);
      return true;
    default:
      // Unknown action - fail safe by blocking
      await logError('unknown_moderation_action', result);
      return false;
  }
}
Best practices
Start conservative, adjust over time
Begin with stricter project thresholds and loosen them as you measure false positives. In Discuse these thresholds are project settings, so tuning happens in the dashboard (or via the settings update API), not in each request. See the threshold configuration guide for the workflow.
Keep a human review queue
AI should augment, not replace, human judgment for edge cases:
async function processReviewQueue() {
  const items = await getReviewQueue();
  for (const item of items) {
    // Present to human reviewer with AI context
    const reviewUI = {
      content: item.content,
      ai_scores: item.moderation_result,
      similar_decisions: await getSimilarPreviousDecisions(item)
    };
    // Human makes final decision
    const decision = await presentToReviewer(reviewUI);
    // Log for model improvement
    await logHumanDecision(item, decision);
  }
}
Monitor and improve
Track key metrics to improve your moderation system:
const METRICS = {
  // Accuracy metrics
  false_positive_rate: 'Content incorrectly blocked',
  false_negative_rate: 'Harmful content missed',
  // Operational metrics
  average_response_time: 'API latency',
  review_queue_depth: 'Human review backlog',
  // User impact
  appeal_rate: 'Users appealing decisions',
  appeal_success_rate: 'Appeals overturned'
};
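Several of these metrics can be computed from your own decision log. The sketch below assumes a hypothetical record shape that your application writes: automated is 'block' or 'allow', human is the reviewer verdict ('ok', 'violation', or null if never reviewed), and appealed and overturned are booleans. The rates use the standard definitions, so they only reflect items a human actually reviewed.

```javascript
// Guard against empty denominators so an empty log yields 0 rather than NaN.
const ratio = (numerator, denominator) =>
  denominator === 0 ? 0 : numerator / denominator;

function computeMetrics(records) {
  const reviewed = records.filter((r) => r.human !== null);
  const acceptable = reviewed.filter((r) => r.human === 'ok');
  const harmful = reviewed.filter((r) => r.human === 'violation');
  const blocked = records.filter((r) => r.automated === 'block');
  const appealed = blocked.filter((r) => r.appealed);

  return {
    // Share of acceptable content that was blocked anyway.
    false_positive_rate: ratio(
      acceptable.filter((r) => r.automated === 'block').length, acceptable.length),
    // Share of harmful content that was allowed through.
    false_negative_rate: ratio(
      harmful.filter((r) => r.automated === 'allow').length, harmful.length),
    // Share of blocked items whose author appealed.
    appeal_rate: ratio(appealed.length, blocked.length),
    // Share of appeals that reversed the original decision.
    appeal_success_rate: ratio(
      appealed.filter((r) => r.overturned).length, appealed.length)
  };
}
```

Sampling some allowed items into the human queue is what makes the false negative rate measurable at all; without it, the harmful set contains only content that was already blocked.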
Handle appeals gracefully
When a user appeals, route the item to a human reviewer rather than re-deciding it automatically. Give the reviewer the original Discuse scores and the user's history as context:
async function handleAppeal(contentId, userId) {
  const original = await getContentWithModeration(contentId);
  await addToReviewQueue(contentId, {
    type: 'appeal',
    original_decision: original.moderation, // Discuse `results` saved at decision time
    author_history: await getAuthorHistory(userId)
  });
  return { status: 'pending', message: 'Under review' };
}
The API itself has no per-request "context" or "author history" parameter — context is something you apply on your side when choosing thresholds and routing for review.
Common pitfalls
Over-relying on AI
Automating every decision means automating every mistake. Keep a human in the loop for:
- Complex contextual decisions
- High-stakes content (legal, safety)
- Appeals and edge cases
Ignoring context
The same words can be harmful or acceptable depending on context:
"I'm going to kill it at this interview!" // Positive
"I'm going to kill you" // Threat
Discuse scores each message on its own; it has no request-level "context" parameter. Apply context on your side: pick stricter or looser project thresholds per surface (public post vs. direct message), and route borderline hits to human review.
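One way to apply context on your side without touching project thresholds is to route on the raw score with a band per surface. This is a sketch with placeholder numbers: the surface names, the cut-offs, and the helper routeBySurface are hypothetical, and it assumes sentiment.toxicity is reported on a 0 to 1 scale.

```javascript
// Hypothetical per-surface bands applied to the raw toxicity score.
// The numbers are placeholders to tune against your own data.
const REVIEW_BANDS = {
  public_post: { review: 0.4, block: 0.8 },    // stricter: visible to everyone
  direct_message: { review: 0.6, block: 0.9 }  // looser: private conversation
};

function routeBySurface(result, surface) {
  const band = REVIEW_BANDS[surface];
  if (!band) throw new Error(`No band configured for surface: ${surface}`);
  const toxicity = result?.results?.sentiment?.toxicity ?? 0;
  if (toxicity >= band.block) return 'block';
  if (toxicity >= band.review) return 'review';
  return 'allow';
}

const borderline = { results: { sentiment: { toxicity: 0.5 } } };
console.log(routeBySurface(borderline, 'public_post'));    // review
console.log(routeBySurface(borderline, 'direct_message')); // allow
```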
Set-and-forget
Content moderation requires ongoing tuning:
- Monitor false positive/negative rates
- Update thresholds based on data
- Review new content patterns
- Retrain or update models
Inconsistent enforcement
Apply policy by rule, not by who posted it. Drive thresholds from a documented trust level rather than ad-hoc exceptions:
// Avoid: per-person exceptions
if (user.isInfluencer) { /* lenient */ }
// Prefer: the action comes from the documented policy, applied
// the same way to every user at the same trust level.
const action = MODERATION_POLICY.actions[category];
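A trust-level policy can be written down as data so that the rule, not the individual, decides the outcome. The sketch below is illustrative only: the level names, the per-level actions, and the helper actionFor are assumptions, and it keys actions rather than numeric thresholds because thresholds live in the project settings.

```javascript
// Hypothetical trust levels; every account in a level gets the same treatment.
const POLICY_BY_TRUST_LEVEL = {
  new_account: { sentiment: 'block', spam: 'block', links: 'block' },
  established: { sentiment: 'block', spam: 'block', links: 'flag' }
};

function actionFor(trustLevel, category) {
  const policy = POLICY_BY_TRUST_LEVEL[trustLevel];
  if (!policy) throw new Error(`Unknown trust level: ${trustLevel}`);
  // Categories missing from a level go to human review rather than passing silently.
  return policy[category] ?? 'flag';
}

console.log(actionFor('new_account', 'links')); // block
console.log(actionFor('established', 'links')); // flag
```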
Next steps
- Configuring Thresholds - Fine-tune your moderation
- Scaling Content Moderation - Handle high volumes
- Text Analysis - Deep dive into text moderation
- Image NSFW Detection - Visual content protection