What is Content Moderation?
Content moderation is how a platform decides, at scale, which user-submitted messages, images, and files reach other users — and which get blocked, flagged, or held for a human to review. The tension is always the same: too lax and spam, harassment, and illegal content drive real users away; too strict and you bury legitimate posts under false positives. Every platform with user-generated content lives somewhere on that spectrum, whether it moderates deliberately or by accident.
Discuse handles the detection half of that decision. One API call returns a per-category breakdown — spam, toxicity, NSFW, malware, and language — each with a confidence score, so your own code owns the thresholds and the actions. You keep control of policy; the API does the classification.
What you're actually deciding
A moderation system answers three questions for every piece of content:
- Is this harmful, and how? Not a yes/no — a score per category, because one message can be mild spam and clearly toxic at the same time.
- How confident is the model? A toxicity score of 0.98 and one of 0.55 call for very different decisions. Discuse returns the confidence so you can auto-action the clear cases and route the ambiguous middle to a human.
- What do you do about it? Block, shadow-flag, queue for review, or allow. That is policy, and it stays in your hands.
The categories Discuse classifies:
| Category | What it catches | Example |
|---|---|---|
| Spam | Unsolicited promotion, scams, link farms | "🎁 You won! Claim at bit.ly/…" |
| Toxicity | Harassment, hate speech, threats | Targeted slurs, doxxing, threats of violence |
| NSFW | Adult or graphic imagery | Nudity, pornography, gore |
| Malware | Malicious files and links | Infected attachments, phishing URLs |
| Language | The language a message is written in | Routing, locale rules, expected-language checks |
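As a rough illustration of the three questions above, the sketch below maps per-category scores to a single action. The category keys, thresholds, and action names are all hypothetical placeholders chosen for this example; they are not Discuse's response format or recommended values.

```python
from typing import Dict, List, Tuple

# Hypothetical policy: category -> (threshold, action).
# The numbers are illustrative only, not recommended settings.
POLICY: Dict[str, Tuple[float, str]] = {
    "spam": (0.80, "shadow_flag"),
    "toxicity": (0.90, "block"),
    "nsfw": (0.85, "queue_for_review"),
    "malware": (0.50, "block"),
}

# Actions ordered from most to least severe, so the strictest one wins.
SEVERITY: List[str] = ["block", "queue_for_review", "shadow_flag", "allow"]


def decide(scores: Dict[str, float]) -> str:
    """Return the most severe action triggered by any category score."""
    triggered = [
        action
        for category, (threshold, action) in POLICY.items()
        if scores.get(category, 0.0) >= threshold
    ]
    if not triggered:
        return "allow"
    return min(triggered, key=SEVERITY.index)
```

A message scored as both spam and toxic resolves to the stricter of the two actions, which is one simple way to handle content that trips several categories at once.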
Pre-moderation vs. post-moderation
The first real choice is when you check.
Pre-moderation — check before the content is visible. Nothing harmful is ever exposed, but every post waits on an API round-trip, so it fits surfaces where a short delay is acceptable (a text check is fast; an image or file scan takes longer because the URL has to be fetched and analyzed). Use it for the high-risk cases: first posts from brand-new accounts, DMs to strangers, anything legally sensitive.
Post-moderation — publish immediately, check in the background, remove after the fact. Instant for the user, but harmful content is briefly live. Use it where speed matters and a few seconds' exposure is low-risk (established users, low-stakes channels).
Most platforms run both and pick per surface and per user: a trusted member's message posts instantly, while a new account's first link is held until it clears.
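One way to express that per-surface, per-user choice is a small routing function. This is a minimal sketch: the `PostContext` fields and the seven-day cutoff for a "new" account are assumptions made for illustration, and a real platform would substitute its own trust signals.

```python
from dataclasses import dataclass

NEW_ACCOUNT_DAYS = 7  # hypothetical cutoff for a "brand-new" account


@dataclass
class PostContext:
    account_age_days: int
    approved_posts: int            # posts that previously cleared moderation
    contains_link: bool = False
    dm_to_stranger: bool = False
    legally_sensitive: bool = False


def moderation_mode(ctx: PostContext) -> str:
    """Return "pre" to hold content until checked, "post" to publish first."""
    # High-risk surfaces are always checked before they become visible.
    if ctx.dm_to_stranger or ctx.legally_sensitive:
        return "pre"
    # New accounts are held on their first post or whenever they share a link.
    is_new = ctx.account_age_days < NEW_ACCOUNT_DAYS
    if is_new and (ctx.approved_posts == 0 or ctx.contains_link):
        return "pre"
    # Everyone else publishes instantly and is checked in the background.
    return "post"
```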
Automated, human, or both
Pure human review doesn't scale and burns people out on the worst content. Pure automation is fast and consistent but wrong on the genuinely ambiguous cases — the same words are a joke in one context and a threat in another.
The approach that holds up is confidence-banded: let the model auto-decide the clear cases and send only the uncertain middle to people.
- High confidence (e.g. above 0.95): auto-allow or auto-remove.
- Medium confidence (roughly 0.5–0.95): publish or hold, but queue for a human.
- Low confidence (below roughly 0.5): allow, and sample for monitoring.
That keeps human attention on the small slice of content where judgment actually adds value, instead of the majority the model already handles correctly. Configuring Thresholds covers how to pick those bands for your platform.
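The bands above could be sketched as a single function. This version assumes the score is the model's confidence that one category is violated, so the high band maps to removal; the auto-allow side of the high band is not modeled here. The band names are hypothetical, and the 0.95 and 0.5 defaults simply mirror the example figures in the list.

```python
def band(score: float, high: float = 0.95, low: float = 0.5) -> str:
    """Map a violation score in [0, 1] to a confidence band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score > high:
        return "auto_remove"       # clear case: act without a human
    if score >= low:
        return "human_review"      # uncertain middle: queue for a person
    return "allow_and_sample"      # low score: allow, sample for monitoring
```

With these defaults, a score of 0.98 is removed automatically while 0.55 lands in the review queue.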
Getting started with Discuse
Discuse exposes all of this through one endpoint. Send text, image URLs, or files; get back categories, scores, and a single has_violations flag:
curl -X POST https://api.discuse.com/api/v2/check \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "content": {
      "text": "Hello, this is a test message!"
    }
  }'
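The same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl command above; the function names and the timeout are assumptions for this example, and the response is returned as a plain dictionary because its exact field layout is not shown here.

```python
import json
import urllib.request

API_URL = "https://api.discuse.com/api/v2/check"


def build_check_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for a text check without sending it."""
    body = json.dumps({"content": {"text": text}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def check_text(text: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_check_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `check_text("Hello, this is a test message!", "YOUR_API_KEY")` performs the network call; keeping request construction in its own function makes the payload easy to inspect or unit-test without contacting the API.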
From there you apply your own thresholds and actions. The Quick Start Guide gets a working integration running in a few minutes, and the AI Content Moderation Guide covers the confidence-banded architecture in depth.