Configuring Detection Thresholds
Detection thresholds set the confidence level at which Discuse flags content, balancing false positives against false negatives. In Discuse these thresholds are project settings configured in the dashboard (or the settings API), and the API request only toggles which checks run — it does not carry numeric thresholds. This guide explains how the trade-off works and how to choose values for your platform.
How Discuse thresholds work
Each check returns a confidence score from 0.0 to 1.0 and a hit flag. The hit is set when the score meets or exceeds the threshold you configured for that category in your project. The configurable thresholds are:
| Threshold (project setting) | Applies to |
|---|---|
| threshold_sentiment | Overall negative-sentiment cut-off |
| threshold_toxicity | Toxic text |
| threshold_profanity | Profanity |
| threshold_threat | Threats |
| threshold_insult | Insults |
| threshold_spam | Spam classifier confidence |
| threshold_images | Explicit imagery (overall) |
| threshold_images_porn | Pornographic imagery |
| threshold_images_sexual | Sexually suggestive imagery |
These are the names exposed by the project settings API. The per-request settings object only contains on/off check_* toggles plus expected_language, so change thresholds in the dashboard or through the settings API, never per request.
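When your application applies its own effective thresholds, it needs to reproduce the same inclusive rule: a hit when the score is greater than or equal to the threshold. The sketch below does that per category. The shape of `scores` (an object keyed by category name such as `toxicity`) is an assumption for illustration, not the documented response format; map the actual Discuse response into it first.

```javascript
// Sketch: reproduce the hit rule (score >= threshold) on the application side.
// ASSUMPTION: `scores` is keyed by category name, e.g. { toxicity: 0.82 },
// and `thresholds` uses the project setting names, e.g. { threshold_toxicity: 0.7 }.
function computeHits(scores, thresholds) {
  const hits = {};
  for (const [name, threshold] of Object.entries(thresholds)) {
    const category = name.replace(/^threshold_/, '');
    const score = scores[category];
    if (typeof score !== 'number') continue; // check not run or score missing
    // Inclusive comparison: a score equal to the threshold counts as a hit
    hits[name] = score >= threshold;
  }
  return hits;
}

const hits = computeHits(
  { toxicity: 0.7, spam: 0.4 },
  { threshold_toxicity: 0.7, threshold_spam: 0.75, threshold_threat: 0.5 }
);
console.log(hits);
```

Categories without a score (here `threshold_threat`) are left out rather than treated as a non-hit, so a disabled check is distinguishable from a clean result.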
The trade-off
Lower Threshold = More Strict
├── More content flagged
├── Higher false positive rate
├── Fewer harmful posts slip through
└── More user friction
Higher Threshold = More Permissive
├── Less content flagged
├── Lower false positive rate
├── More harmful posts may slip through
└── Better user experience
Visualizing the trade-off
Threshold:          0.3 ─────── 0.5 ─────── 0.7 ─────── 0.9
False positives:    Many ─────────────────────────────► Few
False negatives:    Few  ─────────────────────────────► Many
Raising the threshold trades false positives for false negatives. No single value minimizes both, so each platform picks its own operating point along this line.
What thresholds suit which platform?
The values below are starting points expressed in Discuse's threshold names (threshold_toxicity, threshold_images_porn, and so on). Treat them as a baseline to tune against your own false-positive data, not as guaranteed-correct numbers. A lower threshold flags more aggressively.
Social media platforms
General-purpose social platforms need balanced moderation:
const SOCIAL_MEDIA_THRESHOLDS = {
threshold_toxicity: 0.7,
threshold_profanity: 0.6,
threshold_threat: 0.5, // Lower (stricter) for threats
threshold_insult: 0.7,
threshold_spam: 0.75,
threshold_images_porn: 0.6,
threshold_images_sexual: 0.8 // More permissive for suggestive content
};
Professional / business platforms
Business contexts usually want stricter moderation:
const PROFESSIONAL_THRESHOLDS = {
threshold_toxicity: 0.5,
threshold_profanity: 0.4,
threshold_threat: 0.3,
threshold_insult: 0.5,
threshold_spam: 0.6,
threshold_images_porn: 0.3,
threshold_images_sexual: 0.5
};
Gaming communities
Gaming platforms may tolerate more banter while staying strict on real threats:
const GAMING_THRESHOLDS = {
threshold_toxicity: 0.8,
threshold_profanity: 0.85, // Banter allowed
threshold_threat: 0.5, // Still strict on real threats
threshold_insult: 0.8,
threshold_spam: 0.8,
threshold_images_porn: 0.6,
threshold_images_sexual: 0.9
};
Children's platforms
Platforms for minors call for the strictest settings:
const CHILDRENS_THRESHOLDS = {
threshold_toxicity: 0.3,
threshold_profanity: 0.2,
threshold_threat: 0.2,
threshold_insult: 0.3,
threshold_spam: 0.5,
threshold_images_porn: 0.1, // Maximum strictness
threshold_images_sexual: 0.2
};
Can I vary thresholds dynamically?
Discuse stores one set of thresholds per project, so per-user or per-context variation lives in your application. The patterns below compute an effective threshold on your side; you then either route to different projects (each with its own configured thresholds) or apply the comparison yourself against the scores Discuse returns.
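If you choose the routing approach, the routing itself can be a lookup from content context to project. This is a hypothetical sketch: the project names, the context-to-project mapping, and the `DISCUSE_KEY_*` environment variables are illustrative, and it assumes each Discuse project is addressed with its own API key. Check how your projects are actually credentialed before adopting it.

```javascript
// Hypothetical routing table: each context maps to a Discuse project whose
// thresholds were configured in the dashboard. Names are illustrative only.
const PROJECT_BY_CONTEXT = {
  public_post: 'standard',
  comment: 'standard',
  direct_message: 'permissive',
  profile: 'strict'
};

// ASSUMPTION: one API key per project, supplied through environment variables
// such as DISCUSE_KEY_STANDARD. Unknown contexts fall back to 'standard'.
function projectKeyFor(contentType, env = process.env) {
  const project = PROJECT_BY_CONTEXT[contentType] ?? 'standard';
  const key = env[`DISCUSE_KEY_${project.toUpperCase()}`];
  if (!key) throw new Error(`No API key configured for project "${project}"`);
  return key;
}
```

Failing loudly on a missing key avoids silently sending content to the wrong project, which would apply the wrong thresholds without any visible error.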
User trust levels
Adjust effective thresholds based on user reputation:
// Effective thresholds per user trust level, computed in your application.
// A multiplier below 1.0 lowers thresholds (stricter); above 1.0 raises them.
const TRUST_MULTIPLIERS = {
  new_user: 0.8,       // Stricter (lower effective threshold)
  basic_user: 1.0,     // Standard
  verified_user: 1.15, // Slightly more permissive
  trusted_user: 1.3,   // More permissive
  moderator: 1.5       // Most permissive
};
// baseThresholds: a flat preset such as SOCIAL_MEDIA_THRESHOLDS
function getThresholds(user, baseThresholds) {
  const multiplier = TRUST_MULTIPLIERS[user.trustLevel] ?? 1.0;
  return Object.fromEntries(
    Object.entries(baseThresholds).map(([key, value]) => [
      key,
      // Cap at 0.95 so no category becomes effectively disabled
      Math.min(value * multiplier, 0.95)
    ])
  );
}
Context-based thresholds
Different content types may need different thresholds:
const CONTEXT_THRESHOLDS = {
  // Public posts visible to everyone
  public_post: {
    threshold_toxicity: 0.6,
    threshold_profanity: 0.5
  },
  // Direct messages between users
  direct_message: {
    threshold_toxicity: 0.7,  // Slightly more permissive
    threshold_profanity: 0.6
  },
  // Comments on public content
  comment: {
    threshold_toxicity: 0.55, // Stricter than posts
    threshold_profanity: 0.5
  },
  // Profile information
  profile: {
    threshold_toxicity: 0.5,  // Strict for public-facing content
    threshold_profanity: 0.4
  }
};
function getContextThresholds(contentType) {
return CONTEXT_THRESHOLDS[contentType] || CONTEXT_THRESHOLDS.public_post;
}
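The context table and the trust multipliers can be combined into a single effective threshold per category, which you then compare against the score Discuse returns. A minimal sketch under those assumptions: it repeats a subset of both tables (keyed by Discuse's threshold names) so it runs on its own, and the 0.95 cap mirrors the trust-level example.

```javascript
// Subset of the context and trust tables, keyed by Discuse threshold names.
const CONTEXT = {
  public_post: { threshold_toxicity: 0.6, threshold_profanity: 0.5 },
  direct_message: { threshold_toxicity: 0.7, threshold_profanity: 0.6 },
  profile: { threshold_toxicity: 0.5, threshold_profanity: 0.4 }
};
const TRUST = { new_user: 0.8, basic_user: 1.0, trusted_user: 1.3, moderator: 1.5 };

// Effective threshold = context threshold x trust multiplier, capped at 0.95.
function effectiveThreshold(contentType, trustLevel, category) {
  const context = CONTEXT[contentType] ?? CONTEXT.public_post;
  const base = context[category];
  if (typeof base !== 'number') throw new Error(`No threshold for ${category}`);
  const multiplier = TRUST[trustLevel] ?? 1.0;
  return Math.min(base * multiplier, 0.95);
}

// Flag when the returned score meets the effective threshold (inclusive).
function shouldFlag(score, contentType, trustLevel, category) {
  return score >= effectiveThreshold(contentType, trustLevel, category);
}
```

Multiplying rather than adding keeps the adjustment proportional, so strict contexts stay relatively strict even for trusted users.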
Time-based adjustments
Adjust thresholds during high-risk periods:
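// Lower (stricter) effective thresholds when fewer moderators are on shift.
// Uses the server's local time; adjust if your moderators work in another time zone.
function getTimeAdjustedThresholds(baseThresholds, now = new Date()) {
  const hour = now.getHours();
  const dayOfWeek = now.getDay(); // 0 = Sunday, 6 = Saturday
  const isOffHours = hour < 6 || hour > 22;
  const isWeekend = dayOfWeek === 0 || dayOfWeek === 6;
  let multiplier = 1.0;
  if (isOffHours) multiplier *= 0.85;
  if (isWeekend) multiplier *= 0.9;
  // Scale every threshold by the combined multiplier
  return Object.fromEntries(
    Object.entries(baseThresholds).map(([key, value]) => [key, value * multiplier])
  );
}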
Implementing threshold configuration
Centralized configuration
// config/moderation.js — mirrors your Discuse project thresholds so app-side
// routing stays in sync with the values configured in the dashboard.
export const ModerationConfig = {
thresholds: {
threshold_toxicity: parseFloat(process.env.THRESHOLD_TOXICITY || '0.7'),
threshold_profanity: parseFloat(process.env.THRESHOLD_PROFANITY || '0.6'),
threshold_threat: parseFloat(process.env.THRESHOLD_THREAT || '0.5'),
threshold_insult: parseFloat(process.env.THRESHOLD_INSULT || '0.7'),
threshold_spam: parseFloat(process.env.THRESHOLD_SPAM || '0.75'),
threshold_images_porn: parseFloat(process.env.THRESHOLD_IMAGES_PORN || '0.6'),
threshold_images_sexual: parseFloat(process.env.THRESHOLD_IMAGES_SEXUAL || '0.8')
},
  actions: {
    high_confidence: 'auto_block',     // score > 0.95
    medium_confidence: 'human_review', // 0.7 <= score <= 0.95
    low_confidence: 'allow_with_flag'  // threshold <= score < 0.7 (empty when the threshold is 0.7 or higher)
  }
};
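The `actions` tiers can be applied with a small mapping function. This sketch assumes the bands in the comments above (strictly above 0.95 auto-blocks, 0.7 to 0.95 goes to human review, anything from the category threshold up to 0.7 is allowed with a flag) and that scores below the threshold are allowed outright; the `'allow'` value and the category-keyed `scores` object are assumptions.

```javascript
// Maps one category score to an action using the confidence tiers.
// ASSUMPTION: scores below the category threshold are simply allowed ('allow').
function actionFor(score, threshold) {
  if (score < threshold) return 'allow';   // not a hit
  if (score > 0.95) return 'auto_block';   // high confidence
  if (score >= 0.7) return 'human_review'; // medium confidence
  return 'allow_with_flag';                // between threshold and 0.7
}

// Most severe action wins when several categories are scored.
const SEVERITY = ['allow', 'allow_with_flag', 'human_review', 'auto_block'];

// ASSUMPTION: `scores` is keyed by category name, e.g. { toxicity: 0.8 }.
function decide(scores, thresholds) {
  let worst = 'allow';
  for (const [name, threshold] of Object.entries(thresholds)) {
    const score = scores[name.replace(/^threshold_/, '')];
    if (typeof score !== 'number') continue;
    const action = actionFor(score, threshold);
    if (SEVERITY.indexOf(action) > SEVERITY.indexOf(worst)) worst = action;
  }
  return worst;
}
```

Checking `score < threshold` first means a category whose threshold sits above 0.7 (such as spam at 0.75) never produces `allow_with_flag`, matching the note in the config comments.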
Runtime threshold updates
Allow threshold adjustments without redeployment:
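class ModerationService {
  // defaultThresholds: e.g. ModerationConfig.thresholds
  // callModerationAPI: your own wrapper that sends content to Discuse and returns scores
  constructor(defaultThresholds, callModerationAPI) {
    this.thresholds = { ...defaultThresholds };
    this.callModerationAPI = callModerationAPI;
    // Start loading remote overrides; checks use the defaults until this resolves
    this.ready = this.loadRemoteConfig();
  }
  async loadRemoteConfig() {
    try {
      const response = await fetch('/api/admin/moderation-config');
      // fetch() only rejects on network failure, so treat HTTP errors explicitly
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      const data = await response.json();
      this.thresholds = { ...this.thresholds, ...data.thresholds };
      console.log('Loaded remote moderation config');
    } catch (error) {
      console.warn('Using default thresholds:', error);
    }
  }
  async checkContent(content, context) {
    const result = await this.callModerationAPI(content);
    // getThresholdsForContext() and applyThresholds() are app-specific helpers
    const thresholds = this.getThresholdsForContext(context);
    return this.applyThresholds(result, thresholds);
  }
}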
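Because the remote payload changes thresholds at runtime, it is worth rejecting malformed values before they take effect. A minimal sketch: it keeps the current value for any entry that is not a known threshold name or not a finite number between 0 and 1. The known-name list comes from the project settings table at the top of this guide; the function name is hypothetical.

```javascript
// Threshold names from the project settings table.
const KNOWN_THRESHOLDS = new Set([
  'threshold_sentiment', 'threshold_toxicity', 'threshold_profanity',
  'threshold_threat', 'threshold_insult', 'threshold_spam',
  'threshold_images', 'threshold_images_porn', 'threshold_images_sexual'
]);

// Returns a new thresholds object: valid incoming values override current ones;
// unknown names and out-of-range or non-numeric values are skipped and reported.
function mergeValidThresholds(current, incoming) {
  const merged = { ...current };
  const rejected = [];
  for (const [name, value] of Object.entries(incoming ?? {})) {
    const valid = KNOWN_THRESHOLDS.has(name) &&
      typeof value === 'number' && Number.isFinite(value) &&
      value >= 0 && value <= 1;
    if (valid) merged[name] = value;
    else rejected.push(name);
  }
  return { merged, rejected };
}
```

Inside `loadRemoteConfig`, you would assign `merged` to `this.thresholds` and log `rejected`, so a typo in the admin config degrades to the previous value instead of a string or `NaN` threshold.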
Measuring threshold effectiveness
Key metrics
const MODERATION_METRICS = {
// Accuracy
precision: 'True positives / (True positives + False positives)',
recall: 'True positives / (True positives + False negatives)',
f1_score: 'Harmonic mean of precision and recall',
// User impact
block_rate: 'Content blocked / Total content',
appeal_rate: 'Appeals filed / Content blocked',
appeal_success: 'Appeals won / Appeals filed',
// Operational
review_queue_size: 'Items waiting for human review',
review_time: 'Average time to human decision'
};
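The accuracy formulas above translate directly into code. This sketch computes precision, recall and F1 from confusion counts; returning `null` when a denominator is zero (rather than `NaN` or `Infinity`) is a design choice made here.

```javascript
// Accuracy metrics from confusion counts:
// tp = true positives, fp = false positives, fn = false negatives.
function accuracyMetrics({ tp, fp, fn }) {
  const precision = tp + fp === 0 ? null : tp / (tp + fp);
  const recall = tp + fn === 0 ? null : tp / (tp + fn);
  // F1 is the harmonic mean of precision and recall
  const f1 = precision === null || recall === null || precision + recall === 0
    ? null
    : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

console.log(accuracyMetrics({ tp: 80, fp: 20, fn: 20 }));
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which makes it a useful single number when comparing threshold variants.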
A/B testing thresholds
Test threshold changes on a subset of traffic:
async function moderateWithExperiment(content, userId) {
const experiment = getExperiment(userId, 'threshold_test');
const thresholds = experiment === 'control'
? CURRENT_THRESHOLDS
: EXPERIMENTAL_THRESHOLDS;
const result = await checkContent(content);
const decision = applyThresholds(result, thresholds);
// Log for analysis
await logExperiment({
experiment: 'threshold_test',
variant: experiment,
content_id: content.id,
scores: result,
decision: decision,
timestamp: Date.now()
});
return decision;
}
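`getExperiment` is not defined in the snippet above. One common approach, sketched here as an assumption rather than a prescribed API, is to hash the user ID so each user lands in the same variant on every request. This version uses 32-bit FNV-1a over the string's UTF-16 code units (adequate for bucketing) and includes the experiment name in the key so different experiments split users independently; the 10% default rollout is illustrative.

```javascript
// 32-bit FNV-1a hash over UTF-16 code units.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    // Multiply by the FNV prime 16777619 modulo 2^32
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Stable assignment: the same userId and experiment name always yield the same variant.
function getExperiment(userId, experimentName, rolloutPercent = 10) {
  const bucket = fnv1a32(`${experimentName}:${userId}`) % 100;
  return bucket < rolloutPercent ? 'experiment' : 'control';
}
```

Deterministic bucketing keeps a user's experience consistent across posts, which matters when the experiment's outcome is measured through appeals and reports that arrive later.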
Analyzing results
-- Calculate precision and recall for each threshold variant
SELECT
variant,
COUNT(*) as total_decisions,
SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) as true_positives,
SUM(CASE WHEN blocked AND NOT actually_harmful THEN 1 ELSE 0 END) as false_positives,
SUM(CASE WHEN NOT blocked AND actually_harmful THEN 1 ELSE 0 END) as false_negatives,
SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
NULLIF(SUM(CASE WHEN blocked THEN 1 ELSE 0 END), 0) as precision,
SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
NULLIF(SUM(CASE WHEN actually_harmful THEN 1 ELSE 0 END), 0) as recall
FROM moderation_decisions
WHERE experiment = 'threshold_test'
GROUP BY variant;
Threshold tuning workflow
Step 1: Establish a baseline
// Start with conservative (stricter) thresholds
const INITIAL_THRESHOLDS = {
threshold_toxicity: 0.5,
threshold_profanity: 0.5,
threshold_spam: 0.6
};
Step 2: Collect data
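import { createHash } from 'node:crypto';

// Record each decision with the scores and thresholds in force, so false
// positives and negatives can be traced later. `db` is your own database client.
async function logModerationDecision(content, result, decision, thresholdsUsed) {
  await db.insert('moderation_log', {
    content_id: content.id,
    // Hashing lets you match duplicate content without storing the raw text in the log
    content_hash: createHash('sha256').update(content.text).digest('hex'),
    scores: result.results,
    thresholds_used: thresholdsUsed,
    decision: decision,
    user_trust_level: content.author.trustLevel,
    created_at: Date.now()
  });
}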
Step 3: Analyze false rates
Review blocked content and user appeals to identify:
- False positives: safe content incorrectly blocked
- False negatives: harmful content that wasn't caught
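Once you have human-reviewed samples (a score plus a verdict on whether the content was actually harmful), you can replay them at several candidate thresholds to see how false positives and false negatives would shift before changing the dashboard. A minimal sketch; the `{ score, harmful }` sample shape is an assumption, and the four samples are illustrative toy data, not real results.

```javascript
// For each candidate threshold, count how labeled samples would have been classified.
// A sample is flagged when score >= threshold, matching the hit rule.
function sweepThresholds(samples, candidates) {
  return candidates.map((threshold) => {
    let tp = 0, fp = 0, fn = 0, tn = 0;
    for (const { score, harmful } of samples) {
      const flagged = score >= threshold;
      if (flagged && harmful) tp++;
      else if (flagged && !harmful) fp++;
      else if (!flagged && harmful) fn++;
      else tn++;
    }
    return { threshold, truePositives: tp, falsePositives: fp, falseNegatives: fn, trueNegatives: tn };
  });
}

// Illustrative toy samples only
const samples = [
  { score: 0.92, harmful: true },
  { score: 0.66, harmful: false },
  { score: 0.58, harmful: true },
  { score: 0.41, harmful: false }
];
console.table(sweepThresholds(samples, [0.5, 0.6, 0.7]));
```

Even on this toy set, moving from 0.5 to 0.7 removes the false positive but introduces a false negative, the same trade-off described at the top of this guide.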
Step 4: Adjust and iterate
// Based on analysis, raise thresholds that fire too often
const ADJUSTED_THRESHOLDS = {
threshold_toxicity: 0.65, // Raised after false positives
threshold_profanity: 0.55,
threshold_spam: 0.7
};
Step 5: Monitor continuously
Set up alerts for threshold effectiveness:
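// Run on a schedule. getModerationStats() and alert() stand in for your own
// metrics query and notification hook; pick the report limit from your baseline.
async function checkModerationHealth(reportedAfterApprovalLimit) {
  const stats = await getModerationStats({ windowHours: 24 });
  // Many successful appeals suggests too many false positives
  if (stats.appealSuccessRate > 0.3) {
    alert('High appeal success rate - thresholds may be too strict');
  }
  // Approved content later reported by users suggests false negatives
  if (stats.reportedAfterApproval > reportedAfterApprovalLimit) {
    alert('Increase in reported content - thresholds may be too permissive');
  }
}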
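`getModerationStats` is app-specific. As a hypothetical sketch, the figures the health check reads (plus the block and appeal rates from the metrics list) could be derived from a window of decision records like this. The record fields `blocked`, `appealed`, `appealWon` and `reportedLater` are assumptions about your own logging, not Discuse fields.

```javascript
// Derive health-check figures from decision records.
// Field names are hypothetical; adapt them to your moderation_log schema.
function summarizeDecisions(records) {
  const blocked = records.filter((r) => r.blocked);
  const appeals = blocked.filter((r) => r.appealed);
  const appealsWon = appeals.filter((r) => r.appealWon);
  return {
    blockRate: records.length ? blocked.length / records.length : 0,
    appealRate: blocked.length ? appeals.length / blocked.length : 0,
    // Share of appeals that overturned the block: a proxy for false positives
    appealSuccessRate: appeals.length ? appealsWon.length / appeals.length : 0,
    // Approved content that users reported afterwards: a proxy for false negatives
    reportedAfterApproval: records.filter((r) => !r.blocked && r.reportedLater).length
  };
}
```

Both signals are lagging proxies: appeals and user reports arrive after the decision, so a 24-hour window undercounts recent items.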
Best practices summary
- Start conservative: begin with stricter thresholds and loosen them based on data.
- Use context: different surfaces (public posts, DMs, profiles) warrant different thresholds.
- Trust levels matter: adjust the effective threshold for user reputation in your app.
- Measure everything: track precision, recall, and user impact.
- Iterate continuously: moderation is never "done".
- Document decisions: keep records of why thresholds changed.
- Have fallbacks: route borderline cases to human review.
Remember: in Discuse these thresholds are project settings. Change them in the dashboard or via the settings API; the per-request settings object only toggles which check_* checks run.
Next steps
- AI Content Moderation Guide - Understanding AI moderation
- Scaling Content Moderation - High-volume implementation
- Text Analysis - Text-specific moderation details