配置检测阈值
检测阈值用于设定 Discuse 标记内容所需的置信度,在误报和漏报之间取得平衡。在 Discuse 中,这些阈值是项目设置,可在仪表盘(或设置 API)中配置;API 请求只用于切换要运行哪些检测项,并不携带数值阈值。本指南将说明这种取舍如何运作,以及如何为你的平台选择合适的值。
Discuse 阈值的工作原理
每项检查都会返回一个 0.0 到 1.0 之间的置信度分数,以及一个 hit 标记。当分数达到或超过你在项目中为该类别配置的阈值时,hit 就会被设置。可配置的阈值包括:
| 阈值(项目设置) | 适用范围 |
|---|---|
threshold_sentiment |
整体负面情绪截断值 |
threshold_toxicity |
有毒文本 |
threshold_profanity |
粗俗用语 |
threshold_threat |
威胁内容 |
threshold_insult |
侮辱内容 |
threshold_spam |
垃圾内容分类器置信度 |
threshold_images |
露骨图像(整体) |
threshold_images_porn |
色情图像 |
threshold_images_sexual |
性暗示图像 |
这些是项目设置 API 暴露的名称;每次请求中的 settings 对象只包含用于开启/关闭的 check_* 开关,以及 expected_language。阈值需要在控制台中修改,而不是按单次请求修改。
取舍
Lower Threshold = More Strict
├── More content flagged
├── Higher false positive rate
├── Fewer harmful posts slip through
└── More user friction
Higher Threshold = More Permissive
├── Less content flagged
├── Lower false positive rate
├── More harmful posts may slip through
└── Better user experience
将这种取舍可视化
False Positives ─────────────────────────►
Few Many
┌─────────────────────────────────────────┐
False Few │ ◄─── Ideal Zone │
Negatives │ (High threshold, │
│ low false rates) │
│ │ │
│ │ Your platform's │
│ │ optimal point → ● │
▼ │ │
Many │ Too permissive ──►│
└─────────────────────────────────────────┘
Threshold: 0.3 0.5 0.7 0.9
哪些平台适合使用哪些阈值?
下面的数值是以 Discuse 的阈值名称(threshold_toxicity、threshold_images_porn 等)表示的起始参考值。请把它们作为基线,并结合你自己的误判数据进行调优,而不是当作绝对正确的数值。阈值越低,标记越严格。
社交媒体平台
通用型社交平台需要较为均衡的审核策略:
const SOCIAL_MEDIA_THRESHOLDS = {
threshold_toxicity: 0.7,
threshold_profanity: 0.6,
threshold_threat: 0.5, // Lower (stricter) for threats
threshold_insult: 0.7,
threshold_spam: 0.75,
threshold_images_porn: 0.6,
threshold_images_sexual: 0.8 // More permissive for suggestive content
};
专业 / 商务平台
商务场景通常需要更严格的审核:
const PROFESSIONAL_THRESHOLDS = {
threshold_toxicity: 0.5,
threshold_profanity: 0.4,
threshold_threat: 0.3,
threshold_insult: 0.5,
threshold_spam: 0.6,
threshold_images_porn: 0.3,
threshold_images_sexual: 0.5
};
游戏社区
游戏平台可以对玩笑调侃更宽容,但对真实威胁仍应保持严格:
const GAMING_THRESHOLDS = {
threshold_toxicity: 0.8,
threshold_profanity: 0.85, // Banter allowed
threshold_threat: 0.5, // Still strict on real threats
threshold_insult: 0.8,
threshold_spam: 0.8,
threshold_images_porn: 0.6,
threshold_images_sexual: 0.9
};
儿童平台
面向未成年人的平台需要采用最严格的设置:
const CHILDRENS_THRESHOLDS = {
threshold_toxicity: 0.3,
threshold_profanity: 0.2,
threshold_threat: 0.2,
threshold_insult: 0.3,
threshold_spam: 0.5,
threshold_images_porn: 0.1, // Maximum strictness
threshold_images_sexual: 0.2
};
我可以动态调整阈值吗?
Discuse 会为每个项目存储一组阈值,因此按用户或按场景变化的阈值需要在你的应用中处理。下面这些模式会在你这一侧计算出有效阈值;之后,你可以将内容路由到不同项目(每个项目都有各自配置的阈值),也可以自行将 Discuse 返回的分数与阈值进行比较。
用户信任等级
根据用户声誉调整有效阈值:
function getThresholds(user) {
const baseThresholds = PLATFORM_THRESHOLDS;
const trustMultipliers = {
new_user: 0.8, // Stricter (lower effective threshold)
basic_user: 1.0, // Standard
verified_user: 1.15, // Slightly more permissive
trusted_user: 1.3, // More permissive
moderator: 1.5 // Most permissive
};
const multiplier = trustMultipliers[user.trustLevel] || 1.0;
return Object.fromEntries(
Object.entries(baseThresholds).map(([key, value]) => [
key,
typeof value === 'number'
? Math.min(value * multiplier, 0.95)
: adjustNestedThresholds(value, multiplier)
])
);
}
基于场景的阈值
不同内容类型可能需要不同的阈值:
const CONTEXT_THRESHOLDS = {
// Public posts visible to everyone
public_post: {
toxic: 0.6,
profanity: 0.5
},
// Direct messages between users
direct_message: {
toxic: 0.7, // Slightly more permissive
profanity: 0.6
},
// Comments on public content
comment: {
toxic: 0.55, // Stricter than posts
profanity: 0.5
},
// Profile information
profile: {
toxic: 0.5, // Strict for public-facing content
profanity: 0.4
}
};
function getContextThresholds(contentType) {
return CONTEXT_THRESHOLDS[contentType] || CONTEXT_THRESHOLDS.public_post;
}
基于时间的调整
在高风险时段调整阈值:
function getTimeAdjustedThresholds(baseThresholds) {
const hour = new Date().getHours();
const dayOfWeek = new Date().getDay();
// Stricter during off-hours when fewer moderators available
const isOffHours = hour < 6 || hour > 22;
const isWeekend = dayOfWeek === 0 || dayOfWeek === 6;
let multiplier = 1.0;
if (isOffHours) multiplier *= 0.85;
if (isWeekend) multiplier *= 0.9;
return adjustThresholds(baseThresholds, multiplier);
}
实现阈值配置
集中式配置
// config/moderation.js — mirrors your Discuse project thresholds so app-side
// routing stays in sync with the values configured in the dashboard.
export const ModerationConfig = {
thresholds: {
threshold_toxicity: parseFloat(process.env.THRESHOLD_TOXICITY || '0.7'),
threshold_profanity: parseFloat(process.env.THRESHOLD_PROFANITY || '0.6'),
threshold_threat: parseFloat(process.env.THRESHOLD_THREAT || '0.5'),
threshold_insult: parseFloat(process.env.THRESHOLD_INSULT || '0.7'),
threshold_spam: parseFloat(process.env.THRESHOLD_SPAM || '0.75'),
threshold_images_porn: parseFloat(process.env.THRESHOLD_IMAGES_PORN || '0.6'),
threshold_images_sexual: parseFloat(process.env.THRESHOLD_IMAGES_SEXUAL || '0.8')
},
actions: {
high_confidence: 'auto_block', // score > 0.95
medium_confidence: 'human_review', // 0.7 - 0.95
low_confidence: 'allow_with_flag' // threshold - 0.7
}
};
运行时阈值更新
无需重新部署即可调整阈值:
class ModerationService {
constructor() {
this.thresholds = defaultThresholds;
this.loadRemoteConfig();
}
async loadRemoteConfig() {
try {
const config = await fetch('/api/admin/moderation-config');
const data = await config.json();
this.thresholds = data.thresholds;
console.log('Loaded remote moderation config');
} catch (error) {
console.warn('Using default thresholds:', error);
}
}
async checkContent(content, context) {
const result = await callModerationAPI(content);
const thresholds = this.getThresholdsForContext(context);
return this.applyThresholds(result, thresholds);
}
}
衡量阈值的有效性
关键指标
const MODERATION_METRICS = {
// Accuracy
precision: 'True positives / (True positives + False positives)',
recall: 'True positives / (True positives + False negatives)',
f1_score: 'Harmonic mean of precision and recall',
// User impact
block_rate: 'Content blocked / Total content',
appeal_rate: 'Appeals filed / Content blocked',
appeal_success: 'Appeals won / Appeals filed',
// Operational
review_queue_size: 'Items waiting for human review',
review_time: 'Average time to human decision'
};
对阈值进行 A/B 测试
在部分流量上测试阈值变更:
async function moderateWithExperiment(content, userId) {
const experiment = getExperiment(userId, 'threshold_test');
const thresholds = experiment === 'control'
? CURRENT_THRESHOLDS
: EXPERIMENTAL_THRESHOLDS;
const result = await checkContent(content);
const decision = applyThresholds(result, thresholds);
// Log for analysis
await logExperiment({
experiment: 'threshold_test',
variant: experiment,
content_id: content.id,
scores: result,
decision: decision,
timestamp: Date.now()
});
return decision;
}
分析结果
-- Calculate precision and recall for each threshold variant
SELECT
variant,
COUNT(*) as total_decisions,
SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) as true_positives,
SUM(CASE WHEN blocked AND NOT actually_harmful THEN 1 ELSE 0 END) as false_positives,
SUM(CASE WHEN NOT blocked AND actually_harmful THEN 1 ELSE 0 END) as false_negatives,
SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
NULLIF(SUM(CASE WHEN blocked THEN 1 ELSE 0 END), 0) as precision,
SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
NULLIF(SUM(CASE WHEN actually_harmful THEN 1 ELSE 0 END), 0) as recall
FROM moderation_decisions
WHERE experiment = 'threshold_test'
GROUP BY variant;
阈值调优工作流
步骤 1:建立基准
// Start with conservative thresholds
const INITIAL_THRESHOLDS = {
threshold_toxicity: 0.5,
threshold_profanity: 0.5,
threshold_spam: 0.6
};
步骤 2:收集数据
async function logModerationDecision(content, result, decision) {
await db.insert('moderation_log', {
content_id: content.id,
content_hash: hashContent(content.text),
scores: result.results,
thresholds_used: currentThresholds,
decision: decision,
user_trust_level: content.author.trustLevel,
created_at: Date.now()
});
}
步骤 3:分析误判率
查看被拦截的内容和用户申诉,以识别:
- 误报:安全内容被错误拦截
- 漏报:有害内容未被识别
步骤 4:调整并迭代
// Based on analysis, raise thresholds that fire too often
const ADJUSTED_THRESHOLDS = {
threshold_toxicity: 0.65, // Raised after false positives
threshold_profanity: 0.55,
threshold_spam: 0.7
};
步骤 5:持续监控
为阈值有效性设置警报:
async function checkModerationHealth() {
const stats = await getModerationStats(last24Hours);
// Alert if false positive rate too high
if (stats.appealSuccessRate > 0.3) {
alert('High appeal success rate - thresholds may be too strict');
}
// Alert if harmful content is getting through
if (stats.reportedAfterApproval > threshold) {
alert('Increase in reported content - thresholds may be too permissive');
}
}
最佳实践总结
- 从保守开始:先使用更严格的阈值,再根据数据逐步放宽。
- 结合上下文:不同场景(公开帖子、私信、个人资料)应采用不同阈值。
- 信任等级很重要:根据用户在你应用中的信誉调整实际阈值。
- 衡量一切:跟踪准确率、召回率以及对用户的影响。
- 持续迭代:内容审核永远不会“完成”。
- 记录决策:保留阈值变更原因的记录。
- 准备兜底方案:将边界案例转交给人工审核。
请记住:在 Discuse 中,这些阈值属于项目设置。可在控制面板中更改,或通过设置 API 更改;每次请求中的 settings 对象只用于切换要运行哪些 check_*。