配置检测阈值

检测阈值用于设定 Discuse 标记内容所需的置信度，在误报和漏报之间取得平衡。在 Discuse 中，这些阈值是项目设置，可在仪表盘（或设置 API）中配置；API 请求只用于切换要运行哪些检测项，并不携带数值阈值。本指南将说明这种取舍如何运作，以及如何为你的平台选择合适的值。

Discuse 阈值的工作原理

每项检查都会返回一个 0.0 到 1.0 之间的置信度分数，以及一个 hit 标记。当分数达到或超过你在项目中为该类别配置的阈值时，hit 就会被设置。可配置的阈值包括：

阈值（项目设置）	适用范围
`threshold_sentiment`	整体负面情绪截断值
`threshold_toxicity`	有毒文本
`threshold_profanity`	粗俗用语
`threshold_threat`	威胁内容
`threshold_insult`	侮辱内容
`threshold_spam`	垃圾内容分类器置信度
`threshold_images`	露骨图像（整体）
`threshold_images_porn`	色情图像
`threshold_images_sexual`	性暗示图像

这些是项目设置 API 暴露的名称；每次请求中的 settings 对象只包含用于开启/关闭的 check_* 开关，以及 expected_language。阈值需要在控制台中修改，而不是按单次请求修改。

取舍

Lower Threshold = More Strict
├── More content flagged
├── Higher false positive rate
├── Fewer harmful posts slip through
└── More user friction

Higher Threshold = More Permissive
├── Less content flagged
├── Lower false positive rate
├── More harmful posts may slip through
└── Better user experience

将这种取舍可视化

                False Positives ─────────────────────────►
                     Few                              Many
               ┌─────────────────────────────────────────┐
False      Few │   ◄─── Ideal Zone                      │
Negatives      │        (High threshold,                │
               │         low false rates)               │
      │        │                                        │
      │        │              Your platform's           │
      │        │              optimal point →  ●        │
      ▼        │                                        │
          Many │                      Too permissive ──►│
               └─────────────────────────────────────────┘
                       Threshold: 0.3   0.5   0.7   0.9

哪些平台适合使用哪些阈值？

下面的数值是以 Discuse 的阈值名称（threshold_toxicity、threshold_images_porn 等）表示的起始参考值。请把它们作为基线，并结合你自己的误判数据进行调优，而不是当作绝对正确的数值。阈值越低，标记越严格。

社交媒体平台

通用型社交平台需要较为均衡的审核策略：

const SOCIAL_MEDIA_THRESHOLDS = {
  threshold_toxicity: 0.7,
  threshold_profanity: 0.6,
  threshold_threat: 0.5,        // Lower (stricter) for threats
  threshold_insult: 0.7,
  threshold_spam: 0.75,
  threshold_images_porn: 0.6,
  threshold_images_sexual: 0.8  // More permissive for suggestive content
};

专业 / 商务平台

商务场景通常需要更严格的审核：

const PROFESSIONAL_THRESHOLDS = {
  threshold_toxicity: 0.5,
  threshold_profanity: 0.4,
  threshold_threat: 0.3,
  threshold_insult: 0.5,
  threshold_spam: 0.6,
  threshold_images_porn: 0.3,
  threshold_images_sexual: 0.5
};

游戏社区

游戏平台可以对玩笑调侃更宽容，但对真实威胁仍应保持严格：

const GAMING_THRESHOLDS = {
  threshold_toxicity: 0.8,
  threshold_profanity: 0.85,    // Banter allowed
  threshold_threat: 0.5,        // Still strict on real threats
  threshold_insult: 0.8,
  threshold_spam: 0.8,
  threshold_images_porn: 0.6,
  threshold_images_sexual: 0.9
};

儿童平台

面向未成年人的平台需要采用最严格的设置：

const CHILDRENS_THRESHOLDS = {
  threshold_toxicity: 0.3,
  threshold_profanity: 0.2,
  threshold_threat: 0.2,
  threshold_insult: 0.3,
  threshold_spam: 0.5,
  threshold_images_porn: 0.1,   // Maximum strictness
  threshold_images_sexual: 0.2
};

我可以动态调整阈值吗？

Discuse 会为每个项目存储一组阈值，因此按用户或按场景变化的阈值需要在你的应用中处理。下面这些模式会在你这一侧计算出有效阈值；之后，你可以将内容路由到不同项目（每个项目都有各自配置的阈值），也可以自行将 Discuse 返回的分数与阈值进行比较。

用户信任等级

根据用户声誉调整有效阈值：

function getThresholds(user) {
  const baseThresholds = PLATFORM_THRESHOLDS;

  const trustMultipliers = {
    new_user: 0.8,       // Stricter (lower effective threshold)
    basic_user: 1.0,     // Standard
    verified_user: 1.15, // Slightly more permissive
    trusted_user: 1.3,   // More permissive
    moderator: 1.5       // Most permissive
  };

  const multiplier = trustMultipliers[user.trustLevel] || 1.0;

  return Object.fromEntries(
    Object.entries(baseThresholds).map(([key, value]) => [
      key,
      typeof value === 'number'
        ? Math.min(value * multiplier, 0.95)
        : adjustNestedThresholds(value, multiplier)
    ])
  );
}

基于场景的阈值

不同内容类型可能需要不同的阈值：

const CONTEXT_THRESHOLDS = {
  // Public posts visible to everyone
  public_post: {
    toxic: 0.6,
    profanity: 0.5
  },

  // Direct messages between users
  direct_message: {
    toxic: 0.7,      // Slightly more permissive
    profanity: 0.6
  },

  // Comments on public content
  comment: {
    toxic: 0.55,     // Stricter than posts
    profanity: 0.5
  },

  // Profile information
  profile: {
    toxic: 0.5,      // Strict for public-facing content
    profanity: 0.4
  }
};

function getContextThresholds(contentType) {
  return CONTEXT_THRESHOLDS[contentType] || CONTEXT_THRESHOLDS.public_post;
}

基于时间的调整

在高风险时段调整阈值：

function getTimeAdjustedThresholds(baseThresholds) {
  const hour = new Date().getHours();
  const dayOfWeek = new Date().getDay();

  // Stricter during off-hours when fewer moderators available
  const isOffHours = hour < 6 || hour > 22;
  const isWeekend = dayOfWeek === 0 || dayOfWeek === 6;

  let multiplier = 1.0;

  if (isOffHours) multiplier *= 0.85;
  if (isWeekend) multiplier *= 0.9;

  return adjustThresholds(baseThresholds, multiplier);
}

实现阈值配置

集中式配置

// config/moderation.js — mirrors your Discuse project thresholds so app-side
// routing stays in sync with the values configured in the dashboard.
export const ModerationConfig = {
  thresholds: {
    threshold_toxicity:      parseFloat(process.env.THRESHOLD_TOXICITY || '0.7'),
    threshold_profanity:     parseFloat(process.env.THRESHOLD_PROFANITY || '0.6'),
    threshold_threat:        parseFloat(process.env.THRESHOLD_THREAT || '0.5'),
    threshold_insult:        parseFloat(process.env.THRESHOLD_INSULT || '0.7'),
    threshold_spam:          parseFloat(process.env.THRESHOLD_SPAM || '0.75'),
    threshold_images_porn:   parseFloat(process.env.THRESHOLD_IMAGES_PORN || '0.6'),
    threshold_images_sexual: parseFloat(process.env.THRESHOLD_IMAGES_SEXUAL || '0.8')
  },

  actions: {
    high_confidence: 'auto_block',     // score > 0.95
    medium_confidence: 'human_review', // 0.7 - 0.95
    low_confidence: 'allow_with_flag'  // threshold - 0.7
  }
};

运行时阈值更新

无需重新部署即可调整阈值：

class ModerationService {
  constructor() {
    this.thresholds = defaultThresholds;
    this.loadRemoteConfig();
  }

  async loadRemoteConfig() {
    try {
      const config = await fetch('/api/admin/moderation-config');
      const data = await config.json();
      this.thresholds = data.thresholds;
      console.log('Loaded remote moderation config');
    } catch (error) {
      console.warn('Using default thresholds:', error);
    }
  }

  async checkContent(content, context) {
    const result = await callModerationAPI(content);
    const thresholds = this.getThresholdsForContext(context);

    return this.applyThresholds(result, thresholds);
  }
}

衡量阈值的有效性

关键指标

const MODERATION_METRICS = {
  // Accuracy
  precision: 'True positives / (True positives + False positives)',
  recall: 'True positives / (True positives + False negatives)',
  f1_score: 'Harmonic mean of precision and recall',

  // User impact
  block_rate: 'Content blocked / Total content',
  appeal_rate: 'Appeals filed / Content blocked',
  appeal_success: 'Appeals won / Appeals filed',

  // Operational
  review_queue_size: 'Items waiting for human review',
  review_time: 'Average time to human decision'
};

对阈值进行 A/B 测试

在部分流量上测试阈值变更：

async function moderateWithExperiment(content, userId) {
  const experiment = getExperiment(userId, 'threshold_test');

  const thresholds = experiment === 'control'
    ? CURRENT_THRESHOLDS
    : EXPERIMENTAL_THRESHOLDS;

  const result = await checkContent(content);
  const decision = applyThresholds(result, thresholds);

  // Log for analysis
  await logExperiment({
    experiment: 'threshold_test',
    variant: experiment,
    content_id: content.id,
    scores: result,
    decision: decision,
    timestamp: Date.now()
  });

  return decision;
}

分析结果

-- Calculate precision and recall for each threshold variant
SELECT
  variant,
  COUNT(*) as total_decisions,
  SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) as true_positives,
  SUM(CASE WHEN blocked AND NOT actually_harmful THEN 1 ELSE 0 END) as false_positives,
  SUM(CASE WHEN NOT blocked AND actually_harmful THEN 1 ELSE 0 END) as false_negatives,
  SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
    NULLIF(SUM(CASE WHEN blocked THEN 1 ELSE 0 END), 0) as precision,
  SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
    NULLIF(SUM(CASE WHEN actually_harmful THEN 1 ELSE 0 END), 0) as recall
FROM moderation_decisions
WHERE experiment = 'threshold_test'
GROUP BY variant;

阈值调优工作流

步骤 1：建立基准

// Start with conservative thresholds
const INITIAL_THRESHOLDS = {
  threshold_toxicity: 0.5,
  threshold_profanity: 0.5,
  threshold_spam: 0.6
};

步骤 2：收集数据

async function logModerationDecision(content, result, decision) {
  await db.insert('moderation_log', {
    content_id: content.id,
    content_hash: hashContent(content.text),
    scores: result.results,
    thresholds_used: currentThresholds,
    decision: decision,
    user_trust_level: content.author.trustLevel,
    created_at: Date.now()
  });
}

步骤 3：分析误判率

查看被拦截的内容和用户申诉，以识别：

误报：安全内容被错误拦截
漏报：有害内容未被识别

步骤 4：调整并迭代

// Based on analysis, raise thresholds that fire too often
const ADJUSTED_THRESHOLDS = {
  threshold_toxicity: 0.65,  // Raised after false positives
  threshold_profanity: 0.55,
  threshold_spam: 0.7
};

步骤 5：持续监控

为阈值有效性设置警报：

async function checkModerationHealth() {
  const stats = await getModerationStats(last24Hours);

  // Alert if false positive rate too high
  if (stats.appealSuccessRate > 0.3) {
    alert('High appeal success rate - thresholds may be too strict');
  }

  // Alert if harmful content is getting through
  if (stats.reportedAfterApproval > threshold) {
    alert('Increase in reported content - thresholds may be too permissive');
  }
}

最佳实践总结

从保守开始：先使用更严格的阈值，再根据数据逐步放宽。
结合上下文：不同场景（公开帖子、私信、个人资料）应采用不同阈值。
信任等级很重要：根据用户在你应用中的信誉调整实际阈值。
衡量一切：跟踪准确率、召回率以及对用户的影响。
持续迭代：内容审核永远不会“完成”。
记录决策：保留阈值变更原因的记录。
准备兜底方案：将边界案例转交给人工审核。

请记住：在 Discuse 中，这些阈值属于项目设置。可在控制面板中更改，或通过设置 API 更改；每次请求中的 settings 对象只用于切换要运行哪些 check_*。

后续步骤

AI 内容审核指南 - 了解 AI 审核
扩展内容审核规模 - 高容量实施
文本分析 - 文本专项审核详情