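AI Content Moderation Guide

Discuse checks text and images for sentiment/toxicity, spam, unwanted languages, profanity, links, and explicit imagery, then returns per-category results so your code can automatically approve, flag, or reject content. You send content to a single endpoint, POST https://api.discuse.com/api/v2/check, and read the structured scores that come back. This guide covers how those checks work, the moderation patterns built around them, and how to wire the Discuse API into your pipeline.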

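What Is AI Content Moderation?

AI content moderation uses machine learning models to automatically detect and classify potentially harmful content. A human moderator can only look at one item at a time, whereas these models score content the moment it is submitted, so checks can finish before other users see it.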

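How Does It Work?

Submit content: send text and/or media URLs to the moderation API.
Run checks: the API runs the enabled checks (sentiment, language, spam, bad words, images, links, antivirus).
Score each category: each check returns scores plus a hit flag that indicates whether your configured threshold was exceeded.
Make a decision: read has_violations (and the per-check scores) to decide whether to approve, flag, or reject.

With Discuse, the request looks like this: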

const response = await fetch('https://api.discuse.com/api/v2/check', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': process.env.DISCUSE_API_KEY
  },
  body: JSON.stringify({
    content: { text: 'message to check', image_urls: ['https://...'] },
    settings: { check_sentiment: true, check_spam: true, check_images: true }
  })
});
const result = await response.json(); // { has_violations, results: { sentiment, spamfinder, images, ... }, usage }
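
Once the response is parsed, a common first step is to see which checks actually fired. The sketch below assumes the response shape shown in the comment above (a `results` object whose entries each carry a `hit` flag). `listHits` and the sample object are hypothetical and written by hand for illustration; they are not part of any Discuse SDK.

```javascript
// Sketch: list which checks reported a hit in a Discuse-style response.
// `listHits` is a hypothetical helper; the shape it reads mirrors the
// { has_violations, results: { ... } } comment in the request example.
function listHits(result) {
  const results = (result && result.results) || {};
  return Object.keys(results).filter(
    (name) => results[name] && results[name].hit === true
  );
}

// Hand-written sample response, for illustration only.
const sampleResponse = {
  has_violations: true,
  results: {
    sentiment: { toxicity: 0.91, hit: true },
    spamfinder: { is_spam: false, confidence: 0.98, hit: false },
    images: { porn: 0.02, hit: false }
  }
};

console.log(listHits(sampleResponse)); // [ 'sentiment' ]
```

Returning the names rather than a single boolean keeps the per-category detail available for logging or routing.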

What Do the Discuse Checks Cover?

Check (`settings` toggle)	What it covers	Key result fields
`check_sentiment`	Negative sentiment, toxicity, profanity, threats, and insults in text	`sentiment.is_toxic`, `sentiment.toxicity`, `sentiment.score`, `sentiment.hit`
`check_spam`	Spam classification of text	`spamfinder.label`, `spamfinder.confidence`, `spamfinder.is_spam`, `spamfinder.hit`
`check_language`	Whether text is in an expected language	`language.language`, `language.confidence`, `language.hit`
`check_badwords`	Matches against a custom bad-word list	`badwords.hit`, `badwords.matched_words`
`check_images`	Explicit imagery in image URLs	`images.porn`, `images.sexual`, `images.neutral`, `images.hit`
`check_links`	Link reputation	`links.status`, `links.hit`
`check_antivirus`	Malware in document/file URLs	`antivirus.status`, `antivirus.hit`

Each toggle is a boolean. The numeric thresholds that turn a score into a hit are configured per project in the dashboard rather than passed with each request; see the threshold configuration guide.
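
Because every toggle is a plain boolean, the `settings` object can be built from a short list of check names. The following is a minimal sketch: `buildSettings` and `KNOWN_CHECKS` are hypothetical names, and the toggle names are taken from the table above. Checks that are not listed are simply left out of the object rather than set to false.

```javascript
// Sketch: build a `settings` object from a list of check names.
// Toggle names come from the table above; the helper itself is hypothetical.
const KNOWN_CHECKS = [
  'sentiment', 'spam', 'language', 'badwords', 'images', 'links', 'antivirus'
];

function buildSettings(enabledChecks) {
  const settings = {};
  for (const name of enabledChecks) {
    if (!KNOWN_CHECKS.includes(name)) {
      // Fail loudly on typos instead of silently skipping a check.
      throw new Error(`Unknown check: ${name}`);
    }
    settings[`check_${name}`] = true;
  }
  return settings;
}

console.log(buildSettings(['sentiment', 'spam']));
// { check_sentiment: true, check_spam: true }
```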

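Benefits of AI Moderation

Scale

AI handles content volumes that human teams cannot keep up with. A single API call returns in milliseconds, so moderation keeps pace with submissions instead of backing up behind a review queue. For borderline cases, pair automated checks with a human queue (covered below).

Speed

Real-time checks let you screen content before it is published: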

// Pre-moderation: Check content before publishing
async function publishPost(content) {
  const moderation = await checkContent(content);

  if (moderation.has_violations) {
    return { published: false, reason: moderation.message };
  }

  // Content passes moderation
  return await saveAndPublish(content);
}

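Consistency

AI applies the same rules to all content, without fatigue and without variation between reviewers. Decisions are reproducible: under the same project thresholds, the same input produces the same hit flags, which makes enforcement auditable.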
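
Auditability only helps if the hit flags are stored alongside the decision. A minimal sketch of an audit record follows; the record layout and the `buildAuditRecord` name are assumptions made for illustration, not a Discuse format.

```javascript
// Sketch: capture the hit flags from a response so a decision can be audited later.
// The record layout is hypothetical; adapt it to your own logging schema.
function buildAuditRecord(contentId, result, checkedAt = new Date()) {
  const hits = {};
  for (const [name, check] of Object.entries(result.results || {})) {
    hits[name] = Boolean(check && check.hit);
  }
  return {
    content_id: contentId,
    checked_at: checkedAt.toISOString(),
    has_violations: Boolean(result.has_violations),
    hits
  };
}
```

Storing the flags as they were at decision time means a later threshold change in the project settings does not rewrite history.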

Moderation Architecture

Pre-Moderation Flow

User Submits → AI Check → Decision
                 ↓
    ┌───────────┼───────────┐
    ↓           ↓           ↓
  Allow      Review       Block
    ↓           ↓           ↓
 Publish   Human Queue   Reject

Post-Moderation Flow

User Submits → Publish → AI Check → Action
                           ↓
              ┌────────────┼────────────┐
              ↓            ↓            ↓
            Safe       Borderline    Violation
              ↓            ↓            ↓
           Keep       Flag/Review    Remove
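
The post-moderation diagram can be sketched as a single function that publishes first and acts on the check afterwards. Everything in `deps` (`publish`, `checkContent`, `remove`, `flagForReview`, `isBorderline`) is a hypothetical hook supplied by your application; only `has_violations` comes from the Discuse response described in this guide.

```javascript
// Sketch: post-moderation. Content goes live first, then the check decides
// whether to keep, flag, or remove it. All `deps` hooks are hypothetical.
async function publishThenCheck(content, deps) {
  const postId = await deps.publish(content); // content is visible from here on
  const result = await deps.checkContent(content);

  if (result.has_violations) {
    await deps.remove(postId);
    return { postId, action: 'removed' };
  }
  // "Borderline" is your own definition, e.g. a score close to a threshold.
  if (deps.isBorderline && deps.isBorderline(result)) {
    await deps.flagForReview(postId, result);
    return { postId, action: 'flagged' };
  }
  return { postId, action: 'kept' };
}
```

The trade-off against pre-moderation is that a violation is briefly visible before it is removed, in exchange for no delay on publishing.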

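Hybrid Approach (Recommended)

The Discuse response sets has_violations as soon as any enabled check exceeds its configured threshold, and it exposes the underlying per-category scores, so you can layer your own confidence bands on top: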

async function moderateContent(content) {
  const result = await checkContent(content);

  // Build a confidence figure from the scores you care about.
  // e.g. the toxicity score and the spam classifier confidence.
  const confidence = Math.max(
    result.results?.sentiment?.toxicity ?? 0,
    result.results?.spamfinder?.confidence ?? 0,
    result.results?.images?.porn ?? 0
  );

  // High confidence: automate.
  if (confidence > 0.95) {
    return result.has_violations
      ? { action: 'auto_remove', reason: result.message }
      : { action: 'auto_approve' };
  }

  // Medium confidence: route to a human.
  if (confidence > 0.5) {
    await addToReviewQueue(content, result);
    return { action: 'pending_review' };
  }

  // Low confidence: approve, keep watching.
  return { action: 'approve_with_monitoring' };
}

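Implementing AI Moderation

Step 1: Define Your Policy

Before calling the API, decide what content is acceptable and what action to take for each category. In Discuse, the numeric threshold for each category lives in your project settings (dashboard), so your application policy maps API results to an action rather than re-deciding thresholds: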

const MODERATION_POLICY = {
  // What to do when a given Discuse check reports a hit.
  // Thresholds themselves are configured per project in the dashboard.
  actions: {
    sentiment: 'block',   // toxic / threatening text
    spam: 'block',
    badwords: 'flag',
    images: 'block',      // explicit imagery
    links: 'flag'
  }
};

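Step 2: Integrate the API

Send the content along with the checks you want enabled for this request. Each check_* toggle is an optional boolean that overrides the project default for this call: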

async function checkContent(content) {
  const response = await fetch('https://api.discuse.com/api/v2/check', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.DISCUSE_API_KEY
    },
    body: JSON.stringify({
      content: {
        text: content.text,
        image_urls: content.images
      },
      settings: {
        check_sentiment: true,
        check_spam: true,
        check_images: true
      }
    })
  });

  if (!response.ok) {
    throw new Error(`Discuse API returned ${response.status}`);
  }
  return response.json();
}

Step 3: Apply the Decision

Map each check that reports a hit to the action you defined:

function applyModerationDecision(result) {
  const r = result.results || {};

  if (r.sentiment?.hit) return { action: MODERATION_POLICY.actions.sentiment, category: 'sentiment' };
  if (r.spamfinder?.hit) return { action: MODERATION_POLICY.actions.spam, category: 'spam' };
  if (r.images?.hit)    return { action: MODERATION_POLICY.actions.images, category: 'images' };
  if (r.badwords?.hit)  return { action: MODERATION_POLICY.actions.badwords, category: 'badwords' };
  if (r.links?.hit)     return { action: MODERATION_POLICY.actions.links, category: 'links' };

  return { action: 'allow' };
}
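
The function above returns on the first hit in a fixed order, which is fine while the earliest checks also carry the strictest actions. If a policy ever maps an early check to 'flag' and a later one to 'block', an alternative is to look at every hit and keep the strictest action. The sketch below is one way to do that; `strictestDecision`, `RESULT_KEYS`, and `SEVERITY` are hypothetical names, and the return shape (`categories` as an array) differs from the single `category` used above.

```javascript
// Sketch: pick the strictest action across all hits instead of the first one.
// The policy is copied from Step 1; the helper names are hypothetical.
const POLICY_ACTIONS = {
  sentiment: 'block', spam: 'block', badwords: 'flag', images: 'block', links: 'flag'
};
// Policy category -> key under `results` in the response.
const RESULT_KEYS = {
  sentiment: 'sentiment', spam: 'spamfinder', badwords: 'badwords',
  images: 'images', links: 'links'
};
const SEVERITY = { allow: 0, flag: 1, block: 2 };

function strictestDecision(result, actions = POLICY_ACTIONS) {
  const r = (result && result.results) || {};
  let action = 'allow';
  const categories = [];
  for (const [category, key] of Object.entries(RESULT_KEYS)) {
    if (!(r[key] && r[key].hit)) continue;
    categories.push(category);
    const candidate = actions[category];
    if (SEVERITY[candidate] > SEVERITY[action]) action = candidate;
  }
  return { action, categories };
}
```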

Step 4: Handle Edge Cases

async function handleModerationResult(content, result) {
  switch (result.action) {
    case 'block':
      await notifyUser(content.author, 'content_blocked', result);
      await logModeration(content, result);
      return false;

    case 'flag':
      await addToReviewQueue(content, result);
      await publishWithWarning(content);
      return true;

    case 'allow':
      await publish(content);
      return true;

    default:
      // Unknown action - fail safe by blocking
      await logError('unknown_moderation_action', result);
      return false;
  }
}
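
Another edge case is the check itself failing: `checkContent` in Step 2 throws on a non-2xx response. One option, sketched below, is to hold the content for human review rather than publish it unchecked. `checkOrHold` and the 'hold' action are hypothetical and would need a matching case in your own handler.

```javascript
// Sketch: if the moderation call fails, hold the content instead of publishing it.
// `checkFn` stands in for checkContent from Step 2; 'hold' is a hypothetical action.
async function checkOrHold(checkFn, content) {
  try {
    const result = await checkFn(content);
    return { ok: true, result };
  } catch (err) {
    return {
      ok: false,
      action: 'hold',
      reason: err instanceof Error ? err.message : String(err)
    };
  }
}
```

Holding is the conservative choice; a lower-risk surface might instead publish and re-check later.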

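Best Practices

Start Conservative and Tune Over Time

Start with stricter thresholds for your project, then relax them gradually once you have measured false positives. In Discuse these thresholds are project settings, so tuning happens in the dashboard (or through the settings update API), not per request. See the threshold configuration guide for the workflow.

Keep a Human Review Queue

AI should assist human judgment, not replace it in edge cases: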

async function processReviewQueue() {
  const items = await getReviewQueue();

  for (const item of items) {
    // Present to human reviewer with AI context
    const reviewUI = {
      content: item.content,
      ai_scores: item.moderation_result,
      similar_decisions: await getSimilarPreviousDecisions(item)
    };

    // Human makes final decision
    const decision = await presentToReviewer(reviewUI);

    // Log for model improvement
    await logHumanDecision(item, decision);
  }
}

Monitor and Improve

Track key metrics to improve your moderation system:

const METRICS = {
  // Accuracy metrics
  false_positive_rate: 'Content incorrectly blocked',
  false_negative_rate: 'Harmful content missed',

  // Operational metrics
  average_response_time: 'API latency',
  review_queue_depth: 'Human review backlog',

  // User impact
  appeal_rate: 'Users appealing decisions',
  appeal_success_rate: 'Appeals overturned'
};
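
The two accuracy metrics can be computed from human-reviewed decisions. The sketch below assumes each logged decision records whether the content was blocked and whether a reviewer judged it harmful; that record shape and the `computeRates` name are assumptions. It uses the standard definitions: false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP).

```javascript
// Sketch: compute false positive / false negative rates from reviewed decisions.
// Each entry is assumed to look like { blocked: boolean, harmful: boolean },
// where `harmful` is the human reviewer's verdict.
function computeRates(decisions) {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const d of decisions) {
    if (d.blocked && d.harmful) tp++;
    else if (d.blocked && !d.harmful) fp++;
    else if (!d.blocked && !d.harmful) tn++;
    else fn++;
  }
  // Return null when a rate is undefined (no samples in the denominator).
  const ratio = (num, den) => (den === 0 ? null : num / den);
  return {
    false_positive_rate: ratio(fp, fp + tn),
    false_negative_rate: ratio(fn, fn + tp)
  };
}
```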

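Handle Appeals Properly

When a user appeals, route the content to a human reviewer rather than re-deciding automatically. Give the reviewer the original Discuse scores and the user's history as context: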

async function handleAppeal(contentId, userId) {
  const original = await getContentWithModeration(contentId);

  await addToReviewQueue(contentId, {
    type: 'appeal',
    original_decision: original.moderation, // Discuse `results` saved at decision time
    author_history: await getAuthorHistory(userId)
  });

  return { status: 'pending', message: 'Under review' };
}

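The API itself has no per-request "context" or "author history" parameter; context is something you apply on your side to choose thresholds and decide whether content goes to review.

Common Pitfalls

Over-Reliance on AI

Automating every decision means automating every mistake too. Keep humans in the loop for:

Complex contextual decisions
High-risk content (legal, safety)
Appeals and edge cases

Ignoring Context

Whether the same words are harmful or acceptable depends on context: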

"I'm going to kill it at this interview!" // Positive
"I'm going to kill you"                    // Threat

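Discuse scores each message on its own; it has no request-level "context" parameter. Apply context on your side: choose stricter or looser project thresholds for different surfaces (public posts versus direct messages), and route borderline hits to human review.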
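
One way to apply context on your side is a local review limit per surface, checked against the returned toxicity score. This is a sketch under assumptions: the limit values are made up, `SURFACE_LIMITS` and `needsReview` are hypothetical, and it assumes `sentiment.toxicity` is a 0 to 1 score where higher means more toxic.

```javascript
// Sketch: route to human review using a per-surface limit applied locally.
// Limits are illustrative values, not recommendations.
const SURFACE_LIMITS = {
  public_post: 0.6,     // stricter: visible to everyone
  direct_message: 0.85  // looser: private conversation
};

function needsReview(result, surface) {
  const limit = SURFACE_LIMITS[surface];
  if (limit === undefined) throw new Error(`Unknown surface: ${surface}`);
  if (result.has_violations) return true; // project threshold already exceeded
  const toxicity = result.results?.sentiment?.toxicity ?? 0;
  return toxicity >= limit;
}
```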

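Set and Forget

Content moderation needs ongoing tuning:

Monitor false positive and false negative rates
Update thresholds based on data
Review new content patterns
Retrain or update models

Inconsistent Enforcement

Enforce policy by rule, not by who is posting. Drive thresholds from documented trust levels rather than ad hoc exceptions: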

// Avoid: per-person exceptions
if (user.isInfluencer) { /* lenient */ }

// Prefer: thresholds keyed to a documented trust level,
// configured the same way for everyone in that level.
const action = MODERATION_POLICY.actions[category];
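
To make the trust-level idea concrete, the lookup can be keyed by level as well as category. The sketch below varies the action per level rather than the numeric threshold, since thresholds live in the project settings. The level names, the per-level actions, and `actionFor` are all hypothetical.

```javascript
// Sketch: one documented policy per trust level, applied to everyone at that level.
// Level names and actions are illustrative assumptions.
const POLICY_BY_TRUST_LEVEL = {
  new_account: { sentiment: 'block', spam: 'block', badwords: 'block', images: 'block', links: 'block' },
  established: { sentiment: 'block', spam: 'block', badwords: 'flag', images: 'block', links: 'flag' }
};

function actionFor(trustLevel, category) {
  const policy = POLICY_BY_TRUST_LEVEL[trustLevel];
  if (!policy) throw new Error(`Unknown trust level: ${trustLevel}`);
  // Unlisted categories fail safe by blocking.
  return policy[category] ?? 'block';
}

console.log(actionFor('established', 'links')); // flag
```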

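Next Steps

Configure thresholds - Fine-tune your moderation policy
Scale content moderation - Handle high traffic
Text analysis - A deeper look at text moderation
Image NSFW detection - Protect visual content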