Configuring Detection Thresholds

Detection thresholds set the confidence level at which Discuse flags content, balancing false positives against false negatives. In Discuse, thresholds are project settings configured in the dashboard (or through the settings API); an API request only selects which checks run and carries no numeric thresholds. This guide explains how the trade-off works and how to choose values that suit your platform.

How Discuse thresholds work

Each check returns a confidence score from 0.0 to 1.0 and a hit flag. hit is set when the score meets or exceeds the threshold you configured for that category in your project. The configurable thresholds are:

Threshold (project setting)	Applies to
`threshold_sentiment`	Overall negative-sentiment cutoff
`threshold_toxicity`	Toxic text
`threshold_profanity`	Profanity
`threshold_threat`	Threats
`threshold_insult`	Insults
`threshold_spam`	Spam classifier confidence
`threshold_images`	Explicit images (overall)
`threshold_images_porn`	Pornographic images
`threshold_images_sexual`	Sexually suggestive images

These are the names exposed by the project settings API; the per-request settings object contains only the check_* on/off toggles plus expected_language. You change thresholds in the project settings, not per request.
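
As a rough illustration of the rule above, the sketch below recomputes a hit flag per category from a score and a project threshold. The shape of the `scores` object and its category keys are assumptions made for this example, not the documented Discuse response format.

```javascript
// Sketch only: `scores` and its keys are hypothetical, not the Discuse response shape.
function computeHits(scores, thresholds) {
  const hits = {};
  for (const [category, score] of Object.entries(scores)) {
    const threshold = thresholds[`threshold_${category}`];
    // Skip categories with no configured threshold instead of guessing one.
    if (typeof threshold !== 'number') continue;
    // A hit fires when the score meets or exceeds the threshold.
    hits[category] = score >= threshold;
  }
  return hits;
}

const exampleHits = computeHits(
  { toxicity: 0.7, spam: 0.2 },
  { threshold_toxicity: 0.7, threshold_spam: 0.75 }
);
console.log(exampleHits); // { toxicity: true, spam: false }
```

A score exactly equal to the threshold counts as a hit, which is why toxicity fires at 0.7 here.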

The trade-off

Lower Threshold = More Strict
├── More content flagged
├── Higher false positive rate
├── Fewer harmful posts slip through
└── More user friction

Higher Threshold = More Permissive
├── Less content flagged
├── Lower false positive rate
├── More harmful posts may slip through
└── Better user experience
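
To make the trade-off concrete, the following sketch counts false positives and false negatives at several thresholds. The labelled samples are made up for illustration; in practice they would come from your own reviewed moderation data.

```javascript
// Made-up (score, harmful) pairs, for illustration only.
const labelledSamples = [
  { score: 0.95, harmful: true },
  { score: 0.8, harmful: true },
  { score: 0.62, harmful: false },
  { score: 0.55, harmful: true },
  { score: 0.4, harmful: false },
  { score: 0.15, harmful: false }
];

function countErrors(samples, threshold) {
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const { score, harmful } of samples) {
    const flagged = score >= threshold;
    if (flagged && !harmful) falsePositives += 1; // safe content flagged
    if (!flagged && harmful) falseNegatives += 1; // harmful content missed
  }
  return { threshold, falsePositives, falseNegatives };
}

for (const threshold of [0.3, 0.5, 0.7, 0.9]) {
  console.log(countErrors(labelledSamples, threshold));
}
```

With this sample, a threshold of 0.3 produces two false positives and no false negatives, while 0.9 produces no false positives and two false negatives.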

Visualizing the trade-off

                False Positives ─────────────────────────►
                     Few                              Many
               ┌─────────────────────────────────────────┐
False      Few │   ◄─── Ideal Zone                      │
Negatives      │        (High threshold,                │
               │         low false rates)               │
      │        │                                        │
      │        │              Your platform's           │
      │        │              optimal point →  ●        │
      ▼        │                                        │
          Many │                      Too permissive ──►│
               └─────────────────────────────────────────┘
                       Threshold: 0.3   0.5   0.7   0.9

Which thresholds suit each platform?

The values below are starting points, expressed using Discuse threshold names (threshold_toxicity, threshold_images_porn, and so on). Treat them as baselines to tune against your own false-positive data, not as definitively correct numbers. Lower thresholds flag content more aggressively.

Social media platforms

General-purpose social platforms need balanced moderation:

const SOCIAL_MEDIA_THRESHOLDS = {
  threshold_toxicity: 0.7,
  threshold_profanity: 0.6,
  threshold_threat: 0.5,        // Lower (stricter) for threats
  threshold_insult: 0.7,
  threshold_spam: 0.75,
  threshold_images_porn: 0.6,
  threshold_images_sexual: 0.8  // More permissive for suggestive content
};

Professional / business platforms

Business contexts usually call for stricter moderation:

const PROFESSIONAL_THRESHOLDS = {
  threshold_toxicity: 0.5,
  threshold_profanity: 0.4,
  threshold_threat: 0.3,
  threshold_insult: 0.5,
  threshold_spam: 0.6,
  threshold_images_porn: 0.3,
  threshold_images_sexual: 0.5
};

Gaming communities

Gaming platforms may tolerate more banter while staying strict on genuine threats:

const GAMING_THRESHOLDS = {
  threshold_toxicity: 0.8,
  threshold_profanity: 0.85,    // Banter allowed
  threshold_threat: 0.5,        // Still strict on real threats
  threshold_insult: 0.8,
  threshold_spam: 0.8,
  threshold_images_porn: 0.6,
  threshold_images_sexual: 0.9
};

Children's platforms

Platforms for minors require the strictest settings:

const CHILDRENS_THRESHOLDS = {
  threshold_toxicity: 0.3,
  threshold_profanity: 0.2,
  threshold_threat: 0.2,
  threshold_insult: 0.3,
  threshold_spam: 0.5,
  threshold_images_porn: 0.1,   // Maximum strictness
  threshold_images_sexual: 0.2
};

Can I change thresholds dynamically?

Discuse stores one set of thresholds per project, so per-user or per-context variation lives in your application. The patterns below compute effective thresholds on your side; you can then either route to different projects (each with its own configured thresholds) or apply your own comparison to the scores Discuse returns.

User trust levels

Adjust effective thresholds based on user reputation:

// PLATFORM_THRESHOLDS stands for whichever preset above fits your platform.
const PLATFORM_THRESHOLDS = SOCIAL_MEDIA_THRESHOLDS;

function getThresholds(user) {
  const trustMultipliers = {
    new_user: 0.8,       // Stricter (lower effective threshold)
    basic_user: 1.0,     // Standard
    verified_user: 1.15, // Slightly more permissive
    trusted_user: 1.3,   // More permissive
    moderator: 1.5       // Most permissive
  };

  // Unknown trust levels fall back to the standard multiplier.
  const multiplier = trustMultipliers[user.trustLevel] || 1.0;

  // Thresholds are a flat map of numbers; cap each at 0.95 so that no
  // category becomes effectively impossible to trigger.
  return Object.fromEntries(
    Object.entries(PLATFORM_THRESHOLDS).map(([key, value]) => [
      key,
      Math.min(value * multiplier, 0.95)
    ])
  );
}

Context-based thresholds

Different content types may need different thresholds:

// Keys use the same names as the Discuse project thresholds.
const CONTEXT_THRESHOLDS = {
  // Public posts visible to everyone
  public_post: {
    threshold_toxicity: 0.6,
    threshold_profanity: 0.5
  },

  // Direct messages between users
  direct_message: {
    threshold_toxicity: 0.7,      // Slightly more permissive
    threshold_profanity: 0.6
  },

  // Comments on public content
  comment: {
    threshold_toxicity: 0.55,     // Stricter than posts
    threshold_profanity: 0.5
  },

  // Profile information
  profile: {
    threshold_toxicity: 0.5,      // Strict for public-facing content
    threshold_profanity: 0.4
  }
};

function getContextThresholds(contentType) {
  return CONTEXT_THRESHOLDS[contentType] || CONTEXT_THRESHOLDS.public_post;
}
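
A context usually overrides only a few categories, so one possible approach is to layer the context values over a full base set. The sketch below assumes hypothetical `BASE_THRESHOLDS` and `CONTEXT_OVERRIDES` objects; the names and numbers are illustrative and not part of Discuse.

```javascript
// Hypothetical base values and per-context overrides (illustrative numbers).
const BASE_THRESHOLDS = {
  threshold_toxicity: 0.7,
  threshold_profanity: 0.6,
  threshold_threat: 0.5
};

const CONTEXT_OVERRIDES = {
  direct_message: { threshold_toxicity: 0.7, threshold_profanity: 0.6 },
  profile: { threshold_toxicity: 0.5, threshold_profanity: 0.4 }
};

function resolveThresholds(contentType) {
  // Only use own keys so inherited names like "constructor" never match.
  const known = Object.prototype.hasOwnProperty.call(CONTEXT_OVERRIDES, contentType);
  const overrides = known ? CONTEXT_OVERRIDES[contentType] : {};
  // Later spreads win, so overrides replace base values key by key.
  return { ...BASE_THRESHOLDS, ...overrides };
}

console.log(resolveThresholds('profile'));
// { threshold_toxicity: 0.5, threshold_profanity: 0.4, threshold_threat: 0.5 }
```

Categories a context does not mention, such as threshold_threat here, keep their base value, and unknown content types fall back to the base set unchanged.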

Time-based adjustments

Adjust thresholds during higher-risk periods:

function getTimeAdjustedThresholds(baseThresholds) {
  const hour = new Date().getHours();
  const dayOfWeek = new Date().getDay();

  // Stricter during off-hours when fewer moderators available
  const isOffHours = hour < 6 || hour > 22;
  const isWeekend = dayOfWeek === 0 || dayOfWeek === 6;

  let multiplier = 1.0;

  if (isOffHours) multiplier *= 0.85;
  if (isWeekend) multiplier *= 0.9;

  return adjustThresholds(baseThresholds, multiplier);
}
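
The `adjustThresholds` helper used above is not defined in this guide. A minimal sketch of what it could look like follows; the 0.05 floor, the 0.95 cap, and the two-decimal rounding are assumptions chosen for this example, not Discuse limits.

```javascript
// Hypothetical helper: scales every threshold by `multiplier`, then clamps it.
// The default floor (0.05) and cap (0.95) are assumptions, not Discuse limits.
function adjustThresholds(thresholds, multiplier, { min = 0.05, max = 0.95 } = {}) {
  return Object.fromEntries(
    Object.entries(thresholds).map(([key, value]) => {
      const scaled = Math.min(Math.max(value * multiplier, min), max);
      // Round to two decimals to hide floating-point noise from the multiplication.
      return [key, Math.round(scaled * 100) / 100];
    })
  );
}

// Off-hours on a weekend: both multipliers from the function above apply.
const offHoursWeekend = adjustThresholds(
  { threshold_toxicity: 0.7, threshold_threat: 0.5 },
  0.85 * 0.9
);
console.log(offHoursWeekend);
```

Clamping keeps stacked multipliers from pushing a threshold to a value where the check would fire on nearly everything or on nothing.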

Implementing threshold configuration

Centralized configuration

// config/moderation.js — mirrors your Discuse project thresholds so app-side
// routing stays in sync with the values configured in the dashboard.
export const ModerationConfig = {
  thresholds: {
    threshold_toxicity:      parseFloat(process.env.THRESHOLD_TOXICITY || '0.7'),
    threshold_profanity:     parseFloat(process.env.THRESHOLD_PROFANITY || '0.6'),
    threshold_threat:        parseFloat(process.env.THRESHOLD_THREAT || '0.5'),
    threshold_insult:        parseFloat(process.env.THRESHOLD_INSULT || '0.7'),
    threshold_spam:          parseFloat(process.env.THRESHOLD_SPAM || '0.75'),
    threshold_images_porn:   parseFloat(process.env.THRESHOLD_IMAGES_PORN || '0.6'),
    threshold_images_sexual: parseFloat(process.env.THRESHOLD_IMAGES_SEXUAL || '0.8')
  },

  actions: {
    high_confidence: 'auto_block',     // score > 0.95
    medium_confidence: 'human_review', // 0.7 - 0.95
    low_confidence: 'allow_with_flag'  // threshold - 0.7
  }
};
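
The action bands in the comments above can be turned into a small decision function. This is a sketch under the assumption that scores below the category threshold are simply allowed; the `'allow'` label is introduced here and does not appear in the config.

```javascript
// Band edges mirror the comments in ModerationConfig.actions. The 'allow'
// outcome for scores below the threshold is an assumption of this sketch.
function chooseAction(score, threshold) {
  if (score < threshold) return 'allow';          // no hit at all
  if (score > 0.95) return 'auto_block';          // high confidence
  if (score >= 0.7) return 'human_review';        // medium confidence
  return 'allow_with_flag';                       // threshold up to 0.7
}

console.log(chooseAction(0.97, 0.5)); // auto_block
console.log(chooseAction(0.8, 0.5));  // human_review
console.log(chooseAction(0.6, 0.5));  // allow_with_flag
console.log(chooseAction(0.4, 0.5));  // allow
```

When a category threshold sits at or above 0.7 (for example threshold_spam at 0.75), the low-confidence band is empty and every hit goes to review or blocking.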

Runtime threshold updates

Allow threshold changes without a redeployment:

class ModerationService {
  constructor() {
    this.thresholds = defaultThresholds;
    this.loadRemoteConfig();
  }

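  async loadRemoteConfig() {
    try {
      const response = await fetch('/api/admin/moderation-config');
      // fetch() does not reject on HTTP errors, so check the status explicitly
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      const data = await response.json();
      if (!data.thresholds) throw new Error('Response has no thresholds');
      this.thresholds = data.thresholds;
      console.log('Loaded remote moderation config');
    } catch (error) {
      // Keep the defaults set in the constructor
      console.warn('Using default thresholds:', error);
    }
  }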

  async checkContent(content, context) {
    const result = await callModerationAPI(content);
    const thresholds = this.getThresholdsForContext(context);

    return this.applyThresholds(result, thresholds);
  }
}
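
Remotely loaded values are worth validating before they replace the defaults, since a malformed config could silently disable a check. The helper below is a hypothetical sketch, not part of Discuse: it accepts only known keys with numeric values in the 0.0 to 1.0 score range and keeps the default otherwise.

```javascript
// Hypothetical helper: merges remote thresholds over defaults, key by key.
// Unknown keys are ignored; missing or malformed values keep the default.
function mergeRemoteThresholds(defaults, remote) {
  const merged = { ...defaults };
  if (remote === null || typeof remote !== 'object') return merged;
  for (const key of Object.keys(defaults)) {
    const value = remote[key];
    // NaN fails both comparisons, so it is rejected as well.
    if (typeof value === 'number' && value >= 0 && value <= 1) {
      merged[key] = value;
    }
  }
  return merged;
}

const safeThresholds = mergeRemoteThresholds(
  { threshold_toxicity: 0.7, threshold_spam: 0.75 },
  { threshold_toxicity: 0.65, threshold_spam: '0.9', threshold_unknown: 0.1 }
);
console.log(safeThresholds); // { threshold_toxicity: 0.65, threshold_spam: 0.75 }
```

In the class above, a call like this could wrap `data.thresholds` before it is assigned to `this.thresholds`.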

Measuring threshold effectiveness

Key metrics

const MODERATION_METRICS = {
  // Accuracy
  precision: 'True positives / (True positives + False positives)',
  recall: 'True positives / (True positives + False negatives)',
  f1_score: 'Harmonic mean of precision and recall',

  // User impact
  block_rate: 'Content blocked / Total content',
  appeal_rate: 'Appeals filed / Content blocked',
  appeal_success: 'Appeals won / Appeals filed',

  // Operational
  review_queue_size: 'Items waiting for human review',
  review_time: 'Average time to human decision'
};
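
The accuracy formulas above can be computed directly from reviewed decision counts. The function below is a small sketch; the input field names are assumptions, and a metric is returned as null when its denominator is zero.

```javascript
// Computes precision, recall, and F1 from confusion-matrix counts.
// Field names are assumptions; undefined ratios are reported as null.
function moderationMetrics({ truePositives, falsePositives, falseNegatives }) {
  const ratio = (num, den) => (den === 0 ? null : num / den);
  const precision = ratio(truePositives, truePositives + falsePositives);
  const recall = ratio(truePositives, truePositives + falseNegatives);
  const f1 =
    precision === null || recall === null || precision + recall === 0
      ? null
      : (2 * precision * recall) / (precision + recall); // harmonic mean
  return { precision, recall, f1 };
}

console.log(
  moderationMetrics({ truePositives: 80, falsePositives: 20, falseNegatives: 20 })
);
```

With 80 true positives and 20 each of false positives and false negatives, precision and recall are both 0.8, so F1 is also about 0.8.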

A/B testing thresholds

Test threshold changes on a portion of traffic:

async function moderateWithExperiment(content, userId) {
  const experiment = getExperiment(userId, 'threshold_test');

  const thresholds = experiment === 'control'
    ? CURRENT_THRESHOLDS
    : EXPERIMENTAL_THRESHOLDS;

  const result = await checkContent(content);
  const decision = applyThresholds(result, thresholds);

  // Log for analysis
  await logExperiment({
    experiment: 'threshold_test',
    variant: experiment,
    content_id: content.id,
    scores: result,
    decision: decision,
    timestamp: Date.now()
  });

  return decision;
}
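
The `getExperiment` helper above is left undefined. One common way to implement it is deterministic hashing, so that a user always lands in the same variant. The sketch below is an assumption about how it might work: it uses an FNV-1a-style hash over the string's UTF-16 code units, and the `treatmentShare` parameter and the variant name `'treatment'` are hypothetical.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units (not bytes).
function hashString(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Hypothetical implementation: `treatmentShare` is the fraction of users
// that should receive the experimental thresholds.
function getExperiment(userId, experimentName, treatmentShare = 0.1) {
  // Salting with the experiment name decorrelates buckets across experiments.
  const bucket = hashString(`${experimentName}:${userId}`) / 0x100000000; // [0, 1)
  return bucket < treatmentShare ? 'treatment' : 'control';
}

console.log(getExperiment('user-123', 'threshold_test'));
```

Because assignment depends only on the user ID and experiment name, no assignment table is needed and repeated calls agree with each other.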

Analyzing results

-- Calculate precision and recall for each threshold variant
SELECT
  variant,
  COUNT(*) as total_decisions,
  SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) as true_positives,
  SUM(CASE WHEN blocked AND NOT actually_harmful THEN 1 ELSE 0 END) as false_positives,
  SUM(CASE WHEN NOT blocked AND actually_harmful THEN 1 ELSE 0 END) as false_negatives,
  SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
    NULLIF(SUM(CASE WHEN blocked THEN 1 ELSE 0 END), 0) as precision,
  SUM(CASE WHEN blocked AND actually_harmful THEN 1 ELSE 0 END) * 1.0 /
    NULLIF(SUM(CASE WHEN actually_harmful THEN 1 ELSE 0 END), 0) as recall
FROM moderation_decisions
WHERE experiment = 'threshold_test'
GROUP BY variant;

Threshold tuning workflow

Step 1: Establish a baseline

// Start with conservative thresholds
const INITIAL_THRESHOLDS = {
  threshold_toxicity: 0.5,
  threshold_profanity: 0.5,
  threshold_spam: 0.6
};

Step 2: Collect data

async function logModerationDecision(content, result, decision) {
  await db.insert('moderation_log', {
    content_id: content.id,
    content_hash: hashContent(content.text),
    scores: result.results,
    thresholds_used: currentThresholds,
    decision: decision,
    user_trust_level: content.author.trustLevel,
    created_at: Date.now()
  });
}

Step 3: Analyze error rates

Review blocked content and user appeals to identify:

False positives: safe content that was wrongly blocked
False negatives: harmful content that went undetected

Step 4: Adjust and repeat

// Based on analysis, raise thresholds that fire too often
const ADJUSTED_THRESHOLDS = {
  threshold_toxicity: 0.65,  // Raised after false positives
  threshold_profanity: 0.55,
  threshold_spam: 0.7
};

Step 5: Monitor continuously

Set up alerts on threshold effectiveness:

// Hypothetical limit; set it from your platform's normal report volume.
const MAX_REPORTED_AFTER_APPROVAL = 50;

async function checkModerationHealth() {
  // getModerationStats, last24Hours, and alert are app-defined helpers.
  const stats = await getModerationStats(last24Hours);

  // Alert if false positive rate too high
  if (stats.appealSuccessRate > 0.3) {
    alert('High appeal success rate - thresholds may be too strict');
  }

  // Alert if harmful content is getting through
  if (stats.reportedAfterApproval > MAX_REPORTED_AFTER_APPROVAL) {
    alert('Increase in reported content - thresholds may be too permissive');
  }
}

Best practices summary

Start conservatively: begin with stricter thresholds, then loosen them based on data.
Use context: different areas (public posts, DMs, profiles) need different thresholds.
Trust levels matter: adjust effective thresholds in your application based on user reputation.
Measure everything: track precision, recall, and user impact.
Iterate continuously: moderation is never truly "done".
Document decisions: keep a record of why each threshold was changed.
Have a fallback: route borderline cases to human review.

Remember: in Discuse, these thresholds are project settings. Change them in the dashboard or through the settings API; the per-request settings object only turns the check_* checks on or off.

Next steps

AI Content Moderation Guide - Understanding AI moderation
Scaling Content Moderation - High-volume implementations
Text Analysis - Text-specific moderation details