OCR Text Extraction

Discuse pulls the text out of images and documents so you can read, and moderate, content that would otherwise be invisible to a text filter. Send up to 5 image or document URLs to POST /api/v2/ocr and you get back the recognized text along with, by default, the results of running that text through your project's content checks.

Why OCR for moderation?

Plenty of abuse hides inside images: a slur baked into a meme, a phishing link in a screenshot, a scam phone number on a flyer. A plain text check never sees it. OCR extracts the words first, so the same sentiment, spam, badword, and language checks you already run on text apply to image and document content too.

How do I extract text?

Send one or more file URLs in file_urls. The moderate field defaults to true, so the extracted text is also checked and you get a results object back; set it to false if you only want the raw text.

curl -X POST https://api.discuse.com/api/v2/ocr \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "file_urls": ["https://example.com/user-meme.jpg"],
    "moderate": true
  }'

A single request accepts up to 5 image or document URLs.

Response

{
  "text": "BUY FOLLOWERS NOW — dm @spammer for 50% off",
  "has_text": true,
  "num_files": 1,
  "has_violations": true,
  "results": {
    "hits": true,
    "spamfinder": {
      "label": "spam",
      "confidence": 0.94,
      "is_spam": true,
      "hit": true
    }
  },
  "usage": {
    "api_requests_used": 412,
    "api_requests_limit": 10000,
    "api_requests_remaining": 9588
  }
}

When moderate is false (or no text was found), results is omitted and has_violations is false — you just get the extracted text.

Request fields

Field | Type | Notes
api_key | string | Optional in body; you can send X-API-Key instead
file_urls | string[] | Image or document URLs to read. At least one required, up to 5
moderate | boolean | Run the extracted text through your text checks. Defaults to true

Response fields

Field | Type | Description
text | string | Recognized text, concatenated across all files
has_text | boolean | True if any non-empty text was recognized
num_files | number | Number of files successfully read
has_violations | boolean | True if the moderated text tripped a check
results | object | The text-check results (see Text Analysis), present only when moderation ran and text was found
usage | object | api_requests_used, api_requests_limit, api_requests_remaining

The results object has the same shape as the response from POST /api/v2/check: spamfinder, sentiment, language, badwords, and the top-level hits flag. See Text Analysis for the field details.
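
To act on a results object you usually want to know which checks fired. The sketch below is one way to do that, not an official helper. It assumes every per-check entry carries a hit flag the way spamfinder does in the sample response; checks_that_hit is a hypothetical name, and the sample data mirrors the response shown earlier.

```python
def checks_that_hit(results):
    """Return the names of checks whose entry carries a truthy "hit" flag."""
    if not results:
        # results is omitted when moderation did not run or no text was found.
        return []
    return sorted(
        name
        for name, value in results.items()
        # Skip the top-level "hits" boolean; only per-check entries are dicts.
        # Assumption: each check object exposes "hit", as spamfinder does.
        if isinstance(value, dict) and value.get("hit")
    )


sample_results = {
    "hits": True,
    "spamfinder": {"label": "spam", "confidence": 0.94, "is_spam": True, "hit": True},
}
print(checks_that_hit(sample_results))
```

Passing None or an empty object returns an empty list, so the same call works whether or not moderation ran.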

Usage limits

OCR is a paid-plan feature; each file you extract counts once against your OCR quota.

Plan | Monthly OCR extractions | Overage rate
Basic | Not available | -
Gold | 1,000 | $0.0015/extraction
Platinum | 2,000 | $0.001275/extraction (15% discount)
Ultimate | 4,000 | $0.001125/extraction (25% discount)

If a project has no active subscription, OCR requests are denied.
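
For budgeting, the plan table can be turned into a rough overage estimate. This sketch assumes overage is charged per file beyond the monthly quota at the listed rate, which this guide does not spell out, so treat the result as an estimate. PLANS and overage_cost are hypothetical names.

```python
from decimal import Decimal

# Monthly quota and per-extraction overage rate, copied from the plan table.
PLANS = {
    "Gold": (1000, Decimal("0.0015")),
    "Platinum": (2000, Decimal("0.001275")),
    "Ultimate": (4000, Decimal("0.001125")),
}


def overage_cost(plan, extractions):
    """Estimate the overage charge in dollars for one month of extractions."""
    if plan not in PLANS:
        # Basic (and any unknown plan) has no OCR access at all.
        raise ValueError(f"OCR is not available on plan {plan!r}")
    included, rate = PLANS[plan]
    # Assumption: only files beyond the included quota are billed.
    extra = max(0, extractions - included)
    return extra * rate


# 1,500 files on Gold: 500 files over quota at $0.0015 each.
print(overage_cost("Gold", 1500))
```

Decimal keeps the small per-file rates exact, which matters once they are multiplied across thousands of files.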

Best practices

Moderate in one call

Leave moderate on (the default) when your goal is to catch policy violations in images. One OCR call both extracts the text and checks it, instead of an OCR call followed by a separate /check call.

async function moderateImage(fileUrl) {
  const res = await ocr([fileUrl], true);
  if (res.has_violations) {
    await flagForReview(fileUrl, res.results);
  }
  return res.text;
}

Check has_text before acting

An image with no readable text returns has_text: false and an empty text. Branch on it so you don't treat "nothing to read" as "clean and confirmed".
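
The branching described here can be sketched as a small triage function. The field names come from the response documented above; the four outcome labels and the name triage are illustrative assumptions, not part of the API.

```python
def triage(ocr_response):
    """Map an OCR response to "unreadable", "unchecked", "violation", or "clean"."""
    if not ocr_response.get("has_text"):
        # Nothing was recognized, so nothing was moderated either.
        return "unreadable"
    if "results" not in ocr_response:
        # Text was extracted but moderation did not run (moderate was false).
        return "unchecked"
    if ocr_response.get("has_violations"):
        return "violation"
    return "clean"


print(triage({"text": "", "has_text": False, "has_violations": False}))
```

Keeping "unreadable" and "unchecked" separate from "clean" lets you route those images to another check, such as image classification or manual review, instead of silently approving them.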

Batch related files

If a submission carries several images, send them together (up to 5) in one request rather than one call per file — fewer round trips, one quota-tracked response.
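
When a submission has more than 5 files, it has to be split across several requests. A minimal sketch of that chunking follows; batches and MAX_FILES_PER_REQUEST are hypothetical names, and the example URLs are placeholders.

```python
MAX_FILES_PER_REQUEST = 5  # per-request limit stated in this guide


def batches(file_urls, size=MAX_FILES_PER_REQUEST):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [file_urls[i:i + size] for i in range(0, len(file_urls), size)]


urls = [f"https://example.com/img-{n}.jpg" for n in range(12)]
for group in batches(urls):
    # Each group is small enough to send as one file_urls array.
    print(len(group), "files")
```

Each group can then be passed as file_urls to a single OCR request. Because text is concatenated across all files in a request, batch only files that belong to the same submission.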

Integration examples

Node.js

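// Calls the OCR endpoint and returns the parsed JSON response.
async function ocr(fileUrls, moderate = true) {
  const response = await fetch('https://api.discuse.com/api/v2/ocr', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.DISCUSE_API_KEY
    },
    body: JSON.stringify({ file_urls: fileUrls, moderate })
  });
  // Surface HTTP errors (for example, a denied request) instead of
  // parsing an error body as if it were an OCR result.
  if (!response.ok) {
    throw new Error(`OCR request failed with status ${response.status}`);
  }
  return response.json();
}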

Python

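import os
import requests

def ocr(file_urls, moderate=True):
    """Call the OCR endpoint and return the parsed JSON response."""
    response = requests.post(
        'https://api.discuse.com/api/v2/ocr',
        headers={
            'Content-Type': 'application/json',
            'X-API-Key': os.environ['DISCUSE_API_KEY']
        },
        json={'file_urls': file_urls, 'moderate': moderate},
        timeout=30  # avoid hanging indefinitely on a stalled connection
    )
    # Raise on HTTP errors rather than parsing an error body as a result.
    response.raise_for_status()
    return response.json()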


Written by the Discuse Team · Last updated June 2026
