OCR Text Extraction

Discuse pulls the text out of images and documents so you can read, and moderate, content that would otherwise be invisible to a text filter. Send up to 5 image or document URLs to POST /api/v2/ocr and you get back the recognized text. By default, that text is also run through your project's content checks.

Why OCR for moderation?

Plenty of abuse hides inside images: a slur baked into a meme, a phishing link in a screenshot, a scam phone number on a flyer. A plain text check never sees it. OCR extracts the words first, so the same sentiment, spam, badword, and language checks you already run on text apply to image and document content too.

How do I extract text?

Send one or more file URLs. The moderate field defaults to true, so the extracted text is also checked and you get a results object back; set it to false if you only want the raw text.

curl -X POST https://api.discuse.com/api/v2/ocr \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "file_urls": ["https://example.com/user-meme.jpg"],
    "moderate": true
  }'

A single request accepts up to 5 image or document URLs.

Response

{
  "text": "BUY FOLLOWERS NOW — dm @spammer for 50% off",
  "has_text": true,
  "num_files": 1,
  "has_violations": true,
  "results": {
    "hits": true,
    "spamfinder": {
      "label": "spam",
      "confidence": 0.94,
      "is_spam": true,
      "hit": true
    }
  },
  "usage": {
    "api_requests_used": 412,
    "api_requests_limit": 10000,
    "api_requests_remaining": 9588
  }
}

When moderate is false (or no text was found), results is omitted and has_violations is false — you just get the extracted text.

Request fields

Field	Type	Notes
`api_key`	string	Optional in body; you can send `X-API-Key` instead
`file_urls`	string[]	Image or document URLs to read. At least one required, up to 5
`moderate`	boolean	Run the extracted text through your text checks. Defaults to `true`
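
The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Discuse API: the helper name build_ocr_payload and the constant MAX_FILES_PER_REQUEST are hypothetical, and the only rules it encodes are the ones in the table (at least one URL, at most 5, moderate defaulting to true).

```python
# Hypothetical helper: builds the JSON body for POST /api/v2/ocr.
# The limit of 5 comes from the request-fields table above.
MAX_FILES_PER_REQUEST = 5


def build_ocr_payload(file_urls, moderate=True):
    """Return a request body dict, enforcing the 1-5 URL limit."""
    urls = list(file_urls)
    if not 1 <= len(urls) <= MAX_FILES_PER_REQUEST:
        raise ValueError(
            f"file_urls must contain between 1 and {MAX_FILES_PER_REQUEST} URLs"
        )
    return {"file_urls": urls, "moderate": bool(moderate)}
```

Validating locally avoids spending a round trip on a request that breaks the documented limits. The API key is left out of the body here on the assumption that it is sent in the X-API-Key header.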

Response fields

Field	Type	Description
`text`	string	Recognized text, concatenated across all files
`has_text`	boolean	True if any non-empty text was recognized
`num_files`	number	Number of files successfully read
`has_violations`	boolean	True if the moderated text tripped a check
`results`	object	The text-check results (see Text Analysis), present only when moderation ran and text was found
`usage`	object	`api_requests_used`, `api_requests_limit`, `api_requests_remaining`

The results object has the same shape as POST /api/v2/check — spamfinder, sentiment, language, badwords, and the top-level hits flag. See Text Analysis for the field details.
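
To see which checks fired, you can walk the results object. This is a sketch under one assumption: that every per-check entry carries a boolean hit field, as spamfinder does in the sample response. The function name checks_that_hit is hypothetical; confirm the per-check fields against Text Analysis before relying on it.

```python
def checks_that_hit(results):
    """Return the names of checks whose entry has a truthy "hit" flag.

    Assumes each per-check entry is an object with a "hit" field, as
    spamfinder is in the sample response. Non-object entries such as the
    top-level "hits" boolean are skipped.
    """
    if not results:
        # results is omitted when moderation did not run or no text was found
        return []
    return [
        name
        for name, value in results.items()
        if isinstance(value, dict) and value.get("hit")
    ]
```

Passing res.get("results") rather than res["results"] keeps the call safe when moderation did not run.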

Usage limits

OCR is a paid-plan feature; each file you extract counts once against your OCR quota.

Plan	Monthly OCR Extractions	Overage Rate
Basic	Not available	-
Gold	1,000	$0.0015/extraction
Platinum	2,000	$0.001275/extraction (15% discount)
Ultimate	4,000	$0.001125/extraction (25% discount)

If a project has no active subscription, OCR requests are denied.
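
For budgeting, the plan table can be turned into a rough overage estimate. This sketch assumes overage is billed per extraction beyond the included monthly amount, which the table implies but does not spell out; the function name overage_cost is hypothetical and actual billing may differ.

```python
from decimal import Decimal

# Included monthly extractions and per-extraction overage rate (USD),
# copied from the plan table above. Basic has no OCR access.
PLANS = {
    "Gold": (1000, Decimal("0.0015")),
    "Platinum": (2000, Decimal("0.001275")),
    "Ultimate": (4000, Decimal("0.001125")),
}


def overage_cost(plan, extractions):
    """Estimate the overage charge in USD for one month of extractions."""
    if plan not in PLANS:
        raise ValueError(f"OCR is not available on plan: {plan}")
    included, rate = PLANS[plan]
    # Only extractions beyond the included amount are assumed to be billed
    extra = max(0, extractions - included)
    return extra * rate
```

For example, 1,500 extractions on Gold are 500 over the included 1,000, so the estimate is 500 × $0.0015 = $0.75. Decimal is used instead of float so that the small per-extraction rates do not pick up rounding noise.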

Best practices

Moderate in one call

Leave moderate on (the default) when your goal is to catch policy violations in images. One OCR call both extracts the text and checks it, instead of an OCR call followed by a separate /check call.

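// Uses the ocr() helper from the Node.js integration example.
async function moderateImage(fileUrl) {
  // moderate = true: extract the text and run the text checks in one call
  const res = await ocr([fileUrl], true);
  if (res.has_violations) {
    // flagForReview stands in for your own review-queue hook
    await flagForReview(fileUrl, res.results);
  }
  return res.text;
}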

Check `has_text` before acting

An image with no readable text returns has_text: false and an empty text string. Branch on it so you don't treat "nothing to read" as "clean and confirmed".
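
One way to make that branch explicit is to map each response to a small set of outcomes. The sketch below uses only the documented has_text, has_violations, and results fields; the function name triage and the four outcome labels are hypothetical and can be renamed to fit your pipeline.

```python
def triage(res):
    """Classify an OCR response dict into one of four outcomes."""
    if not res.get("has_text"):
        # Nothing was readable, so the image was never actually checked
        return "no_text"
    if res.get("has_violations"):
        return "violation"
    if "results" not in res:
        # Text was found but moderation did not run (moderate was false)
        return "unchecked"
    return "clean"
```

Keeping "no_text" and "unchecked" separate from "clean" lets you route those files to another check, such as image moderation, instead of approving them silently.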

Batch related files

If a submission carries several images, send them together (up to 5) in one request rather than one call per file: fewer round trips and a single response to handle. Each file still counts once against your OCR quota.
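
When a submission has more than 5 files, the URLs need to be split across requests. A minimal sketch, with a hypothetical helper name batches; the batch size of 5 is the documented per-request limit.

```python
def batches(file_urls, size=5):
    """Split a sequence of URLs into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    urls = list(file_urls)
    # Step through the list in strides of `size`; the last batch may be shorter
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

Each returned list can be sent as the file_urls of one request. Because text is concatenated across all files in a response, a violation found in a batch points to the batch rather than to a single file, so you may want to re-check files individually when a batch is flagged.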

Integration examples

Node.js

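async function ocr(fileUrls, moderate = true) {
  const response = await fetch('https://api.discuse.com/api/v2/ocr', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.DISCUSE_API_KEY
    },
    body: JSON.stringify({ file_urls: fileUrls, moderate })
  });
  // Fail on non-2xx responses rather than treating an error body as OCR output
  if (!response.ok) {
    throw new Error(`OCR request failed with status ${response.status}`);
  }
  return response.json();
}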

Python

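import os
import requests

def ocr(file_urls, moderate=True):
    response = requests.post(
        'https://api.discuse.com/api/v2/ocr',
        headers={
            'Content-Type': 'application/json',
            'X-API-Key': os.environ['DISCUSE_API_KEY']
        },
        json={'file_urls': file_urls, 'moderate': moderate},
        timeout=30  # seconds; requests has no default timeout, adjust as needed
    )
    # Raise on non-2xx responses rather than treating an error body as OCR output
    response.raise_for_status()
    return response.json()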

Ready to read text from images? Get started with Discuse.
