OCR Text Extraction
Discuse pulls the text out of images and documents so you can read, and moderate, content that would otherwise be invisible to a text filter. Send up to 5 image or document URLs to POST /api/v2/ocr and you get back the recognized text along with, by default, the results of running that text through your project's content checks.
Why OCR for moderation?
Plenty of abuse hides inside images: a slur baked into a meme, a phishing link in a screenshot, a scam phone number on a flyer. A plain text check never sees it. OCR extracts the words first, so the same sentiment, spam, badword, and language checks you already run on text apply to image and document content too.
How do I extract text?
Send one or more file URLs. The moderate field defaults to true, so the extracted text is also checked and the response includes a results object. Set moderate to false if you only want the raw text.
```bash
curl -X POST https://api.discuse.com/api/v2/ocr \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "file_urls": ["https://example.com/user-meme.jpg"],
    "moderate": true
  }'
```
A single request accepts up to 5 image or document URLs.
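To catch an out-of-range request before it costs a round trip, you can validate file_urls on the client. The following is a minimal Python sketch: check_file_urls is a hypothetical helper (not part of any Discuse SDK), and it only enforces the documented 1 to 5 URL range; the API's own error response for an invalid request is not described on this page.

```python
MAX_FILES_PER_REQUEST = 5  # documented per-request limit


def check_file_urls(file_urls):
    """Hypothetical client-side check before calling POST /api/v2/ocr.

    Enforces the documented range (at least one URL, at most 5) and
    returns the URLs as a list.
    """
    urls = list(file_urls)
    if not urls:
        raise ValueError("file_urls needs at least one URL")
    if len(urls) > MAX_FILES_PER_REQUEST:
        raise ValueError(
            f"file_urls accepts at most {MAX_FILES_PER_REQUEST} URLs, got {len(urls)}"
        )
    if not all(isinstance(u, str) and u for u in urls):
        raise ValueError("every file URL must be a non-empty string")
    return urls
```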
Response
```json
{
  "text": "BUY FOLLOWERS NOW — dm @spammer for 50% off",
  "has_text": true,
  "num_files": 1,
  "has_violations": true,
  "results": {
    "hits": true,
    "spamfinder": {
      "label": "spam",
      "confidence": 0.94,
      "is_spam": true,
      "hit": true
    }
  },
  "usage": {
    "api_requests_used": 412,
    "api_requests_limit": 10000,
    "api_requests_remaining": 9588
  }
}
```
When moderate is false, or when no text was found, results is omitted and has_violations is false, so the response carries only the extracted text.
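Because results may be absent, read it defensively rather than indexing into it. A minimal sketch: summarize_ocr_response is a hypothetical helper that relies only on the response fields documented on this page.

```python
def summarize_ocr_response(resp):
    """Hypothetical helper: reduce an OCR response to the fields a pipeline needs.

    results is omitted when moderate was false or no text was found, so it is
    read with .get() and reported as None in that case.
    """
    usage = resp.get("usage", {})
    return {
        "text": resp.get("text", ""),
        "has_text": resp.get("has_text", False),
        "has_violations": resp.get("has_violations", False),
        "results": resp.get("results"),  # None when moderation did not run
        "requests_remaining": usage.get("api_requests_remaining"),
    }
```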
Request fields
| Field | Type | Notes |
|---|---|---|
| api_key | string | Optional in body; you can send X-API-Key instead |
| file_urls | string[] | Image or document URLs to read. At least one required, up to 5 |
| moderate | boolean | Run the extracted text through your text checks. Defaults to true |
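The api_key body field and the X-API-Key header are alternative places for the same key. Here is a sketch that builds either form; build_ocr_request and its key_in_body flag are hypothetical names, and the key is read from DISCUSE_API_KEY as in the integration examples.

```python
import os


def build_ocr_request(file_urls, moderate=True, key_in_body=False):
    """Hypothetical helper returning (headers, body) for POST /api/v2/ocr.

    By default the key goes in the X-API-Key header; with key_in_body=True
    it is sent as the api_key body field instead.
    """
    api_key = os.environ["DISCUSE_API_KEY"]
    headers = {"Content-Type": "application/json"}
    body = {"file_urls": list(file_urls), "moderate": moderate}
    if key_in_body:
        body["api_key"] = api_key
    else:
        headers["X-API-Key"] = api_key
    return headers, body
```

Sending the key in the header keeps it out of any request-body logging on your side, which is one reason to prefer it.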
Response fields
| Field | Type | Description |
|---|---|---|
| text | string | Recognized text, concatenated across all files |
| has_text | boolean | True if any non-empty text was recognized |
| num_files | number | Number of files successfully read |
| has_violations | boolean | True if the moderated text tripped a check |
| results | object | The text-check results (see Text Analysis), present only when moderation ran and text was found |
| usage | object | api_requests_used, api_requests_limit, api_requests_remaining |
The results object has the same shape as the response from POST /api/v2/check: spamfinder, sentiment, language, badwords, and the top-level hits flag. See Text Analysis for the field details.
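When has_violations is true, you may want to know which check fired. A sketch, assuming every check object (spamfinder, sentiment, language, badwords) carries a boolean hit field like the spamfinder object in the example response; tripped_checks is a hypothetical helper.

```python
def tripped_checks(results):
    """Hypothetical helper: names of checks in an OCR results object whose hit is true.

    Assumes each check object has a boolean "hit" field, as spamfinder does in
    the example response. The top-level "hits" flag is a boolean, not a check
    object, so the isinstance test skips it.
    """
    if not results:
        return []  # results is absent when moderation did not run
    return sorted(
        name
        for name, value in results.items()
        if isinstance(value, dict) and value.get("hit") is True
    )
```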
Usage limits
OCR is a paid-plan feature; each file you extract counts once against your OCR quota.
| Plan | Monthly OCR Extractions | Overage Rate |
|---|---|---|
| Basic | Not available | - |
| Gold | 1,000 | $0.0015/extraction |
| Platinum | 2,000 | $0.001275/extraction (15% discount) |
| Ultimate | 4,000 | $0.001125/extraction (25% discount) |
If a project has no active subscription, OCR requests are denied.
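To estimate a month's bill from the table, charge the overage rate on every extraction above the plan's allowance. This sketch assumes overage is billed linearly per extraction, which is how the table reads; ocr_overage_cost is a hypothetical helper, and Decimal avoids floating-point rounding on sub-cent rates.

```python
from decimal import Decimal

# Included monthly extractions and per-extraction overage rate, from the table.
OCR_PLANS = {
    "Gold": (1000, Decimal("0.0015")),
    "Platinum": (2000, Decimal("0.001275")),
    "Ultimate": (4000, Decimal("0.001125")),
}


def ocr_overage_cost(plan, extractions):
    """Hypothetical estimate of the overage charge (in dollars) for one month.

    Each extracted file counts as one extraction. Basic has no OCR access.
    """
    if plan not in OCR_PLANS:
        raise ValueError(f"OCR is not available on the {plan} plan")
    included, rate = OCR_PLANS[plan]
    overage = max(0, extractions - included)
    return overage * rate
```

For example, 1,500 extractions on Gold leaves 500 over the allowance, so the overage is 500 × $0.0015 = $0.75.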
Best practices
Moderate in one call
Leave moderate on (the default) when your goal is to catch policy violations in images. One OCR call both extracts the text and checks it, instead of an OCR call followed by a separate /check call.
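```javascript
// Extract and moderate one image in a single call, using the ocr() helper
// from the Node.js integration example. flagForReview() stands in for your
// own review-queue hook; it is not part of the Discuse API.
async function moderateImage(fileUrl) {
  const res = await ocr([fileUrl], true);
  if (res.has_violations) {
    await flagForReview(fileUrl, res.results);
  }
  return res.text;
}
```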
Check has_text before acting
An image with no readable text returns has_text: false and an empty text. Branch on it so you don't treat "nothing to read" as "clean and confirmed".
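One way to branch is to map each response to an explicit outcome. In this sketch, classify_ocr_result and its outcome labels are hypothetical; the "unchecked" case covers responses where text was found but results is omitted because moderate was false.

```python
def classify_ocr_result(resp):
    """Hypothetical helper mapping an OCR response to an explicit outcome.

    "no_text"   - nothing readable was found; not a confirmed clean result
    "unchecked" - text was found but moderation did not run (no results)
    "flagged"   - the extracted text tripped a check
    "clean"     - text was found and checked without violations
    """
    if not resp.get("has_text"):
        return "no_text"
    if resp.get("results") is None:
        return "unchecked"
    if resp.get("has_violations"):
        return "flagged"
    return "clean"
```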
Batch related files
If a submission carries several images, send them together (up to 5) in one request rather than making one call per file. You get fewer round trips and a single response with one usage report. Each file still counts once against your OCR quota.
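A submission with more than 5 images still needs several calls. A minimal sketch of splitting its URLs into request-sized batches; batch_file_urls is a hypothetical helper.

```python
def batch_file_urls(file_urls, batch_size=5):
    """Hypothetical helper: split file URLs into batches of at most batch_size.

    The default of 5 matches the per-request limit of POST /api/v2/ocr.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    urls = list(file_urls)
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Seven images, for instance, become one request with five URLs and one with two.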
Integration examples
Node.js
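```javascript
// Call POST /api/v2/ocr with up to 5 file URLs; moderate defaults to true.
async function ocr(fileUrls, moderate = true) {
  const response = await fetch('https://api.discuse.com/api/v2/ocr', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.DISCUSE_API_KEY
    },
    body: JSON.stringify({ file_urls: fileUrls, moderate })
  });
  // Surface HTTP errors (e.g. a denied request) instead of parsing them as a result
  if (!response.ok) {
    throw new Error(`OCR request failed with status ${response.status}`);
  }
  return response.json();
}
```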
Python
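```python
import os

import requests


def ocr(file_urls, moderate=True):
    """Call POST /api/v2/ocr with up to 5 file URLs; moderate defaults to True."""
    response = requests.post(
        'https://api.discuse.com/api/v2/ocr',
        headers={
            'Content-Type': 'application/json',
            'X-API-Key': os.environ['DISCUSE_API_KEY']
        },
        json={'file_urls': file_urls, 'moderate': moderate}
    )
    # Raise on HTTP errors (e.g. a denied request) instead of returning an error body
    response.raise_for_status()
    return response.json()
```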