Description
Current Behavior
Input image
Orientation detection command:
$ tesseract image.png stdout --psm 0 -c min_characters_to_try=20
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.03
Script: Latin
Script confidence: 0.00
The script should be Cyrillic, but it is detected as Latin, and both confidence values are very low (orientation 0.03, script 0.00). Without the min_characters_to_try=20 argument the command does not work at all, possibly because Tesseract cannot correctly identify the characters of the script.
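To make a report like the one above easier to check, here is a minimal Python sketch (not part of Tesseract) that parses the `--psm 0` output and flags low-confidence values. The names `parse_osd` and `review_osd` and the 1.0 thresholds are hypothetical. The sketch also assumes that `Rotate` is always `(360 - orientation) % 360`, which matches the 270/90 pair in this output.

```python
# Sketch: parse the "Key: value" report printed by `tesseract ... --psm 0`
# and flag low-confidence results. The thresholds are hypothetical values
# chosen for illustration, not defaults taken from Tesseract.

OSD_SAMPLE = """\
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.03
Script: Latin
Script confidence: 0.00
"""


def parse_osd(text: str) -> dict[str, str]:
    """Turn each 'Key: value' line into a dict entry (split on the first colon)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def review_osd(text: str, min_orientation_conf: float = 1.0,
               min_script_conf: float = 1.0) -> list[str]:
    """Return human-readable warnings for an OSD report."""
    fields = parse_osd(text)
    warnings = []
    orientation = int(fields["Orientation in degrees"])
    rotate = int(fields["Rotate"])
    # Assumption: "Rotate" is the correction that undoes the detected
    # orientation, i.e. (360 - orientation) % 360, as in 270 -> 90 above.
    if rotate != (360 - orientation) % 360:
        warnings.append("rotate does not complement orientation")
    if float(fields["Orientation confidence"]) < min_orientation_conf:
        warnings.append("low orientation confidence")
    if float(fields["Script confidence"]) < min_script_conf:
        warnings.append("low script confidence")
    return warnings


if __name__ == "__main__":
    print(parse_osd(OSD_SAMPLE)["Script"], review_osd(OSD_SAMPLE))
```

Run against the output from this issue, it reports the script as Latin and raises both confidence warnings; the rotation values themselves are internally consistent.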
This seems like a fairly straightforward case of rotation detection, yet it fails.
Am I using the command correctly? Can I pass the language as an argument to help Tesseract recognize which language the text is in?
Expected Behavior
I expect the page rotation to be detected with higher confidence.
Suggested Fix
No response
tesseract -v
tesseract 5.5.0
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.47 : libtiff 4.7.0 : zlib 1.2.12 : libwebp 1.5.0 : libopenjp2 2.5.3
Found NEON
Found libarchive 3.7.7 zlib/1.2.12 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.7.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.63.0
Operating System
macOS 14 Sonoma
Other Operating System
No response
uname -a
Darwin MBP.local 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000 arm64
Compiler
No response
CPU
Apple M1 Pro
Virtualization / Containers
No response
Other Information
No response