Page orientation not detected properly (low confidence) #4409

@nacholibre

Current Behavior

Input image

[Image]

Orientation detection command:

$ tesseract image.png stdout --psm 0 -c min_characters_to_try=20
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.03
Script: Latin
Script confidence: 0.00
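
To make the report above easier to act on in a script, here is a minimal sketch that pulls out the individual fields and flags a low orientation confidence. It works on a copy of the output shown above rather than calling tesseract, so it runs anywhere. The helper name `osd_field` and the cutoff `MIN_CONF` are assumptions made for illustration; Tesseract does not prescribe a threshold.

```shell
#!/bin/sh
# Sample OSD report copied from the output above. In practice it would come from:
#   tesseract image.png stdout --psm 0 -c min_characters_to_try=20
osd_report='Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.03
Script: Latin
Script confidence: 0.00'

# osd_field is a hypothetical helper: print the value that follows "<label>: ".
# The label must match the whole text before the first ": " on the line.
osd_field() {
    printf '%s\n' "$osd_report" | awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

rotate=$(osd_field 'Rotate')
orient_conf=$(osd_field 'Orientation confidence')
script=$(osd_field 'Script')

# MIN_CONF is an assumed cutoff for this sketch, not a recommended value.
MIN_CONF=1.0

# POSIX sh has no floating-point comparison, so awk does it.
if awk -v c="$orient_conf" -v m="$MIN_CONF" 'BEGIN { exit !(c + 0 < m + 0) }'; then
    verdict="unreliable"
else
    verdict="ok"
fi

echo "rotate=$rotate script=$script orientation_confidence=$orient_conf verdict=$verdict"
```

With the report above, the orientation confidence of 0.03 falls below the assumed cutoff, so the result would be treated as unreliable rather than used to rotate the page.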

The script should be Cyrillic, but it is detected as Latin, and the orientation confidence is very low (0.03). Detection does not work without the min_characters_to_try=20 argument, possibly because Tesseract cannot correctly identify the script's characters.

This seems like a pretty straightforward case of rotation detection, yet it fails.

Am I using the command correctly? Can I pass the language as an argument to help Tesseract determine which language the text is in?

Expected Behavior

I expect the page rotation to be identified with higher confidence.

Suggested Fix

No response

tesseract -v

tesseract 5.5.0
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.47 : libtiff 4.7.0 : zlib 1.2.12 : libwebp 1.5.0 : libopenjp2 2.5.3
Found NEON
Found libarchive 3.7.7 zlib/1.2.12 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.7.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.63.0

Operating System

macOS 14 Sonoma

Other Operating System

No response

uname -a

Darwin MBP.local 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000 arm64

Compiler

No response

CPU

Apple M1 Pro

Virtualization / Containers

No response

Other Information

No response

Labels: OSD (Orientation and Script Detection)
