Tesseract takes a long time when processing image #4430

@paulius-petkus

Current Behavior

Tesseract gets stuck (hangs) when processing a specific file with page segmentation mode "SparseText" (--psm 11).

Command:

"C:\Program Files\Tesseract-OCR\tesseract.exe" page.png page --psm 11 -l eng

Test File:
https://drive.google.com/file/d/1zg0nHfiudF89PbER12U1Gi3lQzBcyoiw/view?usp=sharing

For reference, I am attaching the original PDF (my goal is to OCR PDF files):
53402_01040415_doc002.pdf

Expected Behavior

Processing should be as fast as it is with page segmentation mode "Auto".
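
To make the comparison between the two modes measurable, a small timing script can help. The following is only a sketch, not part of the original report: it assumes that "Auto" corresponds to `--psm 3` (the command-line default) and "SparseText" to `--psm 11`, and the helper names `build_command`, `time_run`, and `compare_modes` as well as the 120-second timeout are made up for illustration. A run that exceeds the timeout is reported as `None` instead of blocking forever.

```python
import subprocess
import time

# Executable path taken from the command in this report; adjust as needed.
TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def build_command(image, out_base, psm, lang="eng", exe=TESSERACT):
    """Return the tesseract argument list for one run (hypothetical helper)."""
    return [exe, image, out_base, "--psm", str(psm), "-l", lang]


def time_run(cmd, timeout_s=120, runner=subprocess.run):
    """Run cmd; return elapsed seconds, or None if it exceeds timeout_s."""
    start = time.perf_counter()
    try:
        # subprocess.run kills the child process when the timeout expires.
        runner(cmd, check=True, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


def compare_modes(image, psms=(3, 11), timeout_s=120, runner=subprocess.run):
    """Time one tesseract run per PSM value and return {psm: seconds or None}."""
    results = {}
    for psm in psms:
        # Assumption: a separate output base per mode, e.g. page_psm11.
        cmd = build_command(image, f"page_psm{psm}", psm)
        results[psm] = time_run(cmd, timeout_s=timeout_s, runner=runner)
    return results
```

Calling `compare_modes("page.png")` from the directory that contains the test image would return a dictionary keyed by PSM value; a `None` entry for 11 alongside a small number for 3 would match the behavior described above.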

Suggested Fix

No response

tesseract -v

5.5.0.20241111

Operating System

Windows 11

Other Operating System

No response

uname -a

No response

Compiler

tesseract --version:

tesseract v5.5.0.20241111
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0

CPU

Intel i7-13700H

Virtualization / Containers

No response

Other Information

No response
