Description
Current Behavior
Tesseract hangs when processing a specific file with page segmentation mode "SparseText" (--psm 11).
Command:
"C:\Program Files\Tesseract-OCR\tesseract.exe" page.png page --psm 11 -l eng
Test File:
https://drive.google.com/file/d/1zg0nHfiudF89PbER12U1Gi3lQzBcyoiw/view?usp=sharing
For reference, I am attaching the original PDF, since my goal is to OCR PDFs:
53402_01040415_doc002.pdf
Expected Behavior
Processing should finish quickly, as it does with page segmentation mode "Auto".
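To make the difference between the two modes easier to reproduce, here is a minimal Python sketch that runs the same command under a time limit for each mode and reports whether it finished. It assumes that "Auto" corresponds to `--psm 3` (the default) and that "SparseText" is `--psm 11`. The helper names (`build_command`, `run_with_timeout`, `compare_modes`), the output base names, and the 120-second limit are illustrative choices, not part of Tesseract.

```python
import subprocess
import time


def build_command(exe, image, out_base, psm, lang="eng"):
    """Build the tesseract argument list for one page segmentation mode."""
    return [exe, image, out_base, "--psm", str(psm), "-l", lang]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, seconds).

    finished is False when the process was killed after timeout_s seconds.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
            check=False,
        )
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.monotonic() - start


def compare_modes(exe, image, timeout_s=120):
    """Time the same image under psm 3 (assumed "Auto") and psm 11."""
    results = {}
    for psm in (3, 11):
        # Hypothetical output base name, one per mode.
        cmd = build_command(exe, image, f"page_psm{psm}", psm)
        results[psm] = run_with_timeout(cmd, timeout_s)
    return results
```

Calling `compare_modes(r"C:\Program Files\Tesseract-OCR\tesseract.exe", "page.png")` returns a dict mapping each mode to a `(finished, seconds)` pair. If the hang reproduces, the entry for mode 11 should have `finished` set to `False`.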
Suggested Fix
No response
tesseract -v
5.5.0.20241111
Operating System
Windows 11
Other Operating System
No response
uname -a
No response
Compiler
tesseract --version:
tesseract v5.5.0.20241111
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0
CPU
Intel i7-13700H
Virtualization / Containers
No response
Other Information
No response