This repository has been archived by the owner. It is now read-only.

Commit 710d283

christian-oreilly authored and pafonta committed
Fixing the OCR on the server side.
For some reason, the behavior of ocrmypdf seems to have changed. Whereas we previously expected it to produce the .txt file directly, it now generates a PDF with the OCR-ed text overlaid on it. This commit fixes the issue by overwriting the original scanned PDF with the text-overlaid PDF and then running the usual pdftotext on this new PDF to produce the .txt file.
1 parent 5961509 commit 710d283
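The two-step pipeline described in the commit message (OCR into a PDF with a text layer, then extract that layer with pdftotext) can be sketched as follows. This is a minimal sketch that assumes the `ocrmypdf` and `pdftotext` command-line tools are on `PATH`. The repository's own `run_ocrmypdf` helper is not shown in this commit and may call OCRmyPDF differently. `ocr_commands`, `run_ocr_pipeline` and the `.ocr.pdf` temporary name are hypothetical. The commit passes the same path as input and output. This sketch takes a different approach: it writes to a separate file and then swaps that file in with `os.replace`, so it does not depend on in-place output.

```python
import os
import subprocess
from typing import List


def ocr_commands(base_name: str) -> List[List[str]]:
    """Build the argv lists for the two-step pipeline on ``base_name``.pdf.

    Step 1: ocrmypdf writes a copy of the scan with an OCR text layer to a
    temporary PDF (hypothetical name). Step 2: pdftotext extracts that text
    layer as UTF-8 into ``base_name``.txt.
    """
    pdf_path = base_name + ".pdf"
    ocr_pdf_path = base_name + ".ocr.pdf"  # hypothetical temporary file name
    txt_path = base_name + ".txt"
    return [
        ["ocrmypdf", pdf_path, ocr_pdf_path],
        ["pdftotext", "-enc", "UTF-8", ocr_pdf_path, txt_path],
    ]


def run_ocr_pipeline(base_name: str) -> str:
    """Run both steps, replace the original scan, and return the .txt path."""
    ocr_step, text_step = ocr_commands(base_name)
    subprocess.check_call(ocr_step)   # raises CalledProcessError on failure
    subprocess.check_call(text_step)  # same for pdftotext
    # As in the commit, keep the text-layer PDF in place of the original scan.
    os.replace(ocr_step[2], base_name + ".pdf")
    return text_step[4]
```

With both tools installed, `run_ocr_pipeline("scan")` would OCR `scan.pdf`, leave the text-layer PDF at `scan.pdf`, and write the extracted text to `scan.txt`. Keeping command construction separate from execution makes the argument lists easy to check without the external binaries.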

1 file changed (+2, -1 lines)

nat/restServer.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -53,7 +53,8 @@ def runOCR(fileName):
     app.OCRLock.release()
 
     # Run OCR
-    run_ocrmypdf(fileName + ".pdf", fileName + ".txt")
+    run_ocrmypdf(fileName + ".pdf", fileName + ".pdf")
+    check_call(['pdftotext', '-enc', 'UTF-8', fileName + ".pdf", fileName + ".txt"])
 
     acquireLockWithTimeout()
     del app.OCRFiles[app.OCRFiles.index(fileName)]
```
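The context lines of this hunk show `app.OCRLock` being released before the slow OCR call and reacquired through `acquireLockWithTimeout()` before `fileName` is removed from `app.OCRFiles`. The sketch below illustrates that pattern with `threading.Lock`. It makes two assumptions that this commit does not show: that `acquireLockWithTimeout` wraps `Lock.acquire(timeout=...)`, and that the file name was appended to the list earlier under the same lock. `ocr_lock`, `ocr_files`, `acquire_lock_with_timeout` and `run_ocr_job` are hypothetical stand-ins, not the server's actual code.

```python
import threading

# Hypothetical stand-ins for app.OCRLock and app.OCRFiles from the diff.
ocr_lock = threading.Lock()
ocr_files = []


def acquire_lock_with_timeout(lock, timeout=60.0):
    """Acquire ``lock`` or raise TimeoutError if it stays busy too long."""
    if not lock.acquire(timeout=timeout):
        raise TimeoutError("could not acquire the OCR lock")


def run_ocr_job(file_name, do_ocr):
    """Register a job, run the slow OCR step unlocked, then deregister it."""
    acquire_lock_with_timeout(ocr_lock)
    ocr_files.append(file_name)  # assumed registration step
    ocr_lock.release()           # do not hold the lock during OCR

    try:
        do_ocr(file_name)
    finally:
        # Reacquire before touching the shared list, even if OCR failed.
        acquire_lock_with_timeout(ocr_lock)
        try:
            ocr_files.remove(file_name)
        finally:
            ocr_lock.release()
```

Releasing the lock around the external OCR process lets other requests inspect or extend the job list while a long OCR run is in progress. The `finally` blocks ensure that the entry is removed and the lock is released even if OCR raises an error.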
