Hi there! This repo shows how to use OpenVINO to accelerate the GOT-OCR2.0 model.
- Download all files from the original repo on Hugging Face, then move them into the `weight` folder (a download sketch follows the tree below). The file structure should eventually look like this:
  .
  │  app.py
  │  convert_model.py
  │
  ├─ weight
  │     config.json
  │     generation_config.json
  │     got_vision_b.py
  │     modeling_GOT.py
  │     qwen.tiktoken
  │     render_tools.py
  │     special_tokens_map.json
  │     tokenization_qwen.json
  │     tokenizer_config.json
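
  If you prefer to script the download, here is a minimal sketch using `huggingface_hub`. The repo id shown is an assumption; substitute the actual GOT-OCR2.0 repo you download from.

  ```python
  # Minimal download sketch (assumption: the upstream GOT-OCR2.0 weights live at
  # "stepfun-ai/GOT-OCR2_0"; replace with the repo you actually use).
  from huggingface_hub import snapshot_download

  snapshot_download(
      repo_id="stepfun-ai/GOT-OCR2_0",  # assumed repo id
      local_dir="weight",               # matches the folder layout above
  )
  ```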
- Run the following command:
  python app.py --image-file /path/to/image
  It will automatically convert the model into OpenVINO IR using INT4 quantization. For more information about quantization with OpenVINO, please refer to NNCF.
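
  For reference, below is a hedged sketch of what INT4 weight compression with NNCF looks like on an OpenVINO IR model; the IR path, `ratio`, and `group_size` values are illustrative assumptions, not necessarily the exact settings app.py applies.

  ```python
  # Sketch of INT4 weight compression with NNCF (paths and parameters are
  # illustrative assumptions, not the exact settings used by app.py).
  import nncf
  import openvino as ov

  core = ov.Core()
  model = core.read_model("weight/openvino_model.xml")  # hypothetical IR path

  compressed = nncf.compress_weights(
      model,
      mode=nncf.CompressWeightsMode.INT4_ASYM,  # 4-bit asymmetric weight quantization
      ratio=0.8,                                # fraction of weights compressed to INT4
      group_size=128,                           # per-group quantization granularity
  )
  ov.save_model(compressed, "weight/openvino_model_int4.xml")
  ```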
- The original version generates 19 tokens/s, while OpenVINO with INT4 quantization speeds this up to 37 tokens/s (tested only on an Intel Core i7-1360P, 16 GB RAM, Windows 11 Pro).
- Accuracy has not been tested yet, but the results look good to me.
- Some code is adapted from ov_qwen2_audio_helper.py.
- GOT-OCR2.0: Towards OCR-2.0 via a Unified End-to-end Model
- OpenVINO: an open-source software toolkit for optimizing and deploying deep learning models.