Benchmarking notebooks for various Persian G2P models, comparing their performance on the SentenceBench dataset, including Homo-GE2PE and Homo-T5.

MahtaFetrat/Persian-G2P-Tools-Benchmark

Persian G2P Tools Benchmark

This repository contains benchmarking notebooks for various Persian grapheme-to-phoneme (G2P) models. It covers both the baseline models and the Homo-GE2PE and Homo-T5 models proposed in the study "Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models". All benchmarks are run on the SentenceBench Persian G2P Benchmark.


Repository Structure

benchmarking-scripts/
│   ├── Benchmark_AzamRabiee_Persian_G2P.ipynb
│   ├── Benchmark_GE2PE.ipynb
│   ├── Benchmark_HomoFast_eSpeak.ipynb
│   ├── Benchmark_Homo_GE2PE.ipynb
│   ├── Benchmark_Homo_T5.ipynb
│   ├── Benchmark_PasaOpasen_PersianG2P.ipynb
│   ├── Benchmark_de_mh_persian_phonemizer.ipynb
│   ├── Benchmark_dmort27_epitran.ipynb
│   ├── Benchmark_eSpeak_NG.ipynb
│   ├── Benchmark_mohamad_hasan_sohan_ajini_G2P.ipynb
│   └── Benchmark_sajadalipour7_Persian_Grapheme_To_Phoneme_With_Transformer.ipynb

Each notebook benchmarks one model on the SentenceBench dataset. Every model is run 5 times independently, and the results of each run are documented in the last markdown cell of its notebook.
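As a rough illustration of how the per-run numbers can be combined into a single figure per model, the sketch below averages each metric over a list of runs. The function name `summarize_runs`, the metric keys, and the sample values are hypothetical and are not taken from the notebooks.

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Average every metric over independent runs.

    `runs` is a list of dicts, one per run, all sharing the same metric keys.
    Returns {metric: {"mean": ..., "stdev": ...}}.
    """
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = {
            "mean": mean(values),
            # Sample standard deviation needs at least two values.
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Made-up numbers, purely for illustration (not results from this repository).
example_runs = [
    {"per": 10.0, "homograph_acc": 50.0, "inference_time": 0.40},
    {"per": 12.0, "homograph_acc": 52.0, "inference_time": 0.42},
    {"per": 11.0, "homograph_acc": 51.0, "inference_time": 0.41},
]

for metric, stats in summarize_runs(example_runs).items():
    print(f"{metric}: mean={stats['mean']:.4f} stdev={stats['stdev']:.4f}")
```

Reporting the standard deviation next to the mean is optional, but it shows how much the inference-time measurements vary between runs.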


Benchmarking Results

The table below presents the performance of each model, averaged across 5 runs. PER is the phoneme error rate (lower is better); for homograph accuracy, higher is better.

| Model | PER (%) | Homograph Acc. (%) | Avg. Inf. Time (s) |
|---|---|---|---|
| Persian_G2P | 35.23 | 21.23 | 11.1374 |
| PersianG2P | 15.04 | 37.74 | 2.1686 |
| persian_phonemizer | 25.27 | 29.25 | 0.1803 |
| Epitran | 45.12 | 0.00 | 0.0003 |
| Persian G2P | 19.63 | 29.91 | 28.0039 |
| Persian_Grapheme_To_Phoneme | 12.85 | 40.00 | 0.9685 |
| eSpeak NG | 6.92 | 43.87 | 0.0169 |
| GE2PE | 4.81 | 47.17 | 0.4464 |
| HomoFast eSpeak | 6.33 | 74.53 | 0.0084 |
| Homo-T5 | 4.12 | 76.32 | 0.4141 |
| Homo-GE2PE | 3.98 | 76.89 | 0.4473 |
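For readers unfamiliar with the PER column, the sketch below shows one common way to compute a phoneme error rate: the Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total reference length. This is an assumption about the general metric, not a copy of the scoring code in the notebooks, which may normalize or tokenize phonemes differently. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER in percent: total edits / total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_length = sum(len(r) for r in references)
    return 100.0 * total_edits / total_length


# Toy sequences where each character stands for one phoneme (illustrative only).
print(phoneme_error_rate(["abcd"], ["abxd"]))
```

Here one substitution against a four-phoneme reference gives a PER of 25%. Sequences can be strings or lists of phoneme symbols, since the functions only compare elements for equality.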

Contributions

Contributions and pull requests are welcome. Please open an issue to discuss the changes you intend to make or the models/tools you want to add to the benchmark.


Additional Links
