asr

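Polyphonic characters are currently handled with pypinyin or g2pM, whose accuracy is limited. The idea is to build a polyphone prediction model on top of BERT (or ERNIE). Put simply: if a language has 100 polyphonic characters, each with at most 3 pronunciations, then 100 three-way classifiers (simple fc layers are enough) can be attached after BERT. At prediction time, the classifier belonging to the character in question is looked up and used to classify it.
Reference paper:
tencent_polyphone.pdf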
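
The per-character classifier lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions: the encoder is replaced by a placeholder hidden vector, the heads are untrained random fc layers, and `PRONUNCIATIONS`, `HIDDEN_SIZE`, `build_heads`, and `predict` are hypothetical names. The character inventory is a tiny, incomplete sample, and each head is sized to its character's number of readings rather than fixed at 3.

```python
import math
import random

# Hypothetical, incomplete inventory: polyphonic character -> candidate readings.
PRONUNCIATIONS = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
    "和": ["he2", "he4", "huo2"],
}
HIDDEN_SIZE = 8  # stand-in for the encoder hidden size (768 for BERT-base)


def make_head(num_classes, hidden_size, rng):
    """One fc layer: a num_classes x hidden_size weight matrix plus a bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
               for _ in range(num_classes)]
    bias = [0.0] * num_classes
    return weights, bias


def build_heads(pronunciations, hidden_size, seed=0):
    """Create one classifier head per polyphonic character."""
    rng = random.Random(seed)
    return {ch: make_head(len(readings), hidden_size, rng)
            for ch, readings in pronunciations.items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(char, hidden, heads, pronunciations):
    """Select the head for `char`, apply it to the hidden vector, return the top reading."""
    weights, bias = heads[char]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return pronunciations[char][best], probs


heads = build_heads(PRONUNCIATIONS, HIDDEN_SIZE)
# Placeholder for the encoder output at the position of the polyphonic character.
hidden_vector = [0.5] * HIDDEN_SIZE
reading, probs = predict("和", hidden_vector, heads, PRONUNCIATIONS)
```

Because the heads are untrained, the returned reading is arbitrary; in a real model the hidden vector would come from BERT or ERNIE and the heads would be trained jointly with, or on top of, the encoder.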

For data, the dataset provided by https://github.com/kakaobrain/g2pM can be used.

Advanced: multi-task BERT
![image](https://user-images.githubusercontent.com/24568452

As implemented in Python in

alphacep/vosk-api@5e46825

Creating CSV files manually is a lot of work. This could be automated by a script if the name of the WAV file is the same as the transcript.

The same could be done for creating a language model input text file. A script could pull the transcript from the WAV file name.
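
A minimal sketch of such a script follows, assuming each WAV file is named after its transcript with underscores in place of spaces (for example `turn_on_the_light.wav`). The function names and the CSV column names `wav_filename` and `transcript` are hypothetical; the columns a given toolkit expects may differ.

```python
import csv
from pathlib import Path


def transcript_from_name(wav_path):
    """Assumed convention: 'turn_on_the_light.wav' -> 'turn on the light'."""
    return wav_path.stem.replace("_", " ").strip().lower()


def write_manifests(wav_dir, csv_path, lm_text_path):
    """Write a CSV manifest and a one-sentence-per-line language model text file.

    Returns the number of WAV files processed.
    """
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file, \
            open(lm_text_path, "w", encoding="utf-8") as lm_file:
        writer = csv.writer(csv_file)
        writer.writerow(["wav_filename", "transcript"])  # hypothetical column names
        for wav in wav_files:
            text = transcript_from_name(wav)
            writer.writerow([wav.name, text])
            lm_file.write(text + "\n")
    return len(wav_files)
```

Calling `write_manifests("clips", "train.csv", "lm_input.txt")` would cover both the CSV and the language model input in one pass over the directory; the paths here are placeholders.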

Design a logo for LibreASR and share it here.

To make an open source project cool, it should have a logo 😄

For simplified Vosk processing like

https://github.com/Aculeasis/vosk-rest


❓ Questions & Help

Details

Each call to the error rate accumulates the distance and the length. Why is that? Is it meant to act as a kind of running average?
Why not just return the point-wise WER? @upskyy
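
To make the two behaviours in the question concrete, here is a small sketch contrasting an accumulating WER with a point-wise one. It is not the project's implementation: `RunningWER`, `pointwise_wer`, and `edit_distance` are hypothetical names, and the motivation shown (weighting utterances by length) is one plausible reading rather than a confirmed answer.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


class RunningWER:
    """Accumulates distance and reference length across calls."""

    def __init__(self):
        self.total_distance = 0
        self.total_length = 0

    def __call__(self, reference, hypothesis):
        # Assumes the accumulated reference length is non-zero.
        ref_words, hyp_words = reference.split(), hypothesis.split()
        self.total_distance += edit_distance(ref_words, hyp_words)
        self.total_length += len(ref_words)
        return self.total_distance / self.total_length


def pointwise_wer(reference, hypothesis):
    """WER of a single utterance; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


pairs = [
    ("yes", "no"),  # 1 word, 1 error
    ("the quick brown fox jumps over the lazy dog today",
     "the quick brown fox jumps over the lazy dog today"),  # 10 words, 0 errors
]
metric = RunningWER()
running = [metric(ref, hyp) for ref, hyp in pairs]
mean_pointwise = sum(pointwise_wer(r, h) for r, h in pairs) / len(pairs)
```

With these two utterances the accumulated value ends at 1/11, while the mean of the point-wise values is 0.5: averaging per-utterance WER lets a one-word utterance count as much as a ten-word one, whereas accumulating distance and length weights every reference word equally.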


Here are 500 public repositories matching this topic...

NVIDIA / NeMo

PaddlePaddle / PaddleSpeech

speechbrain / speechbrain

alphacep / vosk-api

wzpan / wukong-robot

xiangyuecn / Recorder

snakers4 / silero-models

tensorflow / lingvo

wenet-e2e / wenet

mravanelli / pytorch-kaldi

Delta-ML / delta

coqui-ai / STT

mravanelli / SincNet

freewym / espresso

pykaldi / pykaldi

srvk / eesen

athena-team / athena

snakers4 / open_stt

iceychris / LibreASR

kaituoxu / Speech-Transformer

hirofumi0810 / neural_sp

alphacep / vosk-server

Picovoice / cheetah

zw76859420 / ASR_Theory

openspeech-team / openspeech

gooofy / zamia-speech

speechio / chinese_text_normalization

alphacep / vosk-android-demo

lium-lst / nmtpytorch

Ailln / cn2an
