Project DeepSpeech

DeepSpeech is an open-source speech-to-text engine that uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

NOTE: This documentation applies to the MASTER version of DeepSpeech only. Documentation for the latest stable version is published on deepspeech.readthedocs.io.

To install and use DeepSpeech, all you have to do is:

# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate

# Install DeepSpeech
pip3 install deepspeech

# Download pre-trained English model and extract
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz
tar xvf deepspeech-0.6.1-models.tar.gz

# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/audio-0.6.1.tar.gz
tar xvf audio-0.6.1.tar.gz

# Transcribe an audio file
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --scorer deepspeech-0.6.1-models/kenlm.scorer --audio audio/2830-3980-0043.wav
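To transcribe every file in the example audio package instead of a single one, the same command can be wrapped in a loop. This is a minimal sketch that assumes the model and audio directories extracted by the commands above. The function name `transcribe_all` is hypothetical, and it passes only the `--model`, `--scorer` and `--audio` flags already shown.

```shell
# Hypothetical helper: run the deepspeech CLI once per WAV file in a directory.
# Assumes the model files extracted by the download commands above.
transcribe_all() {
  audio_dir="${1:-audio}"
  model_dir="${2:-deepspeech-0.6.1-models}"
  for wav in "$audio_dir"/*.wav; do
    # Skip the unexpanded pattern when the directory contains no .wav files.
    [ -e "$wav" ] || continue
    echo "== $wav"
    deepspeech --model "$model_dir/output_graph.pbmm" \
               --scorer "$model_dir/kenlm.scorer" \
               --audio "$wav"
  done
}

# Usage: transcribe_all audio deepspeech-0.6.1-models
```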

A pre-trained English model is available and can be downloaded with the commands above. A package of example audio files is also available for download from our release notes.
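Before transcribing your own recordings, it can help to confirm they match the example audio: the released English models are trained on 16 kHz, mono, 16-bit PCM WAV audio. The sketch below reads those fields straight from the WAV header with `od`. It assumes a canonical header (the `fmt ` chunk starting at byte 12) and a little-endian host; the function name `check_wav` is hypothetical.

```shell
# Hypothetical helper: print channels, sample rate and bit depth of a WAV file,
# and succeed only for 16 kHz, mono, 16-bit audio.
# Assumes a canonical header (fmt chunk at byte 12) and a little-endian host,
# because od reads multi-byte integers in host byte order.
check_wav() {
  f="$1"
  channels=$(od -An -t u2 -j 22 -N 2 "$f" | tr -d ' ')
  rate=$(od -An -t u4 -j 24 -N 4 "$f" | tr -d ' ')
  bits=$(od -An -t u2 -j 34 -N 2 "$f" | tr -d ' ')
  echo "$f: channels=$channels rate=$rate bits=$bits"
  [ "$channels" = 1 ] && [ "$rate" = 16000 ] && [ "$bits" = 16 ]
}

# Usage: for f in audio/*.wav; do check_wav "$f"; done
```

Files that fail the check can be converted with an audio tool of your choice before transcription.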

Faster inference is possible with a supported NVIDIA GPU on Linux. See the release notes for the list of supported GPUs. To run DeepSpeech on a GPU, install the GPU-specific package:

# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate

# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu

# Transcribe an audio file
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --scorer deepspeech-0.6.1-models/kenlm.scorer --audio audio/2830-3980-0043.wav

Please ensure you have the required CUDA dependencies.
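As a rough pre-check before choosing a package, the sketch below suggests `deepspeech-gpu` only when `nvidia-smi` (the utility shipped with NVIDIA's Linux driver) runs successfully. This is a heuristic under stated assumptions: it does not compare the GPU against the supported list in the release notes, and it does not confirm the CUDA libraries DeepSpeech needs. The function name `suggest_package` is hypothetical.

```shell
# Hypothetical heuristic: print "deepspeech-gpu" if an NVIDIA driver appears to be
# working (nvidia-smi exits successfully), otherwise "deepspeech".
# It does not check GPU support or the CUDA versions DeepSpeech requires.
suggest_package() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo deepspeech-gpu
  else
    echo deepspeech
  fi
}

# Usage: pip3 install "$(suggest_package)"
```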

Run deepspeech -h for more information on using deepspeech. If you experience problems running deepspeech, check the required runtime dependencies.
