A TensorFlow implementation of Baidu's DeepSpeech architecture
Latest commit a9fff3f Dec 6, 2019

Project DeepSpeech
==================


DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

NOTE: This documentation applies to the master branch of DeepSpeech only. If you're using a stable release, you must use the documentation for the corresponding version by using GitHub's branch switcher button above.

To install and use DeepSpeech, all you have to do is:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
   source $HOME/tmp/deepspeech-venv/bin/activate

   # Install DeepSpeech
   pip3 install deepspeech

   # Download the pre-trained English model and extract it
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
   tar xvf deepspeech-0.6.0-models.tar.gz

   # Download example audio files
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
   tar xvf audio-0.6.0.tar.gz

   # Transcribe an audio file
   deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav

A pre-trained English model is available for use and can be downloaded using the instructions above. Currently, only 16-bit, 16 kHz, mono-channel WAVE audio files are supported in the Python client. A package with some example audio files is available for download in our release notes.
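Because the Python client only accepts 16-bit, 16 kHz, mono WAVE input, it can be useful to check a file before handing it to ``deepspeech``. A minimal sketch using only the standard library (``is_deepspeech_ready`` is a hypothetical helper name, not part of the DeepSpeech API):

```python
import wave


def is_deepspeech_ready(path):
    """Return True if `path` is a WAV file in the format the
    DeepSpeech Python client expects: 16-bit PCM, 16 kHz, mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getsampwidth() == 2        # 2 bytes/sample = 16-bit
                and wav.getframerate() == 16000  # 16 kHz sample rate
                and wav.getnchannels() == 1)     # mono
```

Files that fail this check can usually be converted with a tool such as ``sox`` before transcription.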

Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the release notes to find which GPUs are supported. To run deepspeech on a GPU, install the GPU specific package:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
   source $HOME/tmp/deepspeech-gpu-venv/bin/activate

   # Install the CUDA-enabled DeepSpeech package
   pip3 install deepspeech-gpu

   # Transcribe an audio file
   deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav

Please ensure you have the required CUDA dependencies.

See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``, please check the required runtime dependencies.)
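Besides the command-line client, the ``deepspeech`` package exposes a Python API. The following is a hedged sketch of how a call might look with the v0.6-era API (``Model(model_path, beam_width)`` and ``enableDecoderWithLM``); the ``transcribe`` helper is defined here for illustration and is not part of DeepSpeech, and the beam width (500) and language-model weights (0.75, 1.85) are assumed v0.6 defaults:

```python
import wave


def transcribe(audio_path,
               model_path="deepspeech-0.6.0-models/output_graph.pbmm",
               lm_path="deepspeech-0.6.0-models/lm.binary",
               trie_path="deepspeech-0.6.0-models/trie"):
    """Transcribe a 16-bit/16 kHz/mono WAV file with the DeepSpeech
    Python API (v0.6-era signatures). Requires `pip3 install deepspeech`
    and `numpy`; imports are deferred so this sketch loads without them."""
    import numpy as np               # the model expects an int16 buffer
    from deepspeech import Model

    ds = Model(model_path, 500)      # 500 = assumed default beam width
    # Attach the external language model; 0.75/1.85 are the assumed
    # default alpha/beta decoder weights for the 0.6 release.
    ds.enableDecoderWithLM(lm_path, trie_path, 0.75, 1.85)

    with wave.open(audio_path, "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), np.int16)
    return ds.stt(audio)
```

With the model and audio packages extracted as shown above, ``transcribe("audio/2830-3980-0043.wav")`` would return the transcript as a string.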
