A TensorFlow implementation of Baidu's DeepSpeech architecture
Latest commit a9fff3f Dec 6, 2019

Project DeepSpeech
==================


DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

NOTE: This documentation applies to the master branch of DeepSpeech only. If you're using a stable release, you must use the documentation for the corresponding version by using GitHub's branch switcher button above.

To install and use DeepSpeech, all you have to do is:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
   source $HOME/tmp/deepspeech-venv/bin/activate

   # Install DeepSpeech
   pip3 install deepspeech

   # Download the pre-trained English model and extract it
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
   tar xvf deepspeech-0.6.0-models.tar.gz

   # Download example audio files
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
   tar xvf audio-0.6.0.tar.gz

   # Transcribe an audio file
   deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav

A pre-trained English model is available for use and can be downloaded using the instructions above. Currently, only 16-bit, 16 kHz, mono-channel WAVE audio files are supported in the Python client. A package with some example audio files is available for download in our release notes.
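Because the Python client only accepts 16-bit, 16 kHz, mono WAVE input, it can be useful to check a file before handing it to ``deepspeech``. A minimal sketch using only the standard library (``is_deepspeech_ready`` is a hypothetical helper name, not part of the DeepSpeech API):

```python
import wave


def is_deepspeech_ready(path):
    """Return True if `path` is a WAV file in the format the
    DeepSpeech Python client expects: 16-bit PCM, 16 kHz, mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getsampwidth() == 2        # 2 bytes/sample = 16-bit
                and wav.getframerate() == 16000  # 16 kHz sample rate
                and wav.getnchannels() == 1)     # mono
```

Files that fail this check can usually be converted with a tool such as ``sox`` before transcription.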

Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the release notes to find which GPUs are supported. To run deepspeech on a GPU, install the GPU specific package:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
   source $HOME/tmp/deepspeech-gpu-venv/bin/activate

   # Install the CUDA-enabled DeepSpeech package
   pip3 install deepspeech-gpu

   # Transcribe an audio file
   deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav

Please ensure you have the required CUDA dependencies.

See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``, please check the required runtime dependencies.)
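Besides the command-line client, the ``deepspeech`` package exposes a Python API. The following is a hedged sketch of how a call might look with the v0.6-era API (``Model(model_path, beam_width)`` and ``enableDecoderWithLM``); the ``transcribe`` helper is defined here for illustration and is not part of DeepSpeech, and the beam width (500) and language-model weights (0.75, 1.85) are assumed v0.6 defaults:

```python
import wave


def transcribe(audio_path,
               model_path="deepspeech-0.6.0-models/output_graph.pbmm",
               lm_path="deepspeech-0.6.0-models/lm.binary",
               trie_path="deepspeech-0.6.0-models/trie"):
    """Transcribe a 16-bit/16 kHz/mono WAV file with the DeepSpeech
    Python API (v0.6-era signatures). Requires `pip3 install deepspeech`
    and `numpy`; imports are deferred so this sketch loads without them."""
    import numpy as np               # the model expects an int16 buffer
    from deepspeech import Model

    ds = Model(model_path, 500)      # 500 = assumed default beam width
    # Attach the external language model; 0.75/1.85 are the assumed
    # default alpha/beta decoder weights for the 0.6 release.
    ds.enableDecoderWithLM(lm_path, trie_path, 0.75, 1.85)

    with wave.open(audio_path, "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), np.int16)
    return ds.stt(audio)
```

With the model and audio packages extracted as shown above, ``transcribe("audio/2830-3980-0043.wav")`` would return the transcript as a string.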
