Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN
News ----------------------------------------------------------------------------------------------
Nov 2014. A new version is ready. Check the change log for the list of updates.
Nov 2014. Kaldi+PDNN has moved to GitHub for better code management and community participation.
Nov 2014. Multi-task learning has been added to PDNN. This enables DNN training over multiple languages, domains, dialects, etc.
Jul 2014. SAT recipes for DNN systems have been added.
Apr 2014. A new version is released. Check the change log for the list of updates.
About ----------------------------------------------------------------------------------------------
Kaldi+PDNN builds state-of-the-art DNN acoustic models using the open-source Kaldi and PDNN toolkits. The pipeline has 3 stages:
1. Initial GMM models are built with the existing Kaldi recipes
2. DNN/DCN acoustic models are trained by PDNN
3. Trained DNN/DCN models are ported back to Kaldi for decoding or tandem system building
Highlights of Kaldi+PDNN include:
Model diversity. Deep neural networks, deep convolutional networks, bottleneck-feature tandem systems
PDNN toolkit. Easy to use, fast to implement new ideas. [more info]
Open license. All the code is released under Apache 2.0, the same license as Kaldi
Consistency with Kaldi. Scripts follow the Kaldi style and can be integrated with any of the existing example setups.
Requirements ----------------------------------------------------------------------------------------------
1. A GPU should be available on your machine. Otherwise, PDNN will use CPUs.
2. Initial GMM model building should be done with the existing Kaldi recipes
3. Install Theano. Refer to the Theano installation instructions for more details. If you are running Ubuntu Linux, following the steps in this document will set up Theano for you.
4. Install pfile_utils-v0_51. This script installs it automatically. Add pfile_utils-v0_51/bin to the PATH environment variable if it is NOT installed under the Kaldi tools folder
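The PATH step in item 4 can be sketched as follows. This is only an illustration: the install location $HOME/tools/pfile_utils-v0_51 is a hypothetical example, not a path given on this page, so substitute the directory where pfile_utils actually lives.

```shell
# Hypothetical install location; replace with wherever pfile_utils-v0_51 lives.
PFILE_UTILS_DIR="$HOME/tools/pfile_utils-v0_51"

# Prepend its bin directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$PFILE_UTILS_DIR/bin:"*) ;;
  *) export PATH="$PFILE_UTILS_DIR/bin:$PATH" ;;
esac
```

One option is to place these lines in the path.sh of your Kaldi setup, so that the pfile tools are found whenever the recipe scripts run.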
Download ----------------------------------------------------------------------------------------------
Kaldi+PDNN is publicly available from GitHub. Go to your Kaldi setup (e.g., egs/wsj/s5) and check out the latest version.
svn co https://github.com/yajiemiao/kaldipdnn/trunk/run_wsj run_wsj
svn co https://github.com/yajiemiao/pdnn/trunk pdnn
svn co https://github.com/yajiemiao/kaldipdnn/trunk/steps_pdnn steps_pdnn
The scripts and RESULTS appear under run_wsj. Kaldi+PDNN currently supports the following datasets:
run_timit -- TIMIT
run_wsj -- Wall Street Journal
run_swbd -- Switchboard (the complete 300-hour setup)
run_swbd_110h -- Switchboard (the 110-hour setup)
run_tedlium -- TED-LIUM (transcribing TED talks)
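To work with a dataset other than WSJ, the same three checkouts can be adapted. The sketch below only prints the commands so they can be reviewed first. It assumes that every recipe listed above sits under trunk/<recipe> in the repository, as run_wsj does; the function name kaldipdnn_checkout_cmds is a hypothetical helper, not part of Kaldi+PDNN.

```shell
# Print the svn checkout commands for a given recipe (e.g. run_timit).
# Assumption: each recipe lives under trunk/<recipe>, like run_wsj does.
kaldipdnn_checkout_cmds() {
  recipe="$1"
  echo "svn co https://github.com/yajiemiao/kaldipdnn/trunk/$recipe $recipe"
  echo "svn co https://github.com/yajiemiao/pdnn/trunk pdnn"
  echo "svn co https://github.com/yajiemiao/kaldipdnn/trunk/steps_pdnn steps_pdnn"
}

# Dry run: show the commands for the TIMIT recipe without executing them.
kaldipdnn_checkout_cmds run_timit
```

Run this from the matching Kaldi setup (for TIMIT, presumably egs/timit/s5), and pipe the output to sh once the commands look right.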
Benchmark Results ----------------------------------------------------------------------------------------------
Systems with *TBA* are being verified, and their numbers will be updated soon.
TIMIT PER(%)                        dev          [test]
run-dnn.sh                          18.8         20.2
run-bnf-tandem.sh                   16.3         17.8
run-dnn-fbank.sh                    20.2         21.6
run-cnn.sh                          19.0         19.7
run-dnn-maxout.sh                   17.5         19.0
Wall Street Journal WER(%)          dev93        [eval92]
run-dnn.sh                          7.18         4.08
run-bnf-tandem.sh                   6.72         3.81
run-dnn-fbank.sh                    7.38         4.27
run-cnn.sh                          7.27         4.29
Switchboard (the 300-hour setup) WER(%)   Hub'00-SWB   [Hub'00]
run-dnn.sh                          15.4         21.4
run-bnf-tandem.sh                   15.0         21.7
run-dnn-fbank.sh                    TBA
Switchboard (the 110-hour setup) WER(%)   Hub'00-SWB   [Hub'00]
run-dnn.sh                          19.2         25.6
run-bnf-tandem.sh                   18.0         25.0
run-dnn-fbank.sh                    21.7         28.2
run-cnn.sh                          19.5         25.6
run-bnf-fbank-tandem.sh             19.6         27.7
TED-LIUM WER(%)                     dev          [test]
run-dnn.sh                          23.3         20.4
run-bnf-tandem.sh                   22.0         19.3
run-dnn-fbank.sh                    24.5         21.4
run-cnn.sh                          22.7         19.7
run-dnn-maxout.sh                   22.9         19.7
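When comparing two systems from the tables above, a relative error reduction is often easier to read than the raw difference. The helper below is a small sketch for that arithmetic; the name rel_reduction is hypothetical and not part of the Kaldi+PDNN scripts.

```shell
# Relative reduction (%) of a new error rate against a baseline:
#   (baseline - new) / baseline * 100
rel_reduction() {
  LC_ALL=C awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# WSJ eval92: run-dnn.sh (4.08) versus run-bnf-tandem.sh (3.81).
rel_reduction 4.08 3.81
```

For the WSJ eval92 numbers this prints 6.6, i.e. the tandem system lowers WER by about 6.6% relative to the hybrid DNN.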
Systems ----------------------------------------------------------------------------------------------
Core
run-dnn.sh                 Hybrid model with DNN and fMLLR features
run-bnf-tandem.sh          Tandem system with deep bottleneck features trained over fMLLRs
run-dnn-fbank.sh           Hybrid model with DNN and filterbanks
run-cnn.sh                 Hybrid model with CNN and filterbanks
Extensions
run-dnn-maxout.sh          Hybrid model with deep maxout networks and fMLLRs
run-bnf-fbank-tandem.sh    Tandem system with deep bottleneck features trained over filterbanks
SAT for DNNs
Various Speaker Adaptive Training recipes for DNNs. Refer to here.
Contacting us ----------------------------------------------------------------------------------------------
You can post your questions, suggestions, and discussions to GitHub Issues.
You can also send emails to Yajie Miao (yajiemiao AT gmail.com)
Reference ----------------------------------------------------------------------------------------------
Please cite the following manuscript if you use Kaldi+PDNN in your papers/publications:
Yajie Miao, "Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN," arXiv:1401.6984, 2014.