XPNL KB

Speech-To-Text

Created on: 08.05.2016 11:21 AM
Edited on: 08.15.2016 2:05 PM

To-do list and notes for moving off of AT&T API

Testing / Proof-of-concept

figure out cpu limiting (nice / cpulimit)

http://blog.scoutapp.com/articles/2014/11/04/restricting-process-cpu-usage-using-nice-cpulimit-and-cgroups

cpulimit works but severely increases processing time
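A minimal sketch of the nice alternative, assuming the decoder is launched from Python on the node. The helper names `niced_argv` and `run_niced` are hypothetical. Unlike cpulimit, nice does not cap CPU usage; it only lowers scheduling priority, so the decoder should still run at full speed when the node is otherwise idle.

```python
import subprocess


def niced_argv(argv, niceness=19):
    """Prefix a command with `nice` so it runs at lowered priority."""
    return ["nice", "-n", str(niceness)] + list(argv)


def run_niced(argv, niceness=19):
    """Run the command under nice and return its stdout as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        niced_argv(argv, niceness),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would look something like `run_niced(["/usr/local/bin/pocketsphinx_continuous", "-hmm", "en-us-ivr", "-infile", "./advia.wav"])`.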

see if I can process wav files with an 8000 Hz sample rate?
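A quick way to confirm what sample rate a recording actually has before feeding it to the decoder, sketched with Python's standard `wave` module. The helper names are hypothetical. The `-samprate` flag is the same one passed to sphinx_fe in the Model How-To; whether the 16000 Hz en-us model decodes 8000 Hz audio acceptably is exactly what still needs testing.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in the wav file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def samprate_args(path):
    """Build the -samprate argument pair matching the file's real rate."""
    return ["-samprate", str(wav_sample_rate(path))]
```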

ways to speed up the transcription process? (-bestpath no -fwdflat no)
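To see whether those two flags are worth it, the same file can be decoded with and without them and the wall-clock times compared. A rough sketch, with hypothetical helper names; it only measures elapsed time and says nothing about how much accuracy is lost when the later search passes are skipped.

```python
import subprocess
import time

DECODER = "/usr/local/bin/pocketsphinx_continuous"
FAST_FLAGS = ["-bestpath", "no", "-fwdflat", "no"]


def decoder_argv(hmm_dir, wav_path, fast=False):
    """Build the decoder command, optionally with the speed-up flags."""
    argv = [DECODER, "-hmm", hmm_dir, "-infile", wav_path]
    if fast:
        argv += FAST_FLAGS
    return argv


def time_command(argv):
    """Run a command, discard its output, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start
```

Comparing `time_command(decoder_argv("en-us-ivr", "./advia.wav"))` against the `fast=True` variant on a handful of recordings should give a usable number.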

use a smaller dictionary? How would that impact transcription?

experiment with a model adapted from 20+ different input files from the same IVR line. Does it help transcription? Is there any improvement over adapting from a single file?

continue adding more lines into the ivr model

Model How-To

Extract acoustic features (mfc files) from the wav files listed in t.f:

/usr/local/bin/sphinx_fe -argfile en-us/feat.params -samprate 16000 -c t.f -di . -do . -ei wav -eo mfc -mswav yes

Accumulate observation counts, using t.f as the file list and t.t as the transcriptions:

/usr/local/libexec/sphinxtrain/bw -hmmdir en-us -moddeffn en-us/mdef -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn t.f -lsnfn t.t -accumdir .

Copy the stock model so the adapted files have somewhere to go:

cp -a en-us en-us-ivr

Run MAP adaptation, writing the updated means, variances, mixture weights and transition matrices into en-us-ivr:

/usr/local/libexec/sphinxtrain/map_adapt -moddeffn en-us/mdef -ts2cbfn .ptm. -meanfn en-us/means -varfn en-us/variances -mixwfn en-us/mixture_weights -tmatfn en-us/transition_matrices -accumdir . -mapmeanfn en-us-ivr/means -mapvarfn en-us-ivr/variances -mapmixwfn en-us-ivr/mixture_weights -maptmatfn en-us-ivr/transition_matrices

Rebuild the sendump file from the adapted mixture weights:

/usr/local/libexec/sphinxtrain/mk_s2sendump -pocketsphinx yes -moddeffn en-us-ivr/mdef -mixwfn en-us-ivr/mixture_weights -sendumpfn en-us-ivr/sendump

Test the adapted model:

/usr/local/bin/pocketsphinx_continuous -hmm en-us-ivr -infile ./advia.wav
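The build steps above (everything except the final test decode) could be captured as data, which is a possible starting point for the automated build script. This is only a sketch: `adaptation_commands` and `build_model` are hypothetical names, the arguments are copied from the commands above, and it assumes the script runs from the directory holding en-us, t.f, t.t and the wav files.

```python
import subprocess

SPHINX_FE = "/usr/local/bin/sphinx_fe"
TRAIN = "/usr/local/libexec/sphinxtrain"


def adaptation_commands(base="en-us", adapted="en-us-ivr", ctl="t.f",
                        trans="t.t", dictionary="cmudict-en-us.dict"):
    """Return the adaptation steps, in order, as argv lists."""
    return [
        # 1. feature extraction
        [SPHINX_FE, "-argfile", base + "/feat.params", "-samprate", "16000",
         "-c", ctl, "-di", ".", "-do", ".", "-ei", "wav", "-eo", "mfc",
         "-mswav", "yes"],
        # 2. accumulate observation counts
        [TRAIN + "/bw", "-hmmdir", base, "-moddeffn", base + "/mdef",
         "-ts2cbfn", ".ptm.", "-feat", "1s_c_d_dd",
         "-svspec", "0-12/13-25/26-38", "-cmn", "current", "-agc", "none",
         "-dictfn", dictionary, "-ctlfn", ctl, "-lsnfn", trans,
         "-accumdir", "."],
        # 3. copy the stock model
        ["cp", "-a", base, adapted],
        # 4. MAP adaptation into the copy
        [TRAIN + "/map_adapt", "-moddeffn", base + "/mdef",
         "-ts2cbfn", ".ptm.", "-meanfn", base + "/means",
         "-varfn", base + "/variances",
         "-mixwfn", base + "/mixture_weights",
         "-tmatfn", base + "/transition_matrices", "-accumdir", ".",
         "-mapmeanfn", adapted + "/means",
         "-mapvarfn", adapted + "/variances",
         "-mapmixwfn", adapted + "/mixture_weights",
         "-maptmatfn", adapted + "/transition_matrices"],
        # 5. rebuild sendump
        [TRAIN + "/mk_s2sendump", "-pocketsphinx", "yes",
         "-moddeffn", adapted + "/mdef",
         "-mixwfn", adapted + "/mixture_weights",
         "-sendumpfn", adapted + "/sendump"],
    ]


def build_model(**kwargs):
    """Run every step, stopping at the first one that fails."""
    for argv in adaptation_commands(**kwargs):
        subprocess.check_call(argv)
```

Keeping the steps as a list makes it easy to print them for a dry run before anything touches the model directories.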

Implementation

Based on the assumption that I'll be going with a local solution for now, there are some steps that need to happen to make it work when I go live.

create repo / figure out how to distribute/backup model files

create script to automatically build new models and copy the correct files. It would also need a provision for backing up old data, including copying the t.f (file list) and t.t (transcription) files into their respective backup dirs

create new validate() function calls.

- partly done

create new ivr scripts (it will probably be a mixed environment for a while, so I can't change the originals?)

- partly done

add SILENCE detection to new validate() calls

just check if the buffer is NULL?

check buffer for "ERROR" or critical pocketsphinx errors?

add check if local wav file is missing and throw exception
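The three checks above could sit in one helper that runs on the decoder output before it reaches validate(). A minimal sketch only: the function name, the exception class and the error codes are all made up for illustration, and treating an empty buffer as silence is the open question from the list, not a settled rule.

```python
import os


class TranscriptionError(Exception):
    """Carries a short error code alongside the human-readable message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def check_transcript(wav_path, decoder_output):
    """Return the cleaned transcript, or None if the audio looks silent.

    Raises TranscriptionError when the wav file is missing or the
    decoder output contains an ERROR marker.
    """
    if not os.path.isfile(wav_path):
        raise TranscriptionError("WAV_MISSING", "no wav file at %s" % wav_path)
    text = (decoder_output or "").strip()
    if "ERROR" in text:
        raise TranscriptionError("DECODER_ERROR", text)
    if not text:
        return None  # assumption: an empty buffer means SILENCE
    return text
```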

revamp exception email code

putting different error codes in emails

more verbose messaging, such as what node and more info about the error
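One possible shape for the revamped exception email, with the error code in the subject and the node name in the body. This is a sketch using Python's standard `email` package; the function name, the addresses and the error code values are placeholders, and actually sending the message (for example via smtplib) is left out.

```python
import socket
from email.message import EmailMessage


def build_error_email(code, detail, node=None,
                      sender="ivr@example.com", recipient="ops@example.com"):
    """Build (but do not send) an error email tagged with code and node."""
    node = node or socket.gethostname()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[%s] speech-to-text failure on %s" % (code, node)
    msg.set_content(
        "Node: %s\nError code: %s\nDetail: %s\n" % (node, code, detail)
    )
    return msg
```

Putting the code first in the subject line should make it easy to filter or count failures by type in the mailbox.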

Local Solution (Per Node)

Look into whether this is ideal. It saves the hassle of managing a central service but creates problems with CPU load on each node and with distributing model files once they are updated. (Not terrible; use noderepo?)

compile libs/programs/models on an ivr node for testing

document steps, so that they are available

test with tarring up the directories and moving them to a different node?
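The tar test could be scripted with Python's standard `tarfile` module, roughly as below. The helper names are hypothetical. This only covers a model directory such as en-us-ivr; compiled libs and binaries may have dependencies that need to be checked on the target node separately.

```python
import os
import tarfile


def pack_model(model_dir, archive_path):
    """Write model_dir into a gzipped tarball, keeping only its last path part."""
    name = os.path.basename(os.path.normpath(model_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(model_dir, arcname=name)
    return archive_path


def unpack_model(archive_path, dest_dir):
    """Extract a tarball made by pack_model under dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```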

Web Service

Not ruling it out, since it would keep the transcription process similar to what is in use today, but it creates a few problems, such as bandwidth and being responsible for a service that affects all nodes.

create small test, try to post to it and get basic response back
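The small test could be as little as a stub HTTP endpoint that accepts posted audio and answers with JSON, plus a client call to hit it. A sketch with the standard library only; the handler, the response fields and the helper names are invented for illustration, and no transcription happens here (the transcript field is left empty).

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the posted audio and report how many bytes arrived.
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        body = json.dumps(
            {"bytes_received": len(audio), "transcript": ""}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the test quiet


def start_server(port=0):
    """Start the stub on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_audio(url, audio_bytes):
    """POST raw audio bytes and return the decoded JSON response."""
    request = urllib.request.Request(
        url, data=audio_bytes, headers={"Content-Type": "audio/wav"}
    )
    # Empty ProxyHandler: ignore any proxy settings in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Posting a real wav from an ivr node to a stub like this would also give a first rough feel for the bandwidth concern.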

Paid Service

I figure I still need to use a paid service for some transcriptions, unless I can somehow magically tune my model to work great on new audio.

use on new lines / initial transcriptions

use on third checks to validate text mismatch and send to client

is using a local copy of the dragon agent via sftp even a possibility?

* test this
