Speech-To-Text

Created on: 08.05.2016 11:21 AM
Edited on: 08.15.2016 2:05 PM


To-do list and notes for moving off the AT&T API

Testing / Proof-of-concept


  • figure out CPU limiting (nice / cpulimit)

  • http://blog.scoutapp.com/articles/2014/11/04/restricting-process-cpu-usage-using-nice-cpulimit-and-cgroups

  • cpulimit works, but it severely slows down processing
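
    A minimal sketch of the nice route, as an alternative to try against cpulimit. cpulimit throttles by repeatedly stopping and resuming the process, while nice only lowers scheduling priority, so the job should still run at full speed when the node is otherwise idle. The function name run_low_priority is made up for illustration, and whether this is gentle enough on a busy IVR node is untested.

```shell
# Run a command at the lowest CPU scheduling priority.
# Usage: run_low_priority COMMAND [ARGS...]
# (run_low_priority is a hypothetical helper name.)
run_low_priority() {
    # A niceness of 19 is the lowest priority on Linux.
    nice -n 19 "$@"
}

# Example, using the decoder command from the how-to in these notes:
# run_low_priority /usr/local/bin/pocketsphinx_continuous -hmm en-us-ivr -infile ./advia.wav
```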


  • see if I can process wav files with an 8000 Hz sample rate
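
    One way to test this, sketched under assumptions: the how-to in these notes runs sphinx_fe with -samprate 16000, so an 8000 Hz file could be upsampled before decoding. Using sox for the resampling is my assumption (it is not part of the current setup), the helper names are made up, and the header check assumes a canonical WAV layout on a little-endian host. Upsampling adds no audio information, so whether accuracy holds up is exactly what needs testing.

```shell
# Print the sample rate (Hz) stored in a WAV header.
# Assumes the canonical layout ("fmt " chunk first, rate at byte offset 24)
# and a little-endian host such as x86.
wav_sample_rate() {
    od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n'
}

# Produce a 16 kHz mono copy, resampling only when needed.
# Usage: ensure_16k IN.wav OUT.wav
# Assumption: sox is installed; it is not mentioned in the original setup.
ensure_16k() {
    if [ "$(wav_sample_rate "$1")" = "16000" ]; then
        cp "$1" "$2"
    else
        sox "$1" -r 16000 -c 1 "$2"
    fi
}
```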


  • ways to speed up transcription? (try -bestpath no -fwdflat no, which skip the second and third search passes)

  • use a smaller dictionary? How would that impact transcription?


  • experiment with a model adapted from 20+ different input files from the same IVR line. Does it help transcription? Is there any improvement over using a single input file?


  • continue adding more lines to the IVR model



Model How-To


  • /usr/local/bin/sphinx_fe -argfile en-us/feat.params -samprate 16000 -c t.f -di . -do . -ei wav -eo mfc -mswav yes

  • /usr/local/libexec/sphinxtrain/bw -hmmdir en-us -moddeffn en-us/mdef -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn t.f -lsnfn t.t -accumdir .

  • cp -a en-us en-us-ivr

  • /usr/local/libexec/sphinxtrain/map_adapt -moddeffn en-us/mdef -ts2cbfn .ptm. -meanfn en-us/means -varfn en-us/variances -mixwfn en-us/mixture_weights -tmatfn en-us/transition_matrices -accumdir . -mapmeanfn en-us-ivr/means -mapvarfn en-us-ivr/variances -mapmixwfn en-us-ivr/mixture_weights -maptmatfn en-us-ivr/transition_matrices

  • /usr/local/libexec/sphinxtrain/mk_s2sendump -pocketsphinx yes -moddeffn en-us-ivr/mdef -mixwfn en-us-ivr/mixture_weights -sendumpfn en-us-ivr/sendump



  • /usr/local/bin/pocketsphinx_continuous -hmm en-us-ivr -infile ./advia.wav
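
    The adaptation steps above could be chained roughly like this. It is a sketch only: the commands and flags are copied from the list, but the function names, the DRY_RUN switch, and the refusal to overwrite an existing output directory are my additions. It assumes it is run from the directory that holds t.f, t.t, the wav files, and cmudict-en-us.dict.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: adapt_model BASE_DIR NEW_DIR   (e.g. adapt_model en-us en-us-ivr)
adapt_model() {
    base=$1
    new=$2
    st=/usr/local/libexec/sphinxtrain
    # cp -a into an existing directory would nest the copy, so refuse.
    if [ -e "$new" ]; then
        echo "refusing to overwrite $new" >&2
        return 1
    fi
    # 1. Extract features from the wav files listed in t.f.
    run /usr/local/bin/sphinx_fe -argfile "$base/feat.params" -samprate 16000 \
        -c t.f -di . -do . -ei wav -eo mfc -mswav yes || return 1
    # 2. Accumulate observation counts.
    run "$st/bw" -hmmdir "$base" -moddeffn "$base/mdef" -ts2cbfn .ptm. \
        -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none \
        -dictfn cmudict-en-us.dict -ctlfn t.f -lsnfn t.t -accumdir . || return 1
    # 3. Copy the base model, then 4. write MAP-adapted parameters into the copy.
    run cp -a "$base" "$new" || return 1
    run "$st/map_adapt" -moddeffn "$base/mdef" -ts2cbfn .ptm. \
        -meanfn "$base/means" -varfn "$base/variances" \
        -mixwfn "$base/mixture_weights" -tmatfn "$base/transition_matrices" \
        -accumdir . -mapmeanfn "$new/means" -mapvarfn "$new/variances" \
        -mapmixwfn "$new/mixture_weights" -maptmatfn "$new/transition_matrices" \
        || return 1
    # 5. Rebuild the sendump file for pocketsphinx.
    run "$st/mk_s2sendump" -pocketsphinx yes -moddeffn "$new/mdef" \
        -mixwfn "$new/mixture_weights" -sendumpfn "$new/sendump"
}
```

    Setting DRY_RUN=1 before calling adapt_model prints the five commands without touching any files, which is a cheap way to check the paths first.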




Implementation


    Assuming I go with a local solution for now, these steps need to happen before I go live.

  • create repo / figure out how to distribute/backup model files


  • create a script to automatically build new models and copy the correct files. It also needs to back up old data, including copying the t.f and t.t files into their respective backup dirs
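
    A possible shape for the backup part of that script. The function name and the timestamped directory layout are assumptions; the notes only say that t.f and t.t need to land in backup dirs.

```shell
# Copy the training control/transcript files into a timestamped backup dir.
# Usage: backup_training_files SRC_DIR BACKUP_ROOT
# Prints the directory that was created.
# (The BACKUP_ROOT/YYYYMMDD-HHMMSS layout is a hypothetical choice.)
backup_training_files() {
    dest="$2/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    cp -p "$1/t.f" "$1/t.t" "$dest/" || return 1
    echo "$dest"
}
```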


  • create new validate() function calls (partly done)

  • create new ivr scripts (it will probably be a mixed environment for a while, so I can't change the originals?) (partly done)

  • add SILENCE detection to new val() calls

  • just check if the buffer is NULL?


  • check buffer for "ERROR" or critical pocketsphinx errors?


  • add a check for a missing local wav file and throw an exception
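
    The three checks above (silence, engine errors, missing file) could be combined along these lines. This is a shell sketch only; the real validate() code may be in another language. The result labels and return codes are made up, and matching on the strings "ERROR" and "FATAL" assumes those prefixes show up in the captured buffer.

```shell
# Classify one transcription attempt.
# Usage: check_transcription WAV_FILE TRANSCRIPT_TEXT
# Prints MISSING_WAV, SILENCE, ENGINE_ERROR or OK (hypothetical labels);
# returns non-zero unless the result is OK.
check_transcription() {
    wav=$1
    text=$2
    if [ ! -f "$wav" ]; then
        echo "MISSING_WAV"
        return 2
    fi
    # Treat an empty or whitespace-only buffer as silence.
    if [ -z "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
        echo "SILENCE"
        return 3
    fi
    # Assumption: engine failures leave ERROR or FATAL in the buffer.
    case $text in
        *ERROR*|*FATAL*)
            echo "ENGINE_ERROR"
            return 4
            ;;
    esac
    echo "OK"
}
```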


  • revamp exception email code

  • put different error codes in emails

  • more verbose messaging, such as which node failed and more detail about the error
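
    A sketch of what a more verbose email body might carry. The field names and layout are assumptions, and actually sending the message is left to the existing email code.

```shell
# Build the body of an exception email.
# Usage: format_error_report CODE WAV_FILE DETAIL
# (Field names are hypothetical; uname -n supplies the node name.)
format_error_report() {
    printf 'code: %s\nnode: %s\ntime: %s\nfile: %s\ndetail: %s\n' \
        "$1" "$(uname -n)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3"
}

# Example: format_error_report MISSING_WAV ./advia.wav "local wav file not found"
```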




Local Solution (Per Node)


    Look into whether this is ideal. It saves the hassle of managing a central service but creates some problems with CPU load and with distributing model files once they are updated. (Not terrible; use noderepo?)

  • compile libs/programs/models on an ivr node for testing

  • document the steps so they are available for reference

  • test tarring up the directories and moving them to a different node?
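
    A rough sketch of the tar test. The helper names and the side-by-side checksum file are my assumptions; the copy between nodes is left out, since that depends on whether noderepo or something else ends up doing the distribution.

```shell
# Pack a model directory into a gzipped tarball plus a checksum file.
# Usage: pack_model MODEL_DIR OUT.tar.gz
pack_model() {
    parent=$(dirname "$1")
    name=$(basename "$1")
    tar -czf "$2" -C "$parent" "$name" || return 1
    # Reading from stdin keeps the file name out of the checksum line.
    cksum < "$2" > "$2.cksum"
}

# Verify and unpack on the target node.
# Usage: unpack_model IN.tar.gz DEST_PARENT   (expects IN.tar.gz.cksum alongside)
unpack_model() {
    if [ "$(cksum < "$1")" != "$(cat "$1.cksum")" ]; then
        echo "checksum mismatch for $1" >&2
        return 1
    fi
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```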


Web Service


    Not ruling it out, since it would keep the transcription process similar to what is in use today. It does create a few problems, such as bandwidth and being responsible for a service that affects all nodes.

  • create a small test service, try to POST to it, and get a basic response back
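
    The client side of that test might look like this. Everything about the service is hypothetical at this point: the URL, the audio/wav request format, and the idea that the response body is the transcript. Only the curl options themselves are standard.

```shell
# POST a wav file to a transcription endpoint and print the response body.
# Usage: post_wav URL WAV_FILE
# Assumption: the service accepts raw audio/wav in the request body.
post_wav() {
    # --data-binary sends the file unmodified and makes the request a POST;
    # --fail turns HTTP error statuses into a non-zero exit code.
    curl -sS --fail --max-time 30 \
        -H 'Content-Type: audio/wav' \
        --data-binary "@$2" \
        "$1"
}

# Example (hypothetical endpoint):
# post_wav http://localhost:8080/transcribe ./advia.wav
```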


Paid Service


    I figure I still need a paid service for some transcriptions, unless I can somehow tune my model to work well on new audio.

  • use on new lines / initial transcriptions


  • use on third checks to validate text mismatch and send to client


  • is using a local copy of the dragon agent via sftp even a possibility? (test this)

     

