OTHER METHODS [5]

1. Principal Component Analysis (PCA)
   Property: Linear, eigenvector-based feature extraction method; a fast linear map.
   Advantage: Gives good results for Gaussian data.
   Disadvantage: The directions that maximize variance do not always maximize information.

2. Linear Discriminant Analysis (LDA)
   Property: Supervised, eigenvector-based linear map.
   Advantage: Better than PCA for classification; handles the case where the within-class frequencies are unequal, and its performance has been examined on randomly generated test data.
   Disadvantage: If the distribution is significantly non-Gaussian, the LDA projection cannot preserve any complex structure of the data that may be needed for classification.

3. Independent Component Analysis (ICA)
   Property: Linear map, estimated iteratively under a non-Gaussianity assumption.
   Advantage: Performs blind separation of sources, which can make it better suited than PCA for classification.
   Disadvantage: The extracted components are not ordered.

4. Filter bank analysis
   Property: Filters tuned to the required frequencies.
   Advantage: Provides a spectral analysis with any degree of frequency resolution (wide or narrow), even with nonlinear filter spacing and bandwidths.
   Disadvantage: Always takes more calculation and processing time than discrete Fourier analysis using the FFT.

5. Kernel-based feature extraction
   Property: Nonlinear transformations.
   Advantage: Dimensionality reduction leads to better classification; it is used to remove noisy and redundant features and to reduce classification error.
   Disadvantage: Slow similarity (kernel) calculation.

6. Wavelet transform
   Property: Better time resolution than the Fourier transform.
   Advantage: Replaces the fixed bandwidth of the Fourier transform with one proportional to frequency, which allows better time resolution at high frequencies than the Fourier transform.
   Disadvantage: Requires longer compression time.

7. RASTA filtering
   Property: Designed for noisy speech.
   Advantage: Finds features in noisy data.
   Disadvantage: Increases the dependence of the data on its previous context.

DECODING
Decoding is performed to find the best match for the incoming feature vectors using the knowledge base; it recognizes the speech utterance by combining and optimizing the information conveyed by the acoustic and language models. The standard approach to large-vocabulary continuous speech recognition (LVCSR) is to assume a simple probabilistic model of speech production whereby a specified word sequence W produces an acoustic observation sequence A with probability P(W, A). The goal is then to decode the word string, based on the acoustic observation sequence, so that the decoded string has the maximum a posteriori (MAP) probability:

\hat{W} = \arg\max_{W} P(W \mid A)  (eq. 1)

Using Bayes' rule, this can be written as:

P(W \mid A) = \frac{P(A \mid W)\, P(W)}{P(A)}  (eq. 2)

Since P(A) is independent of W, the MAP decoding rule becomes:

\hat{W} = \arg\max_{W} \underbrace{P(A \mid W)}_{\text{likelihood}} \; \underbrace{P(W)}_{\text{prior}}  (eq. 3)

ACOUSTIC MODEL
In the acoustic modeling or phone recognition stage, we compute the likelihood of the observed spectral feature vectors given linguistic units (words, phones, subparts of phones). [4] An acoustic model can be implemented using different approaches such as HMMs, ANNs, dynamic Bayesian networks (DBNs), and support vector machines (SVMs). HMMs are the most widely used of these, as they have proved to be an efficient formalism for both training and recognition. The first term in equation (3), P(A|W), is generally called the acoustic model, as it estimates the probability of a sequence of acoustic observations conditioned on the word string. Hence P(A|W) is what this stage computes; a sketch of the computation is given below.
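To make equation (3) and the role of the acoustic model concrete, the following is a minimal sketch, not taken from any cited system: each candidate word is modeled by a toy discrete-observation HMM, the forward algorithm yields P(A|W), a hypothetical unigram prior stands in for the language model P(W), and the decoder takes the argmax. All model values and word names here are illustrative assumptions.

```python
import numpy as np

def forward_likelihood(obs, pi, trans, emit):
    """P(A | W): forward algorithm for a discrete-observation HMM."""
    alpha = pi * emit[:, obs[0]]              # initialization at the first frame
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # induction over remaining frames
    return alpha.sum()                        # termination: sum over final states

# Toy left-to-right HMMs for two hypothetical words (3 states,
# 4 quantized acoustic symbols); probabilities are illustrative only.
pi = np.array([1.0, 0.0, 0.0])
trans = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
models = {
    "yes": np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.1, 0.7, 0.1, 0.1],
                     [0.1, 0.1, 0.4, 0.4]]),
    "no":  np.array([[0.1, 0.1, 0.7, 0.1],
                     [0.1, 0.1, 0.1, 0.7],
                     [0.4, 0.4, 0.1, 0.1]]),
}
prior = {"yes": 0.5, "no": 0.5}               # toy language-model prior P(W)

A = [0, 1, 1, 2, 3]                           # quantized acoustic observations
# MAP rule of eq. (3): argmax over W of P(A|W) * P(W)
best = max(models, key=lambda w: forward_likelihood(A, pi, trans, models[w]) * prior[w])
print(best)
```

A real recognizer works with continuous spectral feature vectors and searches the space of word sequences with dynamic programming rather than enumerating candidates as done here.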
For LVCSR systems, it is necessary to build statistical models for sub-word speech units, build up word models from these sub-word unit models (using a lexicon to describe the composition of words), and then postulate word sequences and evaluate the acoustic model probabilities via standard concatenation methods. [2]

LANGUAGE MODEL
The second term in equation (3), P(W), is called the language model. It describes the probability associated with a postulated sequence of words. Such language models can incorporate both syntactic and semantic constraints of the language and the recognition task. [2] Generally, speech recognition systems use bigram, trigram, or higher-order n-gram language models to find the correct word sequence by predicting the likelihood of the nth word from the n-1 preceding words. [5]

Language models can be classified into: [8]
- Uniform model: each word has an equal probability of occurrence.
- Stochastic model: the probability of occurrence of a word depends on the words preceding it.
- Finite state languages: the language uses a finite-state network [3] to define the allowed word sequences.
- Context-free grammar: can be used to encode which kinds of sentences are allowed.

[Figure: Schematic architecture of a (simplified) speech recognizer decoding a single sentence. [4]]

Pattern Classification
The two steps of pattern classification are pattern training and pattern comparison. Pattern classification is the process of comparing the unknown test pattern with each sound-class reference pattern and computing a measure of similarity between them. After the system has been fully trained, incoming patterns are classified at test time to recognize the speech. Various techniques can be adopted for pattern classification.

APPROACHES TO SPEECH RECOGNITION TECHNIQUES
Basically, there are three approaches to pattern classification/speech recognition. The taxonomy of these approaches is:

Acoustic Phonetic Recognition
First introduced by Hemdal and Hughes in 1967, the acoustic-phonetic recognition approach is based on finding speech sounds and providing appropriate labels to these sounds.

PERFORMANCE EVALUATION: WORD ERROR RATE
Accuracy and speed are the parameters that define the performance of speech recognition. The standard evaluation metric for speech recognition systems is the word error rate (WER), which is based on how much the word string returned by the recognizer (often called the hypothesized word string) differs from a correct or reference transcription. [4] Given such a correct transcription, the first step in computing word error is to compute the minimum edit distance in words between the hypothesized and correct strings. The result of this computation is the minimum number of word substitutions, word insertions, and word deletions necessary to map between the correct and hypothesized strings. The WER is then defined as follows (note that because the equation includes insertions, the error rate can exceed 100%): [4]

\mathrm{WER} = 100 \times \frac{\mathrm{Insertions} + \mathrm{Substitutions} + \mathrm{Deletions}}{\mathrm{Total\ words\ in\ correct\ transcript}}

The sentence error rate (SER) tells how many sentences had at least one error, i.e. the percentage of sentences with at least one word error: [4]

\mathrm{SER} = 100 \times \frac{\mathrm{Number\ of\ sentences\ with\ at\ least\ one\ word\ error}}{\mathrm{Total\ number\ of\ sentences}}

A worked sketch of both computations is given at the end of this section.

TOOLS FOR ASR

APPLICATIONS
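As promised in the evaluation section above, here is a minimal sketch of the WER and SER computations, not taken from any particular toolkit: a dynamic-programming minimum edit distance over words is backtraced to count substitutions, insertions, and deletions, and the two formulas are then applied. The function names and toy transcripts are hypothetical.

```python
def word_errors(ref, hyp):
    """Minimum substitutions, insertions, deletions mapping ref -> hyp."""
    R, H = len(ref), len(hyp)
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = i
    for j in range(1, H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace the optimal alignment to split the distance into S, I, D.
    s = ins = dele = 0
    i, j = R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            s += ref[i - 1] != hyp[j - 1]   # substitution (or exact match)
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ins += 1                        # word present in hypothesis only
            j -= 1
        else:
            dele += 1                       # word present in reference only
            i -= 1
    return s, ins, dele

def wer(ref, hyp):
    s, ins, dele = word_errors(ref.split(), hyp.split())
    return 100.0 * (ins + s + dele) / len(ref.split())

def ser(pairs):
    bad = sum(1 for ref, hyp in pairs if sum(word_errors(ref.split(), hyp.split())) > 0)
    return 100.0 * bad / len(pairs)

# Toy example (hypothetical transcripts):
print(wer("the cat sat on the mat", "the cat sat mat"))   # two deletions -> 33.3
print(ser([("yes", "yes"), ("go left", "go right")]))      # 50.0
```

Scoring tools used in practice (e.g., NIST's sclite) additionally handle text normalization and alignment conventions that this sketch omits.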