Overview of the technical principles and workflow of speech recognition technology

by: PTKSAI     2019-12-21
Voice is the most natural form of human interaction. Ever since computers were invented, people have wanted machines to understand human language, grasp its underlying meaning, and respond correctly. We hope to talk to machines the way people talk to the intelligent robot assistants of science fiction films, and have them understand what we say. Speech recognition technology has turned this long-held dream into reality.

Speech recognition acts as the auditory system of a machine: through recognition and understanding, it lets the machine convert speech signals into corresponding text or commands. Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences.

Speech recognition is an interdisciplinary field that spans a wide range of areas. It is closely related to acoustics, phonetics, linguistics, information theory, pattern recognition theory, and neurobiology. It is gradually becoming a key technology in computer information processing.
The development of speech recognition technology

Research on speech recognition started in the 1950s; in 1952, Bell Laboratories developed a system that recognized the ten isolated digits. From the 1960s, Reddy and others at Carnegie Mellon University in the United States carried out research on continuous speech recognition, but progress during this period was very slow. In 1969, J. Pierce of Bell Laboratories even argued in an open letter that speech recognition would not be achievable in the near future.

Starting in the 1980s, statistical model-based methods, represented by the hidden Markov model (HMM), gradually came to dominate speech recognition research. The HMM describes the short-time stationarity of speech signals well and integrates acoustic, linguistic, syntactic, and other knowledge into a unified framework. From then on, research on and application of HMMs became the mainstream. For example, the speaker-independent continuous speech recognition system SPHINX was developed by Kai-Fu Lee, who was still studying at Carnegie Mellon University at the time. Its core is the GMM-HMM framework, in which the Gaussian mixture model (GMM) models the observation probability of speech, while the HMM models its temporal structure.

In the late 1980s, the artificial neural network (ANN), the predecessor of the deep neural network (DNN), also became a direction of speech recognition research. However, these shallow neural networks gave only mediocre results on speech recognition tasks, and their performance was not as good as that of the GMM-HMM model.
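The division of labor in the GMM-HMM framework described above can be sketched in a few lines of Python. This is only an illustrative toy, not the SPHINX implementation: the function names, the one-dimensional features, and all numeric parameters are assumptions made up for the example. A GMM gives the observation probability of each frame in each state, and the HMM forward algorithm combines those probabilities with state transitions over time.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_likelihood(x, components):
    """Observation probability of one frame: a weighted sum of Gaussian densities.

    components is a list of (weight, mean, variance) tuples; weights sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def forward_likelihood(observations, initial, transitions, state_gmms):
    """HMM forward algorithm: total likelihood of the observation sequence."""
    n_states = len(initial)
    # alpha[j] = p(frames seen so far, current state = j)
    alpha = [initial[j] * gmm_likelihood(observations[0], state_gmms[j])
             for j in range(n_states)]
    for x in observations[1:]:
        alpha = [
            sum(alpha[i] * transitions[i][j] for i in range(n_states))
            * gmm_likelihood(x, state_gmms[j])
            for j in range(n_states)
        ]
    return sum(alpha)

# Toy two-state left-to-right model (hypothetical numbers, for illustration only).
initial = [1.0, 0.0]
transitions = [[0.6, 0.4],
               [0.0, 1.0]]
state_gmms = [
    [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],  # state 0: features near 0 to 1
    [(0.7, 5.0, 1.0), (0.3, 6.0, 2.0)],  # state 1: features near 5 to 6
]

# The same frames in two different orders: only the first order matches
# the left-to-right structure of the model.
matching = forward_likelihood([0.2, 0.8, 5.1, 5.6], initial, transitions, state_gmms)
reversed_order = forward_likelihood([5.6, 5.1, 0.8, 0.2], initial, transitions, state_gmms)
print(matching > reversed_order)
```

The two sequences contain the same frames, so the comparison isolates the contribution of the HMM's temporal modeling. A practical system would use multi-dimensional features and log-domain or scaled arithmetic to avoid underflow on long utterances.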
In the 1990s, speech recognition saw a small boom in research and industrial application, mainly thanks to the introduction of discriminative training criteria and model adaptation methods for GMM-HMM acoustic models. During this period, the HTK open source toolkit released by Cambridge greatly lowered the threshold for speech recognition research. Over the following decade or so, research progress in speech recognition was relatively limited. The overall performance of speech recognition systems based on the GMM-HMM framework remained far from a practical level, and research and application hit a bottleneck.

In 2006, Hinton proposed using restricted Boltzmann machines (RBM) to initialize the nodes of a neural network, yielding the deep belief network (DBN). The DBN addresses the tendency of deep neural network training to get stuck in local optima. This marked the start of the deep learning wave. In 2009, Hinton and his student Mohamed applied the DBN to acoustic modeling for speech recognition and achieved success on small-vocabulary continuous speech recognition datasets such as TIMIT.