Using Back-Propagation Neural Networks For Automatic Speech Recognition

The back-propagation neural network is one of the foundational tools of modern machine learning. Simple but powerful, the algorithm underlies a wide range of speech and language applications, including recognizing human speech, producing translations, and automatically generating text such as advertisements.

In the field of speech recognition, back-propagation neural networks have been used to recognize speech and transcribe it into text. Two widely cited lines of work on this technology come from Carnegie Mellon University and the University of Toronto.

The networks that perform this task are usually described in terms of how they are trained: unsupervised or supervised. In an unsupervised network, the weights between layers are adjusted without labelled target outputs, so the network learns the structure of its inputs on its own. A supervised network, by contrast, is trained on input-output pairs: its hidden layers are connected by weights, and back-propagation adjusts each weight in proportion to how much it contributes to the error at the output.
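
To make the supervised case concrete, here is a minimal sketch of a network with a single hidden layer trained by back-propagation. The dataset (the XOR function), the layer sizes, and the learning rate are toy values chosen purely for illustration; neither the Carnegie Mellon nor the Toronto model is this small.

```python
import numpy as np

# Toy supervised example: learn XOR with one hidden layer.
# All sizes and data here are illustrative, not taken from the models above.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # target outputs

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    # Forward pass through the hidden and output layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates, weighted by each connection's contribution to the error
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [0, 1, 1, 0]
```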

The first example, mentioned above, is the English language model created at Carnegie Mellon University. The task of this network is to recognize the grammatical structure of English sentences and to translate them into other languages, which means it must learn a linguistic function. This is accomplished with recurrently connected layers of LSTM (long short-term memory) units that take the preceding text as input, so the input to the network is constantly changing as the sentence unfolds.
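
As a rough illustration of what such a recurrently connected LSTM layer looks like in code, here is a tiny next-word language model written in PyTorch. The vocabulary size, embedding width, and hidden size are placeholder assumptions, not details of the Carnegie Mellon model.

```python
import torch
import torch.nn as nn

# Illustrative only: a tiny recurrent language model built from LSTM units.
# Vocabulary size and layer dimensions are invented placeholders.
class TinyLSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer ids for the preceding words
        x = self.embed(tokens)
        h, state = self.lstm(x, state)   # recurrent pass over the sequence
        logits = self.out(h)             # score for the next word at each step
        return logits, state

model = TinyLSTMLanguageModel()
tokens = torch.randint(0, 1000, (2, 10))   # two dummy input sequences
logits, _ = model(tokens)
print(logits.shape)                        # (2, 10, 1000)
```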

A more recent example is the model built by University of Toronto researchers around recurrently connected layers of LSTM units. Their goal was a model that could translate a fully formed sentence, including constructions it had never seen in English training data, using only the vocabulary of the target language. The model itself is a back-propagation neural network (BPN) with five layers, and it works by extending a sentence that is only partially constructed.
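
The published details of the Toronto architecture are not given here, but the sketch below shows one plausible way to arrange roughly five trainable layers of this kind: an encoder that reads the source sentence, an LSTM decoder that extends a partially constructed output, and a final projection onto the target vocabulary. All dimensions and the exact layer split are assumptions for illustration.

```python
import torch
import torch.nn as nn

# A rough sketch of a five-layer recurrent translation network. The
# encoder/decoder split, layer count, and all dimensions are assumptions,
# not the published architecture.
class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        # Stacked LSTM layers; with the embeddings and output projection,
        # this amounts to roughly five trainable layers.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        _, state = self.encoder(self.src_embed(src_tokens))    # read the source sentence
        h, _ = self.decoder(self.tgt_embed(tgt_tokens), state)  # extend the partial output
        return self.out(h)                                      # scores over the target vocabulary

model = TinySeq2Seq()
src = torch.randint(0, 1000, (2, 12))
tgt = torch.randint(0, 1000, (2, 9))
print(model(src, tgt).shape)   # (2, 9, 1000)
```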

Both language models have other strengths as well. Each learns vocabulary and conjugates verbs correctly, and each can offer human translators a wide range of candidate translations.

Taken together, these two examples show how BPNs can achieve such goals. Both linguistic models performed well on their translation test sets, and both also produced translations for English sentences whose constructions they could not have learned directly from the training data.
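
The article does not say how translation quality was scored. A common way to measure it is the BLEU metric, shown below with NLTK on made-up sentences, purely as an illustration of how such a test-set score might be computed.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Illustration only: the reference and hypothesis sentences are invented,
# and BLEU is one standard metric, not necessarily the one used by either team.
reference = [["the", "cat", "sits", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

smooth = SmoothingFunction().method1
score = sentence_bleu(reference, hypothesis, smoothing_function=smooth)
print(f"BLEU score: {score:.2f}")
```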

The back-propagation neural network used by the Carnegie Mellon team was designed specifically to teach itself a linguistic function first and then to learn how to translate English sentences. This suggests that a machine could use an artificial neural network to build a human-like knowledge base of the English language.

The recurrently connected layer in the University of Toronto model was not designed to teach itself a linguistic function. Instead, it was designed to construct a knowledge base of its own, so that it could provide translations and perform grammatical tasks correctly.

While both of these neural networks are powerful, the language model from the University of Toronto appears to be the more capable of the two: it has a more complete knowledge base and translates English sentences successfully.

It will probably take years for anyone to develop a truly complete language model. However, by integrating some of the key concepts used in the two examples above, we can begin to understand what the future of artificial intelligence may look like.