Applications of End-to-End Automatic Speech Recognition

by Shivam Kashyap

This project investigates applications of end-to-end automatic speech recognition (ASR) for English, comparing two architectures: an RNN-CNN model trained with CTC loss, and a Transformer-based model. The primary goal is to evaluate how each architecture performs on sequence-to-sequence tasks that demand accurate temporal alignment and robust handling of variable-length input, the defining challenges of speech recognition. Both models were trained and evaluated on the widely used LJSpeech English dataset.

The RNN-CNN model combines the strengths of both components: the CNN encodes local time-frequency features, the RNN captures temporal dependencies across frames, and the CTC loss enables alignment-free training; together they yield better recognition accuracy. The second model uses the Transformer architecture, whose self-attention mechanism captures long-range dependencies without recurrent connections. By sidestepping the limitations of RNNs in processing long sequences and in parallelisation, this design allows shorter training and inference times and scales more efficiently to large datasets.

We evaluated both models on word error rate (WER = (S + D + I) / N, the number of substitutions, deletions, and insertions relative to the N words in the reference transcript) and on computation time. The Transformer-based model outperformed the RNN-CNN model by roughly 3–4% in WER and ran about five times faster per epoch, although it needed more epochs to converge. The results suggest that RNN-CNN models remain well suited to tasks dominated by local dependencies, while Transformers offer clear advantages in computational efficiency and long-range dependency modelling, making them a compelling option for large-scale English speech processing.
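To make the first architecture concrete, below is a minimal sketch of a CNN + RNN acoustic model trained with CTC loss. It assumes PyTorch, log-Mel spectrogram input, and a character-level vocabulary; the class name `CnnRnnCtc`, the layer sizes, and all hyperparameters are illustrative, not the project's exact configuration.

```python
import torch
import torch.nn as nn

class CnnRnnCtc(nn.Module):
    """Illustrative CNN + bidirectional RNN acoustic model with a CTC head."""
    def __init__(self, n_mels=80, hidden=256, n_classes=29):  # 29 ~ a-z, space, apostrophe, CTC blank
        super().__init__()
        # CNN front end: encodes local time-frequency patterns and
        # downsamples the time axis by 2.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=(2, 2), padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=(1, 2), padding=1),
            nn.ReLU(),
        )
        conv_out = 32 * (n_mels // 4)  # channels * reduced frequency bins
        # Bidirectional GRU: models temporal dependencies across frames.
        self.rnn = nn.GRU(conv_out, hidden, num_layers=3,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)  # per-frame class logits

    def forward(self, x):                      # x: (batch, time, n_mels)
        x = self.conv(x.unsqueeze(1))          # (batch, 32, time/2, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.rnn(x)
        return self.fc(x).log_softmax(-1)      # CTC expects log-probabilities

model = CnnRnnCtc()
spec = torch.randn(4, 200, 80)                 # 4 dummy utterances, 200 frames each
log_probs = model(spec)                        # (4, 100, 29)
# CTC loss aligns frame-level predictions to shorter label sequences
# without any explicit frame-to-character alignment.
ctc = nn.CTCLoss(blank=0)
targets = torch.randint(1, 29, (4, 20))        # dummy character labels
loss = ctc(log_probs.transpose(0, 1),          # CTC wants (time, batch, classes)
           targets,
           input_lengths=torch.full((4,), 100),
           target_lengths=torch.full((4,), 20))
```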

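For contrast, here is an equally minimal sketch of the Transformer-based counterpart, again an assumption rather than the project's reported setup: a stack of self-attention encoder layers over the same spectrogram features, feeding the same CTC head, so the only change between the two models is how temporal context is modelled.

```python
import math
import torch
import torch.nn as nn

class TransformerCtc(nn.Module):
    """Illustrative Transformer encoder acoustic model with a CTC head."""
    def __init__(self, n_mels=80, d_model=256, n_heads=4,
                 n_layers=6, n_classes=29):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)   # frame features -> model dim
        # Fixed sinusoidal positional encodings: self-attention alone is
        # order-invariant, so frame position must be injected explicitly.
        pos = torch.arange(4096).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(4096, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=1024,
                                           batch_first=True)
        # Every frame attends to every other frame: no recurrence, so a
        # long-range dependency costs a single attention hop.
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                         # x: (batch, time, n_mels)
        x = self.proj(x) + self.pe[: x.size(1)]
        return self.fc(self.encoder(x)).log_softmax(-1)

model = TransformerCtc()
log_probs = model(torch.randn(4, 200, 80))        # (4, 200, 29)
# Training uses the same nn.CTCLoss as the RNN-CNN model above.
```

Because every frame attends to all others in one step, the whole utterance is processed in parallel rather than frame by frame, which is consistent with the per-epoch speedup reported above.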