Journal title : International Journal of Scientific Research in Computer Science, Engineering and Information Technology
Publisher : Technoscience Academy
Online ISSN : 2456-3307
Page Number : 1034-1054
Journal volume : 11
Journal issue : 3
This paper proposes EPyNet, a deep learning architecture designed for energy-efficient audio emotion recognition. In the domain of audio-based emotion recognition, where discerning emotional cues from audio input is crucial, the integration of artificial intelligence techniques has sparked a transformative shift in accuracy and performance. Deep learning, renowned for its ability to decipher intricate patterns, spearheads this evolution. However, the energy efficiency of deep learning models, particularly in resource-constrained environments, remains a pressing concern. Convolutional operations serve as the cornerstone of deep learning systems, yet their extensive computational demands lead to energy-inefficient computation, making them ill-suited for deployment in scenarios with limited resources. To address these challenges, researchers introduced one-dimensional convolutional neural network (1D CNN) array convolutions as an alternative to traditional two-dimensional CNNs with reduced resource requirements. While this array-based operation lowers the resource requirement, its impact on energy consumption has not been studied. To bridge this gap, we introduce EPyNet, a deep learning architecture crafted for energy efficiency with a particular emphasis on neuron reduction. Focusing on the task of audio emotion recognition, we evaluate EPyNet on five public audio corpora: RAVDESS, TESS, EMO-DB, CREMA-D, and SAVEE. We propose three versions of EPyNet, a lightweight neural network designed for efficient emotion recognition, each optimized for a different trade-off between accuracy and energy efficiency. Experimental results demonstrate that the 0.06M EPyNet reduces energy consumption by 76.5% while improving accuracy by 5% on RAVDESS, 25% on TESS, and 9.75% on SAVEE. The 0.2M and 0.9M models reduce energy consumption by 64.9% and 70.3%, respectively.
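To illustrate why array (1-D) convolutions cut compute relative to 2-D convolutions, the following back-of-envelope sketch counts multiply-accumulate (MAC) operations for one convolutional layer. The layer sizes (a 64x64 single-channel input, 32 filters, kernel size 3) are illustrative assumptions, not EPyNet's actual configuration:

```python
def conv2d_macs(h, w, c_in, c_out, k):
    # MACs for a 'same'-padded 2-D convolution: each of the h*w output
    # positions, for each of c_out filters, sums over a k x k x c_in window.
    return h * w * c_in * c_out * k * k

def conv1d_macs(n, c_in, c_out, k):
    # MACs for a 'same'-padded 1-D convolution over a length-n array:
    # each filter slides a k x c_in window along a single axis.
    return n * c_in * c_out * k

# A 64x64 single-channel input vs. the same 4096 values flattened to 1-D.
macs_2d = conv2d_macs(64, 64, 1, 32, 3)   # 4096 positions * 32 filters * 9 taps
macs_1d = conv1d_macs(64 * 64, 1, 32, 3)  # 4096 positions * 32 filters * 3 taps

print(macs_2d, macs_1d)  # 1179648 393216
```

For the same number of input values and filters, the 1-D layer does k times fewer MACs than a k x k 2-D layer (here 3x), which is the resource saving the array-convolution line of work exploits.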
Additionally, we compared our proposed 0.06M system with the MobileNet models on the CIFAR-10 dataset and achieved significant improvements. The proposed system reduces energy by 86.2% and memory by 95.7% compared to MobileNet, at the cost of a slightly lower accuracy of 0.8%. Compared to MobileNetV2, it improves accuracy by 99.2% and reduces memory by 93.8%. Compared to MobileNetV3, it achieves a 57.2% energy reduction, an 85.1% memory reduction, and a 24.9% accuracy improvement. We further test the scalability and robustness of the proposed solution on different data dimensions and frameworks.
DOI : https://doi.org/10.32628/CSEIT25113386