Jiby Mariya Jose

Energy-reduced bio-inspired 1D-CNN for audio emotion recognition

Authors : Jiby Mariya Jose, Jeeva Jose

Journal title : International Journal of Scientific Research in Computer Science, Engineering and Information Technology

Publisher : Technoscience Academy

Online ISSN : 2456-3307

Page Number : 1034-1054

Journal volume : 11

Journal issue : 3


This paper proposes EPyNet, a deep learning architecture designed for energy-reduced audio emotion recognition. In audio-based emotion recognition, where discerning emotional cues from audio input is crucial, the integration of artificial intelligence techniques has driven a transformative shift in accuracy and performance. Deep learning, renowned for its ability to decipher intricate patterns, spearheads this evolution. However, the energy efficiency of deep learning models, particularly in resource-constrained environments, remains a pressing concern. Convolutional operations are the cornerstone of deep learning systems, but their heavy computational demands lead to energy-inefficient computation and make them poorly suited to deployment in scenarios with limited resources. To address these challenges, researchers introduced one-dimensional convolutional neural network (1D-CNN) array convolutions as an alternative to traditional two-dimensional CNNs with reduced resource requirements. While this array-based operation lowered resource requirements, its impact on energy consumption had not been studied. To bridge this gap, we introduce EPyNet, a deep learning architecture crafted for energy efficiency with a particular emphasis on neuron reduction. Focusing on the task of audio emotion recognition, we evaluate EPyNet on five public audio corpora: RAVDESS, TESS, EMO-DB, CREMA-D, and SAVEE. We propose three versions of EPyNet, a lightweight neural network designed for efficient emotion recognition, each optimized for a different trade-off between accuracy and energy efficiency. Experimental results show that the 0.06M-parameter EPyNet reduced energy consumption by 76.5% while improving accuracy by 5% on RAVDESS, 25% on TESS, and 9.75% on SAVEE. The 0.2M and 0.9M models reduced energy consumption by 64.9% and 70.3%, respectively. Additionally, we compared the proposed 0.06M system with the MobileNet models on the CIFAR-10 dataset and achieved significant improvements. The proposed system reduces energy by 86.2% and memory by 95.7% compared to MobileNet, with a slightly lower accuracy of 0.8%. Compared to MobileNetV2, it improves accuracy by 99.2% and reduces memory by 93.8%. Compared to MobileNetV3, it achieves a 57.2% energy reduction, an 85.1% memory reduction, and a 24.9% accuracy improvement. We further test the scalability and robustness of the proposed solution across different data dimensions and frameworks.
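The abstract leans on two technical points: array (1D) convolutions in place of 2D convolutions, and very small parameter budgets (0.06M to 0.9M). As a purely illustrative sketch, the PyTorch snippet below builds a compact 1D-CNN classifier over frame-level audio features such as MFCCs; the feature choice, layer sizes, and eight-class output are assumptions for demonstration and do not reproduce the published EPyNet architecture or its energy measurements.

```python
# Illustrative sketch only: a compact 1D-CNN audio emotion classifier in PyTorch.
# Feature type, layer widths, and class count are assumptions for demonstration;
# this is NOT the published EPyNet configuration.
import torch
import torch.nn as nn

class Tiny1DCNN(nn.Module):
    """A small 1D-CNN over per-frame audio features (e.g., 40 MFCC coefficients)."""
    def __init__(self, n_features: int = 40, n_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            # Array (1D) convolutions slide over the time axis only, so each filter
            # holds n_features * kernel_size weights rather than a full 2D patch.
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # global pooling keeps the classifier head tiny
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, time_frames)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

if __name__ == "__main__":
    model = Tiny1DCNN()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"parameters: {n_params / 1e6:.3f}M")   # roughly 0.017M, well under 0.1M
    dummy = torch.randn(4, 40, 200)               # 4 clips, 200 feature frames each
    print(model(dummy).shape)                     # torch.Size([4, 8])
```

For the same channel counts, a 5×5 2D kernel carries five times the weights per filter of a length-5 1D kernel, which is the resource argument behind the array convolutions that the abstract builds on.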

DOI : https://doi.org/10.32628/CSEIT25113386

More Articles by Jiby Mariya Jose

An efficient sarcasm detection in audio using parameter-reduced depthwise CNN

In this study, we implement a lightweight CNN for sarcasm detection using audio input. To achieve this goal, we propose the DepthFire block. We propose a lightweight version of the tra...
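For context on the parameter-reduction idea this abstract mentions, the generic PyTorch snippet below contrasts a standard 1D convolution with a depthwise-separable one. It illustrates depthwise separability in general and is not the DepthFire block or the architecture from that paper; the channel counts and kernel size are arbitrary.

```python
# Illustrative sketch only: standard vs. depthwise-separable 1D convolution.
# Channel counts and kernel size are arbitrary; this is not the paper's block.
import torch.nn as nn

def standard_conv1d(cin: int, cout: int, k: int) -> nn.Module:
    return nn.Conv1d(cin, cout, kernel_size=k, padding=k // 2)

def depthwise_separable_conv1d(cin: int, cout: int, k: int) -> nn.Module:
    return nn.Sequential(
        # depthwise: one k-tap filter per input channel (groups=cin)
        nn.Conv1d(cin, cin, kernel_size=k, padding=k // 2, groups=cin),
        # pointwise: 1x1 convolution that mixes channels
        nn.Conv1d(cin, cout, kernel_size=1),
    )

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard_conv1d(64, 128, 5)))             # 64*128*5 + 128 = 41,088
print(count(depthwise_separable_conv1d(64, 128, 5)))  # 320 + 64 + 8,192 + 128 = 8,704
```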

A lightweight deep learning framework using resource-efficient batch normalization for sarcasm detection

Communication is not always direct; it often involves nuanced elements like humor, irony, and sarcasm. This study introduces a novel two-level approach for sarcasm detection, lever...

Optimizing neural network energy efficiency through low-rank factorisation and PDE-driven dense layers

As deep learning models continue to grow in complexity, the computational and energy demands associated with their training and deployment are becoming increasingly significant, part...
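As a generic illustration of the low-rank idea named in this title, the PyTorch snippet below replaces a dense layer with a rank-r factorisation and compares parameter counts. The layer sizes and rank are arbitrary assumptions, and the snippet does not represent the PDE-driven dense layers proposed in that paper.

```python
# Illustrative sketch only: approximating a dense layer W with a rank-r product B @ A.
# Sizes and rank are arbitrary; this is not the paper's PDE-driven layer.
import torch.nn as nn

d_in, d_out, rank = 1024, 1024, 32

dense = nn.Linear(d_in, d_out)          # d_in*d_out + d_out = 1,049,600 parameters
low_rank = nn.Sequential(               # A: rank x d_in, B: d_out x rank
    nn.Linear(d_in, rank, bias=False),
    nn.Linear(rank, d_out),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))      # 1,049,600
print(count(low_rank))   # 1024*32 + 32*1024 + 1024 = 66,560
```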