Computational Intelligence and Machine Learning

NeuroVoice: Leveraging Neural Networks for Precise Gender Classification in Audio

Authors: Rekha Kaushik, Pritam Goyal and Atharv Pandey


Publishing Date: 23-04-2025

ISBN: 978-81-975670-5-6

DOI: https://doi.org/10.56155/978-81-975670-5-6-5

Abstract

This paper develops a model for gender recognition from voice samples using machine learning algorithms and acoustic parameters. The dataset is first split into training and test sets, after which a sequence of steps is applied to improve model performance. The paper takes a holistic approach to gender classification from audio data, covering data preprocessing, augmentation, feature scaling, model development, and performance evaluation. First, the class label (male/female) is converted to a numerical format through label encoding. Next, key features such as Mel-frequency cepstral coefficients (MFCC), Chroma Features, Spectral Contrast, and Pitch are extracted to capture the most essential characteristics of the audio. Data augmentation with SMOTE reduces bias in the dataset by creating artificial samples. Features are scaled with Min-Max Scaling to improve model convergence and performance. Several neural network architectures, such as an MLP with Batch Normalization and Dropout, are considered. Stratified K-Fold Cross-Validation is used to make the evaluation robust and to avoid bias. Each model is evaluated with performance metrics including accuracy, precision, recall, F1-score, the confusion matrix, and a classification report. The resulting model achieves an accuracy of 99.7% on the test dataset, an improvement in both overall accuracy and robustness. The findings could be of importance in telecommunication, human-computer interaction, and security systems where accurate gender recognition from voice is required.
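
The preprocessing steps named in the abstract (label encoding, Min-Max scaling, SMOTE-style oversampling, and stratified fold assignment) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the function names, the toy two-column feature rows, and the simplified nearest-neighbour interpolation are assumptions made for illustration only. In practice, libraries such as scikit-learn and imbalanced-learn provide tested versions of these steps, and the scaler and oversampler are normally fitted on the training portion only.

```python
import random

def encode_labels(labels):
    """Map each distinct class label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[l] for l in labels], classes

def min_max_scale(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0
             for v, lo, sp in zip(row, lows, spans)] for row in rows]

def smote_like(rows, labels, minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (illustrative assumption).

    Each synthetic row lies on the segment between a random minority row
    and one of its k nearest minority neighbours. Needs >= 2 minority rows.
    """
    rng = random.Random(seed)
    pool = [r for r, y in zip(rows, labels) if y == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pool)
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic

def stratified_folds(labels, n_splits):
    """Deal sample indices round-robin per class so folds keep class ratios."""
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(members):
            folds[pos % n_splits].append(i)
    return folds

# Toy data (hypothetical): [pitch-like value, spectral-like value] per sample.
features = [[120.0, 0.2], [110.0, 0.3], [130.0, 0.25],
            [125.0, 0.22], [210.0, 0.6], [220.0, 0.7]]
labels = ["male", "male", "male", "male", "female", "female"]

y, classes = encode_labels(labels)
X = min_max_scale(features)
new_rows = smote_like(X, y, minority=0, n_new=2, k=1)
X_bal = X + new_rows
y_bal = y + [0] * len(new_rows)
folds = stratified_folds(y_bal, n_splits=2)
```

After these steps, `X_bal` and `y_bal` hold a class-balanced, scaled dataset, and each entry of `folds` lists the sample indices held out for one validation round. A neural network such as the MLP described in the abstract would then be trained on the remaining indices.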

Keywords

MFCC, SMOTE, Neural Network, Batch Normalization, Dropout, Stratified K-Fold Cross-Validation.

Cite as

Rekha Kaushik, Pritam Goyal and Atharv Pandey, "NeuroVoice: Leveraging Neural Networks for Precise Gender Classification in Audio", In: Sandeep Kumar and Kavita Sharma (eds), Computational Intelligence and Machine Learning, SCRS, India, 2025, pp. 45-55. https://doi.org/10.56155/978-81-975670-5-6-5
