Detection of Laryngeal Pathologies from Voice using EMD-based Mel-Spectrograms and Scalograms with AlexNet

Sofiane Cherif; Abdelhafid Kaddour; Abdelmoudjib Benkada; Said Karoui

doi:10.2478/msr-2025-0030

Authors

Sofiane Cherif Signals, Systems, and Data Laboratory (LSSD), Electronics Department, Faculty of Electrical Engineering, University of Sciences and Technology of Oran Mohamed Boudiaf (USTO MB), Oran, Algeria https://orcid.org/0009-0008-5498-8698
Abdelhafid Kaddour Signals, Systems, and Data Laboratory (LSSD), Electronics Department, Faculty of Electrical Engineering, University of Sciences and Technology of Oran Mohamed Boudiaf (USTO MB), Oran, Algeria https://orcid.org/0009-0008-0630-0932
Abdelmoudjib Benkada Signals, Systems, and Data Laboratory (LSSD), Electronics Department, Faculty of Electrical Engineering, University of Sciences and Technology of Oran Mohamed Boudiaf (USTO MB), Oran, Algeria https://orcid.org/0000-0002-9948-3123
Said Karoui Signals, Systems, and Data Laboratory (LSSD), Electronics Department, Faculty of Electrical Engineering, University of Sciences and Technology of Oran Mohamed Boudiaf (USTO MB), Oran, Algeria https://orcid.org/0009-0004-1244-4132

DOI:

https://doi.org/10.2478/msr-2025-0030

Keywords:

laryngeal pathology detection, voice signal processing, empirical mode decomposition, Mel-spectrogram, scalogram

Abstract

In this paper, a novel method for detecting laryngeal pathologies using deep neural networks and time–frequency signal processing techniques is presented. The proposed approach combines empirical mode decomposition (EMD) and wavelet analysis to extract discriminative features from healthy and pathological voice recordings obtained from the Saarbrücken Voice Database (SVD). Each voice signal is pre-processed and decomposed into intrinsic mode functions (IMFs), from which the most relevant IMF is selected based on a temporal energy criterion. Two sets of features are derived from the selected IMF: Mel-frequency cepstral coefficients (MFCCs) and continuous wavelet transform (CWT) coefficients. These features are converted into Mel-spectrogram and scalogram images, respectively, which serve as inputs to the AlexNet convolutional neural network (AlexNet-CNN) for automatic binary classification. To the best of our knowledge, this is the first study to incorporate scalogram representations with AlexNet-CNN in the context of pathological voice detection. The results show that the proposed method achieves a classification accuracy of 85.66 % with Mel-spectrograms and 86.4 % with scalograms, demonstrating its potential for effective and interpretable voice pathology screening.
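The IMF selection step can be illustrated with a minimal sketch. The abstract does not define the temporal energy criterion, so this example assumes that the energy of an IMF is the sum of its squared samples and that the most relevant IMF is the one with the largest energy. The function name `select_imf_by_energy` and the synthetic sinusoids standing in for real IMFs are hypothetical; in practice the IMFs would come from an EMD routine applied to a pre-processed voice recording. Only the Python standard library is used so that the sketch has no dependencies.

```python
import math


def select_imf_by_energy(imfs):
    """Return (index, imf) of the IMF with the largest temporal energy.

    imfs: non-empty sequence of equal-length sample sequences, one per IMF.
    Assumption (not stated in the abstract): temporal energy is the sum of
    squared samples, and the selected IMF is the one that maximizes it.
    """
    if len(imfs) == 0:
        raise ValueError("expected at least one IMF")
    # Energy of each IMF over its whole duration.
    energies = [sum(x * x for x in imf) for imf in imfs]
    # Index of the IMF with the highest energy.
    best = max(range(len(energies)), key=energies.__getitem__)
    return best, imfs[best]


# Synthetic stand-ins for IMFs (hypothetical values, one second at 8 kHz).
fs = 8000
t = [n / fs for n in range(fs)]
toy_imfs = [
    [0.2 * math.sin(2 * math.pi * 1200 * s) for s in t],  # weak, fast mode
    [1.0 * math.sin(2 * math.pi * 200 * s) for s in t],   # dominant mode
    [0.5 * math.sin(2 * math.pi * 20 * s) for s in t],    # slow trend-like mode
]

best_index, best_imf = select_imf_by_energy(toy_imfs)
```

Here `best_index` identifies the second synthetic mode, which has the largest amplitude and therefore the largest energy. The selected IMF would then be passed on to the Mel-spectrogram and scalogram computations described above.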
