Data Augmentation Strategies on Spectrogram Features for Infant Cry Classification Using Convolutional Neural Networks

Alam Alam; Nuk Ghurroh Setyoningrum; Robby Maududy; Dea Dewi Damayanti; Hilmi Rahmawati

doi:10.37058/innovatics.v7i2.16823

Abstract

Infant cry classification helps parents and healthcare professionals understand infants’ needs, but limited and imbalanced datasets often reduce model accuracy and generalization. This study applies four audio data augmentation strategies (time stretching, time shifting, pitch scaling, and polarity inversion), combined with a spectrogram representation, to improve the performance of a Convolutional Neural Network (CNN) in classifying infant cries. Augmentation expanded the dataset, taken from the Donate-a-Cry Corpus, from 457 to 6,855 samples, improving class balance and variability. In the experiments, CNN accuracy increased from 85% before augmentation to 99.85% after augmentation, with precision, recall, and F1-score reaching near-perfect values across all categories. The confusion matrix confirms this result, showing few misclassifications. These findings indicate that data augmentation is crucial for overcoming dataset limitations, enriching acoustic feature diversity, and reducing model bias, and they have practical implications for developing accurate and reliable infant cry detection systems that can be used in real-world settings.
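As an illustration only, the following minimal sketch shows two of the four augmentations named in the abstract (polarity inversion and time shifting) applied to a waveform, followed by a log-magnitude spectrogram. It is not the authors' implementation: the function names, the circular (wrap-around) shift, the STFT parameters, and the synthetic 440 Hz tone standing in for a cry recording are all assumptions made for this example. Time stretching and pitch scaling are omitted because they normally rely on a dedicated audio library. The last line only checks the arithmetic of the reported expansion from 457 to 6,855 samples.

```python
import numpy as np


def polarity_inversion(x):
    """Flip the sign of every sample in the waveform."""
    return -x


def time_shift(x, shift):
    """Shift the waveform by `shift` samples, wrapping around (assumed behaviour)."""
    return np.roll(x, shift)


def log_spectrogram(x, n_fft=512, hop=128):
    """Log-magnitude STFT with a Hann window.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    n_fft and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


# Synthetic stand-in for a cry recording: 1 second at 8 kHz (hypothetical values).
sr = 8000
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2 * np.pi * 440 * t)

# Each augmented waveform becomes an additional training sample.
augmented = {
    "original": clip,
    "inverted": polarity_inversion(clip),
    "shifted": time_shift(clip, sr // 10),  # shift by 0.1 s
}

# Spectrogram features that a CNN would take as input.
features = {name: log_spectrogram(wave) for name, wave in augmented.items()}

# Expansion factor implied by the reported sample counts.
expansion_factor = 6855 / 457
```

In a pipeline of this kind, each entry of `features` would be treated as a separate image-like input to the CNN, so the number of augmented variants generated per recording determines how much the dataset grows.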

Pages: 118-126

Vol 7, No 2 (2025)
